A Natural Language Query Engine without Machine Learning
What is this?
NLQuery is a natural language query engine that answers questions posed in natural language.
Demo: http://nlquery.ayoungprogrammer.com
Source: http://ayoungprogrammer.github.com/nlquery
Example:
Input: Who is Obama married to?
Output: Michelle Obama
More examples:
Who is Obama? 44th President of the United States
How tall is Yao Ming? 2.286m
Where was Obama born? Kapiolani Medical Center for Women and Children
When was Obama born? August 04, 1961
Who did Obama marry? Michelle Obama
Who is Obama's wife? Michelle Obama
Who is Barack Obama's wife? Michelle Obama
Who was Malcolm Little known as? Malcolm X
What is the birthday of Obama? August 04, 1961
What religion is Obama? Christianity
How many countries are there? 196
Which countries have a population over 1000000000? People's Republic of China, India
Which books are written by Douglas Adams? The Hitchhiker's Guide to the Galaxy, ...
Who was POTUS in 1945? Harry S. Truman
Who was Prime Minister of Canada in 1945? William Lyon Mackenzie King
Who was CEO of Apple Inc in 1980? Steve Jobs
Why no machine learning?
How does it work?
Raw Input
Example of the raw input query string from a user:
"Who is Obama's wife?"
We can do some simple preprocessing to add punctuation and capitalization to the raw input to make it easier to parse in the next step.
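As a rough illustration of this preprocessing step, here is a minimal Python sketch. The function name `preprocess` and the exact normalization rules are assumptions made for this example and are not taken from the NLQuery source.

```python
def preprocess(raw_query: str) -> str:
    """Normalize a raw user query before parsing (illustrative only)."""
    # Collapse runs of whitespace and trim both ends.
    query = " ".join(raw_query.split())
    if not query:
        return query
    # Capitalize only the first character so that proper nouns
    # such as "Obama" keep their original casing.
    query = query[0].upper() + query[1:]
    # Append a question mark if there is no terminal punctuation.
    if query[-1] not in "?.!":
        query += "?"
    return query


print(preprocess("who is Obama's wife"))
```

A parser trained on well-formed sentences tends to behave better when the input looks like one, which is the motivation for normalizing before parsing.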
Parse Tree
We take the preprocessed string and get the parse tree of the sentence from the Stanford CoreNLP Parser:
(SBARQ (WHNP (WP Who)) (SQ (VBZ is) (NP (NP (NNP Obama) (POS 's)) (NN wife))) (. ?))
This parse tree represents the grammatical structure of the sentence. We can match grammar rules against it to extract the context.
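To work with the bracketed string in code, it first has to be turned into a tree structure. The sketch below is a small hand-written reader that produces nested Python lists. It is only an illustration: the helper names `parse_tree` and `leaves` are made up for this example, and a real project would more likely use an existing tree class from an NLP library.

```python
import re


def parse_tree(text):
    """Parse a bracketed tree into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token  # a leaf word such as "Obama"
        node = [tokens[pos]]  # the token after "(" is the node label
        pos += 1
        while tokens[pos] != ")":
            node.append(read())
        pos += 1  # consume the closing ")"
        return node

    return read()


def leaves(node):
    """Return the words of a subtree in sentence order."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1:] for word in leaves(child)]


TREE = ("(SBARQ (WHNP (WP Who)) (SQ (VBZ is) "
        "(NP (NP (NNP Obama) (POS 's)) (NN wife))) (. ?))")
tree = parse_tree(TREE)
print(tree[0])       # SBARQ
print(leaves(tree))  # ['Who', 'is', 'Obama', "'s", 'wife', '?']
```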
Context
We can convert the grammar parse tree to context parameters by matching the tree against rules. We can do this using my library for matching parse trees: Lango.
{
    "( SQ ( VP ( VBZ/VBD/VBP:action-o ) ( NP:subj_t ) ) )": {
        subj_t: "( NP ( NP:subject-o ( NNP ) ( POS ) ) ( NN/NNS:prop-o ) )"
    }
}
This grammar rule matches the parse tree and we can extract some context from the corresponding symbols in the rule.
{ "prop":"wife", "qtype":"who", "subject":"obama" }
We have the subject “Obama”, the property “wife” and the question type “who”. Once we have the contextual parameters of the query, we can construct a SPARQL query to run against the Wikidata database.
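The rule-matching step can be sketched with a much-simplified, hand-rolled matcher. This is not Lango and does not reproduce its rule syntax or options: the nested-list rule format, the `TAG1/TAG2:name` convention, the lowercasing of captured words, and the way the question word is read off the first child are all assumptions made for this illustration. The rule is also written to fit the parse tree shown earlier rather than copied from the rule above.

```python
# Parse tree from the post as nested lists: [label, child, ...].
TREE = ["SBARQ",
        ["WHNP", ["WP", "Who"]],
        ["SQ",
         ["VBZ", "is"],
         ["NP",
          ["NP", ["NNP", "Obama"], ["POS", "'s"]],
          ["NN", "wife"]]],
        [".", "?"]]

# Hypothetical rule format: "TAG1/TAG2:name" matches a node whose label
# is one of the tags and captures its words (lowercased) under `name`.
RULE = ["SQ",
        ["VBZ/VBD/VBP"],
        ["NP",
         ["NP", ["NNP:subject"], ["POS"]],
         ["NN/NNS:prop"]]]


def words(node):
    """Return the leaf words of a subtree in order."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in words(child)]


def match(rule, node, context):
    """Return True if node matches rule, recording captures in context."""
    if isinstance(node, str):
        return False
    tags, _, name = rule[0].partition(":")
    if node[0] not in tags.split("/"):
        return False
    child_rules, children = rule[1:], node[1:]
    if child_rules:  # a rule without children accepts any subtree
        if len(child_rules) != len(children):
            return False
        if not all(match(r, c, context) for r, c in zip(child_rules, children)):
            return False
    if name:
        context[name] = " ".join(words(node)).lower()
    return True


def find(rule, node):
    """Return captures from the first subtree (top-down) matching rule."""
    if isinstance(node, str):
        return None
    context = {}
    if match(rule, node, context):
        return context
    for child in node[1:]:
        found = find(rule, child)
        if found is not None:
            return found
    return None


context = find(RULE, TREE)
# Assumption: the question word is the first word of the first child.
context["qtype"] = words(TREE[1])[0].lower()
print(context)
```

Each search attempt starts with a fresh dictionary, so captures from a partially matched subtree never leak into the final context.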
Wikidata SPARQL Query
For this application, we will consider two types of SPARQL queries:
- finding a property of an entity (e.g. Who is Obama’s wife?)
  - We can search for the property that matches the entity (e.g. entity:Obama property:spouse ?x)
- finding instances of entities with given properties (e.g. Which POTUS died from laryngitis?)
  - We can search for entities that are instances of the type we want and that match the properties. E.g. which books are written by Douglas Adams: (?x property:instanceOf entity:book AND ?x property:writtenBy entity:DouglasAdams)
  - Some extra cases need handling here, such as “positions held”: a position is a type of entity, but the people who held it are not instances of it. (?x property:positionHeld entity:POTUS AND ?x property:causeOfDeath entity:laryngitis)
SELECT ?valLabel ?type
WHERE {
    {
        wd:Q76 p:P26 ?prop .
        ?prop ps:P26 ?val .
        OPTIONAL {
            ?prop psv:P26 ?propVal .
            ?propVal rdf:type ?type .
        }
    }
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
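Both query types can be generated from resolved Wikidata IDs. The sketch below assumes the subject and property have already been mapped to IDs (Q76 and P26 come from the query above). The function names, the ID validation, and the use of the `wdt:` prefix for the instance query are choices made for this example, not NLQuery's actual code. The IDs in the second example (P31 "instance of", P50 "author", Q571 "book", Q42 "Douglas Adams") are my own lookups, and whether that query returns the same answers as the demo depends on how the items are modelled in Wikidata.

```python
import re
from string import Template

_ID = re.compile(r"[PQ]\d+")

# Same shape as the spouse query above, with the IDs left as placeholders.
PROPERTY_QUERY = Template("""\
SELECT ?valLabel ?type WHERE {
  {
    wd:$entity p:$prop ?prop .
    ?prop ps:$prop ?val .
    OPTIONAL {
      ?prop psv:$prop ?propVal .
      ?propVal rdf:type ?type .
    }
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}""")


def _check(identifier):
    """Reject anything that is not a plain P/Q identifier."""
    if not _ID.fullmatch(identifier):
        raise ValueError(f"not a Wikidata ID: {identifier!r}")
    return identifier


def property_query(entity, prop):
    """Query for the value of one property of one entity."""
    return PROPERTY_QUERY.substitute(entity=_check(entity), prop=_check(prop))


def instances_query(constraints):
    """Query for entities ?x satisfying every (property ID, entity ID) pair."""
    lines = [f"  ?x wdt:{_check(p)} wd:{_check(e)} ." for p, e in constraints]
    body = "\n".join(lines)
    return (
        "SELECT ?xLabel WHERE {\n"
        f"{body}\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }\n'
        "}"
    )


print(property_query("Q76", "P26"))
print(instances_query([("P31", "Q571"), ("P50", "Q42")]))
```

Validating the identifiers before substitution keeps arbitrary user text out of the query string.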
Result
The end result from querying Wikidata:
{
    head: {
        vars: [ "valLabel", "type" ]
    },
    results: {
        bindings: [
            {
                valLabel: {
                    xml:lang: "en",
                    type: "literal",
                    value: "Michelle Obama"
                }
            }
        ]
    }
}
Thus we get the final answer as “Michelle Obama”.
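Pulling the answer out of that response is a short walk through the `results.bindings` list. The sketch below assumes the response has already been decoded into a Python dictionary with the shape shown above. The function name `extract_answers` is made up for this example.

```python
def extract_answers(response, var="valLabel"):
    """Collect the values bound to `var` across all result rows."""
    bindings = response.get("results", {}).get("bindings", [])
    # A row may omit a variable (e.g. an unmatched OPTIONAL), so skip those.
    return [row[var]["value"] for row in bindings if var in row]


RESPONSE = {
    "head": {"vars": ["valLabel", "type"]},
    "results": {
        "bindings": [
            {
                "valLabel": {
                    "xml:lang": "en",
                    "type": "literal",
                    "value": "Michelle Obama",
                }
            }
        ]
    },
}

print(extract_answers(RESPONSE))  # ['Michelle Obama']
```

Returning a list rather than a single string covers questions with several answers, such as the countries with a population over 1000000000.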
What else will you add?
Some ideas I have for extending this further:
- Add other data sources (e.g. DBpedia)
- Add spell checking in preprocessing
This is cool! How can I help?
The code is relatively short and simple (~1000 lines with comments) and it should be easy to dive in and make your own pull request!