Natural Language Understanding by Matching Parse Trees
Natural language understanding is defined as “machine reading comprehension”, i.e., a natural language understanding program can read an English sentence and understand the meaning of it. I have found that a shallow level of understanding can be achieved by matching the parse trees of sentences with only a few rules.
For example, suppose we wish to transform the following sentences into the corresponding programmatic commands:
"Call me an Uber" -> me.call({'item': 'uber'}) "Get my mother some flowers" -> me.mother.get({'item': 'flowers'}) "Order me a pizza with extra cheese" -> me.order({'item': 'pizza', 'with': 'extra cheese'}) "Give Sam's dog a biscuit from Petshop" -> sam.dog.give({'item': 'biscuit', 'from': 'Petshop'})
This seems like a very difficult task, but let’s examine the possible ways we can do this:
- Simple and easy to implement
- No data required
- Inflexible model / hard to add more commands
- Flexible model / able to generalize
- Requires an abundance of hand labelled data
- Can use already trained model
- Easy to use
Cons:
- Changing model requires adding more data
- Intent matching is very general
- Hard to understand what is matched (blackbox)
Pros:
- Simple and easy to implement
- Easy to modify model
- More control of what is matched
- Non-adaptive, requires hand matching rules
I believe option 4 is a cheap, quick easy way to get extract meaning from sentences. Many people will argue it’s not “true” AI, but if you’re making a simple bot and not a AI that can philosophize the meaning of life with you, then this is good approach.
Lango
Lango contains a method for easily matching constituent bracketed parse trees to make extracting information from parse trees easy. A constituent bracketed parse tree is a parse tree in bracketed form that represents the syntax of a sentence.
For example, this is the parse tree for the sentence “Sam ran to his house”:
In a parse tree, the leafs are the words and the other nodes are POS (parts of speech) tags. For example, “to” is a word in the sentence and it is a leaf. It’s parent is the part of speech tag TO (which means TO) and its parent is PP (which is pre-propositional phrase). The list of tags can be found here.
Suppose we want to match the subject (Sam), the action (ran) and the action to the subject (his house).
Let’s first match the top of the parse tree using this match tree:
action: 'ran' subject: 'sam' to_object: 'his house'
from lango.parser import StanfordLibParser from lango.matcher import match_rules parser = StanfordLibParser() rules = { '( S ( NP:np ) ( VP ( VBD:action-o ) ( PP:pp ) ) )': { 'np': { '( NP:subject-o )': {} }, 'pp': { '( PP ( TO=to ) ( NP:to_object-o ) )': {}, '( PP ( IN=from ) ( NP:from_object-o ) )': {} } } } def fun(subject, action, to_object=None, from_object=None): print "%s,%s,%s,%s" % (subject, action, to_object, from_object) tree = parser.parse('Sam ran to his house.') match_rules(tree, rules, fun) # output should be: sam, ran, his house, None tree = parser.parse('Billy walked from his apartment.') match_rules(tree, rules, fun) # output should be: billy, walked, None, his apartment
Example
At first glance, there doesn’t seem to be an obvious pattern with these sentences. Let’s take a look at the parse trees for the above sentences so that we can make generalized match trees for them.
Call me an Uber:
(S (VP (VB Call) (S (NP (PRP me)) (NP (DT an) (NNP Uber)))) )
Get my mother some flowers (incorrect parse):
(S (VP (VB Get) (NP (PRP$ my) (NN mother)) (NP (DT some) (NNS flowers))) )
Order me a pizza with extra cheese:
(S (VP (VB Order) (S (NP (PRP me)) (NP (NP (DT a) (NN pizza)) (PP (IN with) (NP (JJ extra) (NN cheese)))))) )
Give Sam’s dog a biscuit from Petshop:
(S (VP (VB Give) (S (NP (NP (NNP Sam) (POS 's)) (NN dog)) (NP (NP (DT a) (NN biscuit)) (PP (IN from) (NP (NNP Petshop)))))) )
Engineering Match Trees
- ( S ( VP ( VB ) ( S ( NP ) ( NP ) ) ) or
- ( S ( VP ( VB ) ( NP ) ( NP ) )
- ( S ( VP ( VB:action-o ) ( S ( NP:subj_t ) ( NP:obj_t ) ) ) )
- ( S ( VP ( VB:action-o ) ( NP:subj_t ) ( NP:obj_t ) ) )
Let’s look at the subtrees for the subtree NP:subj_t for the sentences above:
- ( NP ( PRP$ me ) )
- ( NP ( PRP$ my ) (NN mother ) )
- ( NP ( PRP$ me ) )
- ( NP ( NP ( NNP Sam ) ( POS ‘s ) ) ( NN dog ) ) )
We can use the following rules to match context for subtree NP:subj_t to match the subject noun and the relation to the subject if it exists.
- ( NP ( PRP$:subject-o=my ) ( NN:relation-o ) )
- ( NP ( NP ( NNP:subject-o ) ( POS ) ) ( NN:relation-o ) )
- ( NP:subject-o )
- ( NP ( DT an ) ( NNP Uber ) )
- ( NP ( DT some ) ( NNS flowers ) )
- ( NP ( NP ( DT a ) ( NN pizza ) ) ( PP ( IN with ) ( NP ( JJ extra ) ( NN cheese ) ) )
- ( NP ( NP ( DT a ) ( NN biscuit ) ) ( PP ( IN from ) ( NP ( NNP Petshop ) ) ) )
- ( NP ( NP:item-o ) ( PP ( IN:item_in-o ) ( NP:item_addon-o ) ) )
- ( NP:item-o )
Call me an Uber:
action: Call subject: me item: Uber
Get my mother some flowers (incorrect parse):
action: Get subject: my relation: mother item: flowers
Order me a pizza with extra cheese:
action: Order subject: me item: pizza item_in: with item_addon: extra cheese
Give Sam’s dog a biscuit from Petshop:
action: Give subject: Sam relation: dog item: biscuit item_in: from item_addon: Petshop
def perform_action(action, item, subject, relation=None, item_addon=None, item_in=None): entity = subject if entity == "my": entity = "me" if relation: entity = '{0}.{1}'.format(entity, relation) item_props = {'item': item} if item_in and item_addon: item_props[item_in] = item_addon return '{0}.{1}({2})'.format(entity, action, item_props)
Full Code
from collections import OrderedDict from lango.parser import StanfordLibParser from lango.matcher import match_rules parser = StanfordLibParser() sents = [ 'Call me an Uber.', 'Get my mother some flowers.', 'Order me a pizza with extra cheese.', 'Give Sam\'s dog a biscuit from Petshop.' ] """ me.call({'item': u'uber'}) my.mother.get({'item': u'flowers'}) me.order({'item': u'pizza', u'with': u'extra cheese'}) sam.dog.give({'item': u'biscuit', u'from': u'petshop'}) """ subj_obj_rules = { 'subj': OrderedDict([ # my brother / my mother ('( NP ( PRP$:subject-o=my ) ( NN:relation-o ) )', {}), # Sam's dog ('( NP ( NP ( NNP:subject-o ) ( POS ) ) ( NN:relation-o ) )', {}), # me ('( NP:subject-o )', {}), ]), 'obj': OrderedDict([ # pizza with onions ('( NP ( NP:item-o ) ( PP ( IN:item_in-o ) ( NP:item_addon-o ) ) )', {}), # pizza ('( NP:item-o )', {}), ]) } rules = { # Get me a pizza '( S ( VP ( VB:action-o ) ( S ( NP:subj ) ( NP:obj ) ) ) )': subj_obj_rules, # Get my mother flowers '( S ( VP ( VB:action-o ) ( NP:subj ) ( NP:obj ) ) )': subj_obj_rules, } def perform_action(action, item, subject, relation=None, item_addon=None, item_in=None): entity = subject if entity == "my": entity = "me" if relation: entity = '{0}.{1}'.format(entity, relation) item_props = {'item': item} if item_in and item_addon: item_props[item_in] = item_addon return '{0}.{1}({2})'.format(entity, action, item_props) for sent in sents: tree = parser.parse(sent) print match_rules(tree, rules, perform_action)
Tips and Tricks
From my experience with working with match trees, here are some tips and tricks:
- Look at some parse trees of some of the sentences/commands you will be processing and find patterns at the top level of the tree and work your way down.
- Make sure your commands/sentences are punctuated (have periods at the end) before parsing through the Stanford Parser
- Take advantage of the recursive structure of parse trees to recursively match rules.
More to come:
- Natural Language Query Engine
- Modelling Conversations
Leave a Reply