Natural Language Understanding by Matching Parse Trees

NLPJuly 8, 2016

Natural language understanding is defined as “machine reading comprehension”, i.e., a natural language understanding program can read an English sentence and understand the meaning of it. I have found that a shallow level of understanding can be achieved by matching the parse trees of sentences with only a few rules.

For example, suppose we wish to transform the following sentences into the corresponding programmatic commands:

"Call me an Uber" -> me.call({'item': 'uber'})
"Get my mother some flowers" -> me.mother.get({'item': 'flowers'})
"Order me a pizza with extra cheese" -> me.order({'item': 'pizza', 'with': 'extra cheese'})
"Give Sam's dog a biscuit from Petshop" -> sam.dog.give({'item': 'biscuit', 'from': 'Petshop'})

This seems like a very difficult task, but let’s examine the possible ways we can do this:

1) Use some combination of regexes and conditional statements to match a sentence.

Pros:

Simple and easy to implement
No data required

Cons:

Inflexible model / hard to add more commands

2) Gather hand labelled data of similar sentences and use a machine learning model to predict the intent of the command

Pros:

Flexible model / able to generalize

Cons:

Requires an abundance of hand labelled data

3) Use intent prediction

Pros:

Can use already trained model
Easy to use

Cons:

Changing model requires adding more data
Intent matching is very general
Hard to understand what is matched (blackbox)

4) Use parse trees to perform rule/pattern based matching

Pros:

Simple and easy to implement
Easy to modify model
More control of what is matched

Cons:

Non-adaptive, requires hand matching rules

I believe option 4 is a cheap, quick easy way to get extract meaning from sentences. Many people will argue it’s not “true” AI, but if you’re making a simple bot and not a AI that can philosophize the meaning of life with you, then this is good approach.

Lango

Lango is a natural language library I have created for providing tools for natural language processing.

Lango contains a method for easily matching constituent bracketed parse trees to make extracting information from parse trees easy. A constituent bracketed parse tree is a parse tree in bracketed form that represents the syntax of a sentence.

For example, this is the parse tree for the sentence “Sam ran to his house”:

In a parse tree, the leafs are the words and the other nodes are POS (parts of speech) tags. For example, “to” is a word in the sentence and it is a leaf. It’s parent is the part of speech tag TO (which means TO) and its parent is PP (which is pre-propositional phrase). The list of tags can be found here.

Suppose we want to match the subject (Sam), the action (ran) and the action to the subject (his house).

Let’s first match the top of the parse tree using this match tree:

From the match tree, we get the corresponding matches:

(NP sam) as (NP:subject)

(VBD ran) as (VBD:action)

(PP (TO to) (NP his house)) as (PP:pp)

Our PP subtree looks like:

Now let’s match the PP subtree with this match tree:

From the match tree, we get:

(NP his house) as (NP:to_object)

So the full context match from the two match trees base on this sentence is:

  action: 'ran'
  subject: 'sam'
  to_object: 'his house'

Code to do the matching as described above:

We use the token “NP:to_object-o” to match the tag NP, label it as ‘to_object’ and “-o” means get the string of the tree instead of the tree object.

More explanation of the rule matching syntax/structure can be found on the Github page.

from lango.parser import StanfordLibParser
from lango.matcher import match_rules

parser = StanfordLibParser()

rules = {
 '( S ( NP:np ) ( VP ( VBD:action-o ) ( PP:pp ) ) )': {
 'np': {
 '( NP:subject-o )': {}
 },
 'pp': {
 '( PP ( TO=to ) ( NP:to_object-o ) )': {},
 '( PP ( IN=from ) ( NP:from_object-o ) )': {}
 }
 }
}

def fun(subject, action, to_object=None, from_object=None):
 print "%s,%s,%s,%s" % (subject, action, to_object, from_object)

tree = parser.parse('Sam ran to his house.')
match_rules(tree, rules, fun)
# output should be: sam, ran, his house, None

tree = parser.parse('Billy walked from his apartment.')
match_rules(tree, rules, fun)
# output should be: billy, walked, None, his apartment

Example

Now, let’s use Lango to do the example in the beginning which is to to transform the following sentences into the corresponding programmatic commands:

Call me an Uber -> me.call({‘item’: ‘uber’})

Get my mother some flowers -> me.mother.get({‘item’: ‘flowers’})

Order me a pizza with extra cheese -> me.order({‘item’: ‘pizza’, ‘with’: ‘extra cheese’})

Give Sam’s dog a biscuit from Petshop -> sam.dog.give({‘item’: ‘biscuit’, ‘from’: ‘Petshop’})

At first glance, there doesn’t seem to be an obvious pattern with these sentences. Let’s take a look at the parse trees for the above sentences so that we can make generalized match trees for them.

Call me an Uber:

  (S
    (VP (VB Call)
      (S
        (NP (PRP me))
        (NP (DT an) (NNP Uber))))
    )

Get my mother some flowers (incorrect parse):

  (S
    (VP (VB Get)
      (NP (PRP$ my) (NN mother))
      (NP (DT some) (NNS flowers)))
    )

Order me a pizza with extra cheese:

  (S
    (VP (VB Order)
      (S
        (NP (PRP me))
        (NP
          (NP (DT a) (NN pizza))
          (PP (IN with)
            (NP (JJ extra) (NN cheese))))))
    )

Give Sam’s dog a biscuit from Petshop:

  (S
    (VP (VB Give)
      (S
        (NP
          (NP (NNP Sam) (POS 's))
          (NN dog))
        (NP
          (NP (DT a) (NN biscuit))
          (PP (IN from)
            (NP (NNP Petshop))))))
    )

Engineering Match Trees

From the parse trees, we can see that although the sentences look very different, the syntax is very similar. We can take advantage of this by matching the patterns of the parse trees. If we look at the top parts of all these parse trees, we can see they follow the structure of either:

( S ( VP ( VB ) ( S ( NP ) ( NP ) ) ) or
( S ( VP ( VB ) ( NP ) ( NP ) )

We can go a step further and also add context to these match trees to match the action verb, the subject subtree and the object subtree:

( S ( VP ( VB:action-o ) ( S ( NP:subj_t ) ( NP:obj_t ) ) ) )
( S ( VP ( VB:action-o ) ( NP:subj_t ) ( NP:obj_t ) ) )

Let’s look at the subtrees for the subtree NP:subj_t for the sentences above:

( NP ( PRP$ me ) )
( NP ( PRP$ my ) (NN mother ) )
( NP ( PRP$ me ) )
( NP ( NP ( NNP Sam ) ( POS ‘s ) ) ( NN dog ) ) )

We can use the following rules to match context for subtree NP:subj_t to match the subject noun and the relation to the subject if it exists.

( NP ( PRP$:subject-o=my ) ( NN:relation-o ) )
( NP ( NP ( NNP:subject-o ) ( POS ) ) ( NN:relation-o ) )
( NP:subject-o )

Next, let’s look at the subtrees for NP:obj_t for the sentences above:

( NP ( DT an ) ( NNP Uber ) )
( NP ( DT some ) ( NNS flowers ) )
( NP ( NP ( DT a ) ( NN pizza ) ) ( PP ( IN with ) ( NP ( JJ extra ) ( NN cheese ) ) )
( NP ( NP ( DT a ) ( NN biscuit ) ) ( PP ( IN from ) ( NP ( NNP Petshop ) ) ) )

We can use the following rules to match context for subtree NP:obj_t to match the item and any modifiers/addons for the item.

( NP ( NP:item-o ) ( PP ( IN:item_in-o ) ( NP:item_addon-o ) ) )
( NP:item-o )

Finally, after matching all the trees and subtrees, our matched contexts for each sentence is:

Call me an Uber:

action: Call
subject: me
item: Uber

Get my mother some flowers (incorrect parse):

action: Get
subject: my
relation: mother
item: flowers

Order me a pizza with extra cheese:

action: Order
subject: me
item: pizza
item_in: with
item_addon: extra cheese

Give Sam’s dog a biscuit from Petshop:

action: Give
subject: Sam
relation: dog
item: biscuit
item_in: from
item_addon: Petshop

Once we have the context, we can easily transform them into programmatic language:

 def perform_action(action, item, subject, relation=None,
 item_addon=None, item_in=None):

 entity = subject
 if entity == "my":
 entity = "me"
 if relation:
 entity = '{0}.{1}'.format(entity, relation)

 item_props = {'item': item}
 if item_in and item_addon:
 item_props[item_in] = item_addon

 return '{0}.{1}({2})'.format(entity, action, item_props)

Full Code

Below is the full working code for the example. As you can see the it is very short and can handle any sentence which matches the syntax structure which may be hundreds of sentences. It is also easy to understand what is being matched compared to the black box of intent based guessing.

from collections import OrderedDict
from lango.parser import StanfordLibParser
from lango.matcher import match_rules

parser = StanfordLibParser()

sents = [
 'Call me an Uber.',
 'Get my mother some flowers.',
 'Order me a pizza with extra cheese.',
 'Give Sam\'s dog a biscuit from Petshop.'
]

"""
me.call({'item': u'uber'})
my.mother.get({'item': u'flowers'})
me.order({'item': u'pizza', u'with': u'extra cheese'})
sam.dog.give({'item': u'biscuit', u'from': u'petshop'})
"""

subj_obj_rules = {
 'subj': OrderedDict([
 # my brother / my mother
 ('( NP ( PRP$:subject-o=my ) ( NN:relation-o ) )', {}),
 # Sam's dog
 ('( NP ( NP ( NNP:subject-o ) ( POS ) ) ( NN:relation-o ) )', {}),
 # me
 ('( NP:subject-o )', {}),
 ]),
 'obj': OrderedDict([
 # pizza with onions
 ('( NP ( NP:item-o ) ( PP ( IN:item_in-o ) ( NP:item_addon-o ) ) )', {}),
 # pizza
 ('( NP:item-o )', {}),
 ])
}

rules = {
 # Get me a pizza
 '( S ( VP ( VB:action-o ) ( S ( NP:subj ) ( NP:obj ) ) ) )': subj_obj_rules,
 # Get my mother flowers
 '( S ( VP ( VB:action-o ) ( NP:subj ) ( NP:obj ) ) )': subj_obj_rules,
}

def perform_action(action, item, subject, relation=None,
 item_addon=None, item_in=None):

 entity = subject
 if entity == "my":
 entity = "me"
 if relation:
 entity = '{0}.{1}'.format(entity, relation)

 item_props = {'item': item}
 if item_in and item_addon:
 item_props[item_in] = item_addon

 return '{0}.{1}({2})'.format(entity, action, item_props)

for sent in sents:
 tree = parser.parse(sent)
 print match_rules(tree, rules, perform_action)

Tips and Tricks

From my experience with working with match trees, here are some tips and tricks:

Look at some parse trees of some of the sentences/commands you will be processing and find patterns at the top level of the tree and work your way down.
Make sure your commands/sentences are punctuated (have periods at the end) before parsing through the Stanford Parser
Take advantage of the recursive structure of parse trees to recursively match rules.

More to come:

Natural Language Query Engine
Modelling Conversations

Natural Language Understanding by Matching Parse Trees

Lango

Example

Engineering Match Trees

Full Code

Tips and Tricks

More to come:

2 Comments

Leave a Reply Cancel reply