
Natural Language Understanding by Matching Parse Trees

NLP | July 8, 2016

Natural language understanding is often defined as “machine reading comprehension”: a natural language understanding program can read an English sentence and understand its meaning. I have found that a shallow level of understanding can be achieved by matching the parse trees of sentences against only a few rules.

For example, suppose we wish to transform the following sentences into the corresponding programmatic commands:

"Call me an Uber" -> me.call({'item': 'uber'})
"Get my mother some flowers" -> me.mother.get({'item': 'flowers'})
"Order me a pizza with extra cheese" -> me.order({'item': 'pizza', 'with': 'extra cheese'})
"Give Sam's dog a biscuit from Petshop" -> sam.dog.give({'item': 'biscuit', 'from': 'Petshop'})

This seems like a very difficult task, but let’s examine the possible ways we can do this:

1) Use some combination of regexes and conditional statements to match a sentence.

Pros:

Simple and easy to implement
No data required

Cons:

Inflexible model / hard to add more commands

2) Gather hand-labelled data of similar sentences and use a machine learning model to predict the intent of the command.

Pros:

Flexible model / able to generalize

Cons:

Requires an abundance of hand-labelled data

3) Use an already trained intent prediction model.

Pros:

Can use already trained model
Easy to use

Cons:

Changing model requires adding more data
Intent matching is very general
Hard to understand what is matched (blackbox)

4) Use parse trees to perform rule/pattern based matching

Pros:

Simple and easy to implement
Easy to modify model
More control of what is matched

Cons:

Non-adaptive, requires hand matching rules
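To make the inflexibility of option 1 concrete, here is a minimal sketch of a regex-based matcher. The pattern and the function name `parse_command` are hypothetical and written only for illustration. The sketch covers one sentence shape and fails as soon as the wording changes.

```python
import re

# Hypothetical pattern for a single command shape:
#   "<verb> me a/an/some <item> [with <extras>]"
COMMAND_RE = re.compile(
    r"^(?P<verb>call|order|get) me (?:an?|some) (?P<item>\w+)"
    r"(?: with (?P<extras>.+))?$",
    re.IGNORECASE,
)

def parse_command(sentence):
    """Return (verb, args) if the sentence fits the one pattern, else None."""
    m = COMMAND_RE.match(sentence.strip().rstrip("."))
    if m is None:
        return None
    args = {"item": m.group("item").lower()}
    if m.group("extras"):
        args["with"] = m.group("extras")
    return m.group("verb").lower(), args

print(parse_command("Call me an Uber"))
print(parse_command("Order me a pizza with extra cheese"))
# A different recipient ("my mother") no longer fits the pattern.
print(parse_command("Get my mother some flowers"))
```

Every new sentence shape needs another pattern or another conditional branch, which is why this approach gets hard to extend.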

I believe option 4 is a cheap, quick and easy way to extract meaning from sentences. Many people will argue it’s not “true” AI, but if you’re making a simple bot and not an AI that can philosophize about the meaning of life with you, then this is a good approach.

Lango is a library I have created that provides tools for natural language processing.

Lango contains a method for matching constituent bracketed parse trees, which makes it easy to extract information from them. A constituent bracketed parse tree is a parse tree in bracketed form that represents the syntax of a sentence.

For example, this is the parse tree for the sentence “Sam ran to his house”:

In a parse tree, the leaves are the words and the other nodes are POS (part of speech) tags. For example, “to” is a word in the sentence and it is a leaf. Its parent is the part of speech tag TO (which means “to”) and the parent of TO is PP (a prepositional phrase). The list of tags can be found here.
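As a minimal sketch of what “bracketed form” means, the snippet below reads a bracketed string into nested `(tag, children)` tuples and lists its leaves. The bracketed string for “Sam ran to his house” is an assumed parse (the exact tags depend on the parser used), and the helpers `parse_tree` and `leaves` are hypothetical names, not part of Lango.

```python
import re

def parse_tree(text):
    """Parse a bracketed tree into (tag, children); leaf words are plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        tag = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (tag, children)

    return node()

def leaves(tree):
    """Collect the words (leaves) of a tree from left to right."""
    out = []
    for child in tree[1]:
        if isinstance(child, str):
            out.append(child)
        else:
            out.extend(leaves(child))
    return out

# Assumed parse of the example sentence; a real parser may tag it differently.
SENTENCE = "(S (NP (NNP Sam)) (VP (VBD ran) (PP (TO to) (NP (PRP$ his) (NN house)))))"
tree = parse_tree(SENTENCE)
print(tree[0])        # root tag
print(leaves(tree))   # the words of the sentence
```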

Suppose we want to match the subject (Sam), the action (ran) and the object the action is directed to (his house).

Let’s first match the top of the parse tree using this match tree:

From the match tree, we get the corresponding matches:

(NP sam) as (NP:subject)

(VBD ran) as (VBD:action)

(PP (TO to) (NP his house)) as (PP:pp)

Our PP subtree looks like:

Now let’s match the PP subtree with this match tree:

From the match tree, we get:

(NP his house) as (NP:to_object)

So the full context match from the two match trees for this sentence is:

  action: 'ran'
  subject: 'sam'
  to_object: 'his house'

Code to do the matching as described above:

The token “NP:to_object-o” matches the tag NP and labels the match as ‘to_object’; the “-o” suffix means the match returns the string of the subtree instead of the tree object.
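The two matching steps can be sketched in a few lines of self-contained Python. This is not Lango's implementation or API: the helpers `parse`, `words`, `match` and `match_tree` are hypothetical, the bracketed sentence is an assumed parse, and the pattern strings are modelled on the “TAG:label-o” token described above. Lowercasing the matched string is also an assumption, made only to mirror the output shown earlier.

```python
import re

def parse(text):
    """Parse a bracketed string into (tag, children); leaf words are plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        tag = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (tag, children)

    return node()

def words(tree):
    """Join the leaf words of a subtree into one lowercase string (assumed behaviour)."""
    if isinstance(tree, str):
        return tree.lower()
    return " ".join(words(child) for child in tree[1])

def match(pattern, tree, context):
    """Match one pattern node against one tree node, storing labelled matches in context."""
    if isinstance(tree, str):
        return False                  # a pattern node cannot match a bare word
    token, sub_patterns = pattern
    as_string = token.endswith("-o")  # "-o": keep the string, not the tree object
    if as_string:
        token = token[:-2]
    tag, _, label = token.partition(":")
    if tag != tree[0]:
        return False
    if sub_patterns:                  # children must line up one to one
        if len(sub_patterns) != len(tree[1]):
            return False
        if not all(match(p, t, context) for p, t in zip(sub_patterns, tree[1])):
            return False
    if label:
        context[label] = words(tree) if as_string else tree
    return True

def match_tree(pattern_text, tree):
    """Return the labelled matches for a pattern string, or None if it does not match."""
    context = {}
    return context if match(parse(pattern_text), tree, context) else None

# Assumed parse of "Sam ran to his house".
tree = parse("(S (NP (NNP Sam)) (VP (VBD ran) (PP (TO to) (NP (PRP$ his) (NN house)))))")

# Step 1: match the top of the tree and keep the PP subtree as a tree object.
result = match_tree("(S (NP:subject-o) (VP (VBD:action-o) (PP:pp)))", tree)
# Step 2: match the PP subtree to pull out the object of "to".
result.update(match_tree("(PP (TO) (NP:to_object-o))", result.pop("pp")))
print(result)
```

A pattern node without children accepts any subtree with the same tag, which is what lets “NP:subject-o” capture the whole noun phrase without spelling out its internals.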

More explanation of the rule matching syntax and structure can be found on the GitHub page.
