A Simple Artificial Intelligence Capable of Basic Reading Comprehension

Categories Machine Learning, Uncategorized
I was interested in an artificial intelligence that could do reading comprehension, but surprisingly,  I could not find much on the topic. I decided to try and create an AI that could do some basic reading comprehension and I got some good results:

Short Passage (Input):
Mary went sledding with her family. Mary won a race against her brother to the bottom of a hill. Mary cheered because it was her first time winning.

Input: Where did Mary go?
Output: Sledding

Input: What did Mary win?
Output: A race


Input: Who did Mary win a race against?
Output: Her brother

Input: Where did Mary win a race to?
Output: The bottom of the hill


Input: Why did Mary cheer?
Output: Because it was her first time winning






As we can see, the program can actually answer the questions about the passage.

Full Source Available Here

Introduction

What I am trying to accomplish is program capable of artificial semantic memory. Semantic memory refers to how we store our explicit knowledge and facts about the world. For example, our memory of our birth date or our knowledge that humans are mammals. I wanted to be able to make something that was able to read a passage and answer any questions I had.

Abstract Idea

An abstract idea of how I accomplished artificial semantic memory was to create a structure that can store a sentence in a different way that can be used to answer questions.

1. Structure the relationships betweens objects (nouns) in the sentence.

For example, in the sentence “Mary went sledding with her family”, there are three objects “Mary”, “sledding” and “her family”. Mary has a verb “go” (present tense of went) with the object “sledding”. The verb “go” is “with” the object “her parents”. 
After brainstorming different ways to represent the relationships between objects and actions, I came up with a structure similar to a trie which I will call a “word graph”. In a word graph, each word is a node and the edges are actions or propositions. 
Examples:
Mary went sledding with her family
Mary won a race against her brother to the bottom of the hill
Mary cheered because it was her first time winning

2. Answer questions using the structure.

A key observation to answering questions is that they can be reworded to be fill in the blanks. 
Examples:
Where did Mary go -> Mary went _______
What did Mary win -> Mary won _______
Who did Mary win a race against? -> Mary won a race against _______
Why did Mary cheer -> Mary cheered because/since _______
We can use this observation to read out answers from our tree structure. We can parse the question, convert it to a fill in the blank format and then 
Example:
Mary went _____
By following the tree, we see that we should put “sledding” in the blank.
Mary won _______
Mary won a race against ______
Mary won a race to ______
By following the tree, we see that Mary won “a race”, against “her brother”, to “the bottom”.

Implementation

I chose to implement this in Python since it is easy to use and has libraries to support natural language processing. There are three steps in my program: parsing, describing and answering. 
Parsing converting a sentence to a structure that makes sense of the sentence structure.
Describing is reading in a sentence and adding the information to our tree structure.
Answering is reading in a question, changing the format and completing from our tree structure.

Parsing

The first thing we have to do is parse the sentence to see the sentence structure and to determine which parts of a sentence are objects, verbs and propositions. To do this, I used the Stanford parser which works well enough for most cases. 
Example: the sentence “Mary went sledding with her family” becomes:
  (S
    (NP (NNP Mary))
    (VP
      (VBD went)
      (NP (NN sledding))
      (PP (IN with) (NP (PRP$ her) (NN family)))))
The top level tree S (declarative clause) has two children, NP (noun phrase) and VP (verb phrase). The NP consist of one child NNP (proper noun singular) which is “Mary”. The VP has three children: VBD (verb past tense) which is “went”, NP, and a PP (propositional phrase). We can use the recursive structure of a parse tree to help us build our word graph.

A full reference for the parsers tags can be found here.

I put the Stanford parser files in my working directory but you might want to change the location to where you put the files.

os.environ['STANFORD_PARSER'] = '.'
os.environ['STANFORD_MODELS'] = '.'

parser = stanford.StanfordParser()

line = 'Mary went sledding with her family'
tree = list(parser.raw_parse(line))[0]

Describing

We can use the parse tree to build the word graph by doing it recursively. For each grammar rule, we need to describe how to build the word graph.

Our method looks like this:

# Returns edge, node 
def describe(parse_tree):

 ...

  if matches(parse_tree,'( S ( NP ) ( VP ) )'):

    np = parse_tree[0] # subject
    vp = parse_tree[1] # action

    _, subject = describe(np) # describe noun
    action, action_node = describe(vp) # recursively describe action

    subject.set(action, action_node) # create new edge labeled action to the action_node
    return action, action_node

  ....
We do this for each grammar rule to recursively build the word graph. When we see a NP (noun phrase) we treat it as an object and extract the words from it. When we see a proposition or verb, we attach it to the current node and when we see another object, we use a dot ( . ) edge to indicate the object of the current node.

Currently, my program supports the following rules:

( S ( NP ) ( VP ) )
( S ( VP ) )
( NP )
( PP ( . ) ( NP ) )
( PRT )
( VP ( VBD ) ( VP ) $ )
( VP ( VB/VBD ) $ )
( VP ( VB/VBZ/VBP/VPZ/VBD/VBG/VBN ) ( PP ) )
( VP ( VB/VBZ/VBP/VPZ/VBD/VBG/VBN ) ( PRT ) ( NP ) )
( VP ( VB/VBZ/VBP/VPZ/VBD/VBG/VBN ) ( NP ) )
( VP ( VB/VBZ/VBP/VPZ/VBD/VBG/VBN ) ( NP ) ( PP ) )
( VP ( VB/VBZ/VBP/VPZ/VBD/VBG ) ( S ) )
( VP ( TO ) ( VP ) )
( VP ( VB/VBZ/VBP/VPZ/VBD/VBG/VBN ) ( ADJP ) )
( VP ( VB/VBZ/VBP/VPZ/VBD/VBG/VBN ) ( SBAR ) )
( SBAR ( IN ) ( S ) )

For verbs, I used Nodebox (a linguistic library) for getting the present tense of a word so that the program knows different tenses of a word. E.g. “go” is the same word as “went”. 

Answering

We can answer questions by converting the question to a “fill in the blank” and then following the words in the “fill in the blank” in the word graph to the answer. My program supports two types of fill in the blanks: from the end and from the beginning.

Type I: From the end

A from the end type of fill in the blank is a question like:

Where did Mary go?

Which converts to:

Mary went _______

And as you can see, the blank comes at the end of the sentence. We can fill in this blank by following each word in our structure to the answer. A sample of the code is below:

# Matches "Where did Mary go"
if matches(parse_tree, '( SBARQ ( WHADVP ) ( SQ ( VBD ) ( NP ) ( VP )  )'):

  tokens = get_tokens(parse_tree) # Get tokens from parse tree

  subject = get_node(tokens[3]) # Get subject of sentence

  tokens = tokens[3:] # Skip first two tokens to make fill in the blank

  return subject.complete(tokens) # Complete rest of tokens
The node completes by reading each token and following the corresponding edges. When we run out of tokens, we follow the first edge until we reach another object and return the edges followed and the object.

Simplified node.complete:

class Node:
  ...
  def complete(self, tokens, qtype):
    if len(tokens) == 0:
      # no tokens left
      if qtype == 'why':
        # special case
        return self.why()
      if self.isObject:
        # return object
        return self.label
      else:
        # follow first until object
        return self.first.label + self.first.complete(tokens, qtype) 
    else:
      for edge, node in self:
        if edge == tokens[0]:
          # match rest of tokens
          return node.complete(tokens, qtype) 
      return "No answer"
  ...

We have to handle “Why” as a special case because we need to complete with “because” or “since” after there are no more tokens and we have to iterate backwards to the first object.

Type 2: From the beginning

A from the beginning type is a question like:

Who went sledding?

Which converts to:

 ____ went sledding?

As we can see, the blank is at the beginning of the sentence and my solution for this was to iterate through all possible objects and see which objects have tokens that match the rest of the fill in the bank.

Further Steps

There is still a long way to go, to make an AI perform reading comprehension at a human level. Below are some possible improvements and things to handle to make the program better:

Grouped Objects

We need to be able to handle groups of objects, e.g. “Sarah and Sam walked to the beach” should be split into two individual sentences.

Pronoun Resolution

Currently, pronouns such as he and she are not supported and resolution can be added by looking at the last object. However, resolution is not possible in all cases when there are ambiguities such as “Sam kicked Paul because he was stupid”. In this sentence “he” could refer to Sam or Paul.

Synonyms

If we have the sentence: “Jack leaped over the fence”, the program will not be able to answer “What did Jack jump over” since the program interprets jump as a different word than leap. However, we can solve this problem by using asking the same question for all synonyms of the verb and seeing if any answers work.

Augmented Information

If we have the sentence “Jack threw the football to Sam”, the program would not be able to answer “Who caught the football”. We can add information such as “Sam caught the football from Jack” which we can infer from the original sentence.

Aliasing

Sometimes objects can have different names, e.g. “James’s dog is called Spot” and the program should be able to know that James’ dog and Spot both refer to the same object. We can do this by adding a special rule for words such as “called”, “named”, “also known as” , etc.

Other

There are probably other quirks of language that need to be handled and perhaps instead of explicitly handling all these cases, we should come up with a machine learning model that can read many passages and be able to construct a structure of the content as well as to augment any additional information.

Full Source Available Here

11 Comments

  • Unknown
    September 18, 2015

    Michael,

    Interesting topic indeed.

    2 remarks :

    – You didn't address the semantic. In a sentence like "Mary saw the jaguar", you can hardly know if it was a car or an animal… hence have hard time answering some questions.

    – Maybe you could have a look at Dependency Grammars.

    Best,

    Philippe (philippedotamelineatfreedotfr)

  • Lewis McLean
    September 18, 2015

    have a look at http://arxiv.org/abs/1506.03340, Phil Blunsom gave a talk recently in Edinburgh and it sound similar to what you're trying to achieve here.

  • i336_
    September 18, 2015

    You should have a look at spaCy. It's a natural language parser that was written in Python too.

    Demo (seriously awesome): http://spacy.io/displacy/

    At the very least, *please* read this: http://spacy.io/blog/dead-code-should-be-buried/

    • Shobhit Verma
      September 19, 2015

      🙂

  • Mark
    September 18, 2015

    Very cool! In my experience, the best way to learn how machine learning and AI work is to start trying it; while there's a lot of science to it, figuring out how to apply it to real-world problems is as much an art as a science (there's a lot of work in finding the right model to describe a problem space correctly). This'll be valuable down the road.

  • d0npatrick
    September 19, 2015

    By the way, there is no certainty to infer if Jack actually caught the football from "Jack threw the football to Sam". It might equally have slammed him in the back of his head. There are however similar implicit facts that can be derived or deduced from other sentences, so I encourage your thinking. You seem to have quite a clear grasp of the issues and opportunities.

    • ayoungprogrammer
      September 21, 2015

      In that case, you could say Sam caught the ball with the back of his head =P Thanks! However, the issues I need to tackle will still take a long time to solve. Luckily, this project seems to have interested many people!

  • Spencer
    September 21, 2015

    This looks a bit like the sentence decomposition you are trying to achieve: http://www.ted.com/talks/ajit_narayanan_a_word_game_to_communicate_in_any_language

    I don't really know anything about NLP, but there might be some useful ideas in what Ajit built.

    • ayoungprogrammer
      September 21, 2015

      Thanks for the link! It was an interesting video that was similar to what I did.

  • TheDude FromCI
    March 13, 2016

    Is this project still alive? It seems very interesting, and I'd like to see progress continue. According to Github, the project hasn't been updated in about 6 months.

    Many more features could be added. Such as asking more complex questions. Like "Mike dropped the ball on the ground. Mike picked up the ball in his hand. Mike dropped the ball again. Where is the ball now?" and eventually combination with machine learning to approach a conversational level.

Leave a Reply

Your email address will not be published. Required fields are marked *