Note: The Cinchapi Data Platform is powered by the Concourse Database, an open source project founded by Cinchapi CEO, Jeff Nelson.

Last week we looked at some of the challenges of implementing Natural Language Processing to use as an interface for a database.  Now we’ll continue with more language quirks that could trip up a machine, and some of the ways we can resolve them.

Conjunctions and Disjunctions Impact the Functions

Most of us are familiar with conjunctions in English.  The word “and”is a common conjunction which tends to join two related concepts.  A simple example? “I would like a hamburger and french fries”. Cholesterol worries aside, we know this means that the person is looking to get two related things at one time.  If they wanted one or the other, they would use the disjunction, “or” and say “I would like a hamburger or french fries.”

Simple, right?  Except that the English language can be kind of “loosey goosey” with some grammatical rules, and those exceptions could trip up a natural language processor.

Consider this question back at PayCo.  If they made a request to “Show me all of the employees of AtlCo who reside in Florida and Georgia”, the result might well be zero, even if AtlCo had employees in both states.  Why?  Because the the machine sees the word “and” as a conjunction, but it is being used here as a disjunction.  The machine knows that employees can reside in one state or another, but for residency purposes they can’t reside in both.  Thus, the response might be “none”, “nul” or “zero”.

While the user could be trained to ask the question with “or” instead of “and”, it is also possible to use advanced heuristics to resolve these ambiguities and let the machine learn when “and” is being used as a conjunction or a disjunction.

Compound Nouns

We should all remember that a noun refers to a person, place, or a thing.  That’s simple enough.  Compound nouns are identical in that sense, but they are created by combining multiple words.  We know a “department” in a company refers to a group with a specific role to play within the company.  We might specify which department we’re talking about with a compound noun.  For example the “robot department” would be the department focused on robots and robotics.  Simple, right?

For humans, sure, that’s simple.  But for a machine, “robot department” could mean a department staffed by robots just as much as it could mean a department for people who work with robots.  Again, the answer to this is to ensure that semantic meaning is inferred from applied heuristics, a knowledge base, and the actual data stored in databases.


Not to be confused with the literary device of the same name, an anaphor refers to the relation between a grammatical substitute and its antecedent.  Here are some examples we might find in common conversations:

Q – How was the game?

A – It was fun!

Do you see how the anaphor, “it” replaced “the game”? We can do the same with pronouns:

Q – Where did you go?

A – To see David’s new house.

Q – What did you think of it?

A – He loves it, but I think it’s a dump.

In this example, we see two anaphors.  The “it” in both cases referring to the “new house”, while “he” refers to “David”. We can also see how context can build from one question to the follow up, without the need to repeat elements in whole.

Practically we’re not looking for a computer’s opinion on a new house, but we can see where anaphors need to be used to make a NLP interface useful.  Imaging this scenario at an airport:

Q – Which plane has most recently landed?

A – Delta Flight 776

Q – Where did it originate from?

A – Los Angeles International

Again, we see the word “it” standing in for “Delta Flight 776”.

Why is it critical for any natural language interface to understand Anaphors and what they represent?  Without that understanding, users would be forced to constantly specify the object in question.

Let’s look at the above example, this time without using anaphors:

Q – Which plane has most recently landed?

A – Delta Flight 776

Q – Where did Delta Flight 776 originate from?

A – Los Angeles International

Granted, this is a fairly simple example, but if further details about Delta Flight 776 were desired, then the whole phrase “Delta Flight 776” would be required for each and every query. By employing appropriate discourse models, we can ensure that the conversational elements keep the context clear.

Elliptical Sentences

While we might like to think that we always make clear statements and queries, typically, we tend to rely on context to make ourselves understood.  Such is the case with incomplete sentences, also known as elliptical sentences.

Imaging the following conversation between two people:

Q – Who is the highest earning employee of AtlCo?

A – John Smith

Q – The lowest earning?

A – Sasha Reed

By itself “The lowest earning?” lack specificity.  It only makes sense in the context of a conversation where the specifics, in this case “earning” of an employee, are made clear only when looking at the totality of the conversation.

Just like with anaphors, we can address this issue by employing and maintaining a discourse model to keep track of the context of the previously asked questions.