Extracting Meaning from Millions of Pages: I’m becoming quite interested in finding ways for computers to make human-ish determinations about information. I talked about sentiment analysis the other day, and now here’s an article about — and a really good description of — an inference engine: a system that examines words, tries to understand them, and even makes leaps of knowledge about them.
An inference engine attempts to understand relationships by examining the words which describe them, and by finding sets of word patterns called “triples.”
[There exists] a general model for how relationships are expressed in English that holds true no matter the topic. “For example, the simple pattern ‘entity1, verb, entity2’ covers the relationship ‘Edison invented the light bulb’ as well as ‘Microsoft acquired Farecast’ and many more,” he says. “TextRunner relies on this model, which is automatically learned from text, to analyze sentences and extract triples with high accuracy.”
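To make the “entity1, verb, entity2” pattern concrete, here is a toy sketch in Python. TextRunner learns its patterns automatically from text; this hand-written regex (and its tiny fixed verb list) is purely my illustrative assumption, not the actual system.

```python
import re

# Toy triple extractor: matches sentences of the shape
# "entity1, verb, entity2" described in the article.
# The verb list and regex are illustrative assumptions only;
# TextRunner learns such patterns from text automatically.
PATTERN = re.compile(
    r"^(\w+(?: \w+)*?) (invented|acquired|are) (?:the )?(\w+(?: \w+)*)$"
)

def extract_triple(sentence):
    """Return (entity1, verb, entity2), or None if the sentence doesn't fit."""
    m = PATTERN.match(sentence.rstrip("."))
    if m is None:
        return None
    return (m.group(1), m.group(2), m.group(3))

print(extract_triple("Edison invented the light bulb"))
# → ('Edison', 'invented', 'light bulb')
print(extract_triple("Microsoft acquired Farecast"))
# → ('Microsoft', 'acquired', 'Farecast')
```

Both example sentences from the quote reduce to the same three-slot structure, which is what makes the single model work across topics.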
What gets interesting from there is that the system then tries to make leaps by putting together separate things it has learned.
TextRunner also serves as a starting point for building inferences from natural-language queries, which is what the group is now working on. To give a simple example: if TextRunner finds a Web page that says “mammals are warm blooded” and another Web page that says “dogs are mammals,” an inference engine will produce the information that dogs are probably warm blooded.
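The warm-blooded-dogs example is a simple chaining step over triples: if “X are Y” and “Y are Z,” conclude “X are probably Z.” A minimal sketch of that idea, assuming triples are stored as plain tuples (this is my simplification, not TextRunner’s actual code):

```python
# Triples as extracted from two different web pages (per the example).
triples = {
    ("mammals", "are", "warm blooded"),
    ("dogs", "are", "mammals"),
}

def infer(triples):
    """Chain 'X are Y' and 'Y are Z' into the hedged conclusion
    'X are probably Z' — a single one-step inference, as in the article."""
    inferred = set()
    for (a, v1, b) in triples:
        for (c, v2, d) in triples:
            if v1 == "are" and v2 == "are" and b == c:
                inferred.add((a, "are probably", d))
    return inferred

print(infer(triples))
# → {('dogs', 'are probably', 'warm blooded')}
```

Note the hedge built into the conclusion: the engine produces “probably warm blooded,” mirroring the uncertainty of combining facts from independent pages.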