Human communications are filled with nuance, metaphor, jargon, double entendre, context, and murky oceans of assumed knowledge. When interpreting speech or text, especially the casual chitchat of social networking, computers are often clueless.
A text analytics software company called Luminoso aims to keep computers in the human loop. “We help computers understand people better and help companies understand their customers better,” says CEO and co-founder Catherine Havasi. “Companies are trying to understand the text their customers generate and how they talk about things like flavors, senses, colors, and textures. They want to do this in a quantitative way so they can make data-driven decisions.”
When she’s not working at Luminoso, Havasi is a research scientist at the MIT Media Lab, where she helps improve ConceptNet, the open data project behind the software. The ConceptNet database currently stores some 17 million concepts in all major languages in a way that makes sense to computers.
Luminoso’s SaaS-based Analytics Platform builds on ConceptNet with analytics tools and visualizations that help build actionable insights from unstructured text data. Primarily sold to marketing departments, the software analyzes text derived from Twitter feeds, email, survey forms, tech support logs, and other inputs. Luminoso recently released a solution called Compass that analyzes text in real time.
From Search to Text Analytics
ConceptNet wasn’t originally focused on marketing but rather on building a better search engine. In the late ’90s, Havasi was working at the Media Lab with Marvin Minsky’s Society of Mind group when it set out to help computers interpret human search input.
“Internet search had just gotten started, and people tended to type in statements like ‘My cat is sick’ and would get nothing helpful in return,” says Havasi. To help bridge the gap, the group launched the Open Mind Common Sense Project (OMCS), whose mission was “to collect all the things people know but computers don’t,” says Havasi. This common sense or “world knowledge” includes relationships between physical objects or attributes, as well as human motivations.
Over the years, the search problem was largely solved, primarily by people learning how to pose questions the way a computer might. Yet, OMCS realized that teaching computers assumed knowledge had much broader applicability, especially in text analytics.
Common sense statements had never been collected in a comprehensive way, and the task seemed daunting. To speed the process, OMCS developed a technique, later known as crowdsourcing, which they referred to as “harnessing the power of bored people on the Internet,” says Havasi. “We put up a public web page with an input box and a prompt that said ‘Teach the computer.’ Each morning, we checked the statements, and decided which ones were true. Later, we developed automated ways to check the knowledge.”
As people found better things to do on the Internet, the OMCS motivated participants by integrating the inputs into games. “People would play Verbosity and teach the system without thinking about it,” says Havasi.
The researchers enhanced the resulting ConceptNet database by integrating other world knowledge. For example, they taught the computer to automatically scan Wikipedia and import common sense relationships.
ConceptNet Gets Multi-lingual and Multi-cultural
Much of the current ConceptNet research involves encapsulating how common sense is perceived and expressed in different cultures and languages. “How we think of a cat can be very different depending on your culture,” says Havasi. “The same goes for the way we describe what we want from a hotel experience or even the concepts of flavor or the expected hour for dinner. You need to add cultural and linguistic nuance.”
Even when the task is limited to a single language, computers are challenged by passionate, creative, or playful text. “We use all these metaphors and try to say things in new and interesting ways, especially online, because we want people to listen to us,” says Havasi. “Our world knowledge helps us understand this language, and it can do the same for computers.”
One company asked Luminoso to help them decipher a customer comment claiming a product smelled “musty,” relates Havasi. “They needed to understand whether this was something isolated or systematic.” A typical text analytics system would stop after searching for the word “musty” or other synonyms. However, ConceptNet and Luminoso might extend that to notice a post saying “the product smells like an old house,” she adds.
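The difference between a keyword search and a concept-aware search can be illustrated with a minimal Python sketch. It is not Luminoso’s or ConceptNet’s actual method: the `RELATED` table is a tiny, hand-written stand-in for the kind of associations a common sense database might supply, and the function names are hypothetical.

```python
# Hypothetical, hand-written associations; NOT real ConceptNet data.
RELATED = {
    "musty": {"musty", "mildew", "damp", "stale", "old house", "basement"},
}


def keyword_only(comment, concept):
    """Plain keyword search: match only the literal word."""
    return concept in comment.lower()


def mentions_concept(comment, concept, related=RELATED):
    """Concept-aware search: match the word or any associated phrase."""
    text = comment.lower()
    return any(phrase in text for phrase in related.get(concept, {concept}))


comments = [
    "The product smells musty.",
    "The product smells like an old house.",
    "Arrived on time and works great.",
]

keyword_hits = [c for c in comments if keyword_only(c, "musty")]
concept_hits = [c for c in comments if mentions_concept(c, "musty")]

print(len(keyword_hits), "keyword hit(s)")
print(len(concept_hits), "concept hit(s)")
```

In this toy example the keyword search finds only the first comment, while the concept-aware search also picks up the “old house” comment. A real system would draw its associations from a large knowledge base rather than a hard-coded table.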
Launching Luminoso
The idea for Luminoso emerged when Havasi was directing the MIT Media Lab’s Digital Intuition group, working with member companies to help apply ConceptNet to text analytics. She quickly realized ConceptNet alone could not serve typical business needs.
“ConceptNet showed promising results, but companies wanted to use it in a more data-driven way,” she says. These insights resulted in the launch of Luminoso in 2010.
Typical text analytics solutions fail to meet the needs of marketing departments, says Havasi. “Statistical software that does things like look for how often words appear together requires a lot of data, and hand-coded ontology, or ruleset, systems lack adaptability,” she says. “As the world changes along with the way people talk, the ontology can’t keep up, especially since it usually requires manual updates. You’re taking the person out of one part of the process and putting him into another part, which really isn’t solving the problem.”
By contrast, Luminoso’s Analytics Platform integrates self-learning algorithms that reduce the need for human updates. The software analyzes text ranging from survey open-ends to social media logs and builds actionable insights that can be applied across the product lifecycle.
“We can look at things that are hard to pin down, like intent to purchase or openness to new technology, or what improvements might help people advocate for a product,” says Havasi. “Our software can help figure out whether or not a brand can authentically build a certain kind of product or answer questions like what kind of SKUs drive somebody into a retail store.”
The Analytics Platform examines how a word is used, and then makes relations or analogies with the way other words are used. The software also finds contextual clues in customer metadata, geographical region, or the time a tweet was sent. The software can identify the number of stars a customer clicked on for a hotel review, and stir that into the story, or look at Twitter feeds to guess whether someone is a Republican or Democrat, says Havasi.
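One common way to relate words by how they are used is to compare the words that appear around them. The sketch below illustrates that general idea only; it is an assumption for illustration, not a description of the Analytics Platform’s algorithms, and the sentences, window size, and function names are invented.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Count, for each word, the words appearing within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


sentences = [
    "the room smelled musty and damp",
    "the room smelled stale and damp",
    "the staff were friendly and helpful",
]
vecs = context_vectors(sentences)

sim_stale = cosine(vecs["musty"], vecs["stale"])
sim_friendly = cosine(vecs["musty"], vecs["friendly"])
print(sim_stale > sim_friendly)
```

Here “musty” and “stale” score as more similar than “musty” and “friendly” because they share surrounding words. With only three sentences this is a toy; approaches of this kind need far more text to be meaningful.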
Marketing departments have invested heavily in analyzing social media, yet they often miss out on other text sources, says Havasi. “There’s incredibly rich information about your product online that is not in social media,” she says. “For example, there are forums devoted to shaving or cars, or how to optimize airline travel. One of the Analytics Platform’s strengths is that it doesn’t matter where the text comes from.”
Compass: Real-time Text Analysis
In early 2015, Luminoso released Compass, which analyzes streams of text data in real time. “Compass lets you understand trends as they evolve, giving you early warning,” says Havasi.
In 2014, Sony used a Compass prototype when sponsoring the World Cup. Sony’s digital agency Isobar built a second screen experience called One Stadium Live that let fans watch World Cup updates on a tablet, and then comment via social media. It was the largest social media event in history.
The “big text” generated by One Stadium Live was a challenge for Luminoso, less because of its huge volume than because of its variability, says Havasi. “If someone scored a goal, everybody tweeted about it, which could overwhelm the experience. We needed to find who was contributing uniquely and interestingly, and determine which topics people cared about.”
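A simple way to picture the “contributing uniquely” problem is to drop messages that largely repeat something already seen. The sketch below uses Jaccard similarity over word sets. This is a swapped-in, textbook technique chosen for illustration, not how Compass works, and the sample messages and the 0.5 threshold are assumptions.

```python
def token_set(text):
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,!?").lower() for w in text.split()} - {""}


def jaccard(a, b):
    """Share of words two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def unique_messages(messages, threshold=0.5):
    """Keep a message only if it is not too similar to any message already kept."""
    kept, kept_sets = [], []
    for msg in messages:
        tokens = token_set(msg)
        if all(jaccard(tokens, seen) < threshold for seen in kept_sets):
            kept.append(msg)
            kept_sets.append(tokens)
    return kept


stream = [
    "GOAL! What a goal!",
    "What a goal!!!",
    "What a goal by Uruguay!",
    "That keeper has been brilliant all night",
]

for msg in unique_messages(stream):
    print(msg)
```

On this invented stream, the second and third messages are filtered out as near-repeats of the first, leaving the first goal message and the comment about the keeper. Comparing every message against everything kept so far would not scale to a World Cup audience; it only shows the shape of the problem.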
Compass also needed to respond dynamically to new developments. When Luis Suárez of Uruguay bit Giorgio Chiellini of Italy, the topic of biting “suddenly became relevant to soccer, which was something we couldn’t anticipate,” says Havasi. “Compass had to react quickly.” For the most part, Compass was able to do that without human intervention, thereby greatly reducing reaction time.
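To give a feel for how a suddenly relevant topic might be spotted in a stream, the following sketch flags words that are far more frequent in recent messages than in an earlier baseline. It is a deliberately crude illustration under stated assumptions, not Compass’s method: the messages are invented, and the `min_count` and `ratio` thresholds and the smoothing are arbitrary choices.

```python
from collections import Counter


def tokenize(text):
    """Lowercased words with surrounding punctuation stripped."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return [w for w in words if w]


def emerging_terms(baseline_msgs, recent_msgs, min_count=2, ratio=3.0):
    """Terms whose share of recent text is at least `ratio` times their baseline share."""
    base = Counter(w for m in baseline_msgs for w in tokenize(m))
    recent = Counter(w for m in recent_msgs for w in tokenize(m))
    base_total = sum(base.values()) or 1
    recent_total = sum(recent.values()) or 1
    flagged = []
    for term, count in recent.items():
        if count < min_count:
            continue  # ignore one-off words
        # Crude additive smoothing so unseen baseline terms do not divide by zero.
        base_rate = (base[term] + 1) / (base_total + 1)
        recent_rate = count / recent_total
        if recent_rate / base_rate >= ratio:
            flagged.append(term)
    return sorted(flagged)


baseline = [
    "great goal by uruguay",
    "italy defends well",
    "what a goal",
    "uruguay and italy are level",
]
recent = [
    "did he just bite him",
    "a bite in the world cup",
    "bite bite bite",
    "italy wants a red card for the bite",
]

print(emerging_terms(baseline, recent))
```

With these invented messages, only “bite” is flagged: it never appears in the baseline and dominates the recent window, while common words such as “a” and “the” stay below the threshold.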
Plugging In, Reaching Out
Moving forward, Luminoso will continue to enhance its algorithms to help computers get the gist of ambiguous human communications. Yet, much of the focus is now on customer integration, including back-end IT plumbing and graphical visualizations.
Luminoso recently released APIs for the Analytics Platform that let users classify and tag text data without using the software’s GUI. In this way, customers can integrate the analytics into their existing software.
The software is currently a cloud-only platform, but Luminoso will soon release an on-premises version that will “go behind firewalls or handle highly secure data,” says Havasi. “This should open up new markets like financial services, pharmaceuticals, and law enforcement.”
A growing focus for Luminoso is to effectively communicate the sometimes subtle insights from the Analytics Platform or Compass. Like Luminoso’s core analytics, this process involves bridging the communications gap between computers and people.
“Visualizing the conclusions is almost as hard as coming up with the conclusions themselves,” says Havasi. “We’re always searching for different ways of visualizing our data, for example using different types of word clouds and heat maps, so that people can use it to make actionable decisions. We need to convey the conclusions to people all over the company, not only to analysts.”