To extract actionable insights from open-text customer feedback, the verbatim comments must be converted into statistical information. This is necessary because text has no “fixed” variables: the only quasi-statistical information available is a nearly unlimited number of keywords and phrases, and that is a poor starting point for data analysis.
What we need is some kind of statistical “benchmark”: a finite and known set of variables. Such a set makes it possible to analyse trends and to detect customer experience issues.
In open text all we have to work with is:
- the whole comment,
- sentence(s) in the comment,
- the base form of each word (morphology),
- the part of speech of each keyword or phrase (noun, verb, etc.),
- keywords and phrases in a sentence, and
- the syntactic and semantic rules of each language.
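To make the raw material above a little more concrete, here is a minimal sketch of the first layers: splitting a comment into sentences, then into words, then reducing words to a base form. Everything in it is an assumption for illustration: the `LEMMAS` table is a hypothetical stand-in for a real morphological analyser, the sentence splitter is deliberately naive, and part-of-speech tagging and syntax are left out entirely.

```python
import re

# Hypothetical, tiny lemma table; a real system would use a
# morphological analyser for each supported language.
LEMMAS = {"was": "be", "were": "be", "fees": "fee", "crashed": "crash"}

def split_sentences(comment):
    """Naive split on '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", comment.strip()) if s]

def tokenize(sentence):
    """Lower-case the sentence and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", sentence.lower())

def lemmatize(tokens):
    """Map each token to its base form when the table knows it."""
    return [LEMMAS.get(token, token) for token in tokens]

comment = "The fees were too high. The app crashed twice!"
analysis = [lemmatize(tokenize(s)) for s in split_sentences(comment)]
```

Each inner list in `analysis` holds the base forms for one sentence of the comment.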
WORD CLOUDS AREN’T INFORMATIVE
From that information we need to derive variables that we can track over time. These variables should be structured so that decisions can be made based on them. We call these fixed variables CATEGORIES, and they are created by capturing and pooling relevant keywords and phrases (using the steps in the graph).
CAPTURING RELEVANT KEYWORDS IS DIFFICULT
Capturing all keywords is rather simple. We have all seen the word clouds: they look good but provide very little information. What is much harder is capturing the RELEVANT keywords, meaning those that are important in the context (industry and company). For that you need all of the Natural Language Processing components in the chart below.
POOLING KEYWORDS TO CATEGORIES IS HARD WORK
Pooling the keywords into CATEGORIES is hard work and takes years of real data and real customers. But once a categorization has been built for an industry, the analysis is very close to being universal (within that industry).
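As a rough illustration of what pooling means in practice, the sketch below maps keywords to categories and counts how many comments touch each category. The category names, the keywords in `CATEGORY_KEYWORDS` and the sample comments are all hypothetical; a real industry categorization is far larger and is built from real feedback, not written by hand in a few lines.

```python
from collections import Counter

# Hypothetical keyword lexicon for a retail bank (illustrative only).
CATEGORY_KEYWORDS = {
    "Fees": {"fee", "charge", "overdraft"},
    "Staff": {"staff", "teller", "advisor"},
    "Mobile app": {"app", "login", "crash"},
}

# Invert the lexicon so that each keyword points at its category.
KEYWORD_TO_CATEGORY = {
    keyword: category
    for category, keywords in CATEGORY_KEYWORDS.items()
    for keyword in keywords
}

def categorize(lemmas):
    """Return the set of categories whose keywords occur in the lemmas."""
    return {KEYWORD_TO_CATEGORY[l] for l in lemmas if l in KEYWORD_TO_CATEGORY}

def category_counts(comments):
    """Count how many comments mention each category at least once."""
    counts = Counter()
    for lemmas in comments:
        counts.update(categorize(lemmas))
    return counts

# Comments are assumed to be already reduced to base forms.
comments = [
    ["the", "overdraft", "fee", "be", "too", "high"],
    ["teller", "be", "friendly", "but", "app", "crash"],
]
counts = category_counts(comments)
```

Because the set of categories is finite and known, `counts` can be compared from one period to the next, which the open-ended list of keywords does not allow.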
SENTIMENT ANALYSIS REQUIRES A LARGE COLLECTION OF RULES AND PHRASES
Sentiment analysis uses the same language analysis results, but it also relies on a large set of rules about how certain words or combinations of words behave, even when they are far from each other. And we have an ever-growing collection of ironic phrases.
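The sketch below shows, in a very reduced form, the two ideas in this paragraph: a rule that lets a negation word flip the polarity of a word a few positions later, and a list of ironic phrases that overrides word-level scoring. The word lists, the window size `NEGATION_WINDOW` and the phrase list are assumptions made up for the example and do not reflect the actual rule set.

```python
# Hypothetical lexicons; real rule sets are far larger.
POLARITY = {"good": 1, "friendly": 1, "helpful": 1, "slow": -1, "rude": -1}
NEGATORS = {"not", "never", "no"}
IRONIC_PHRASES = [("thanks", "for", "nothing")]
NEGATION_WINDOW = 3  # how many preceding words a negator can act across

def contains_phrase(tokens, phrase):
    """True if the phrase occurs as a contiguous run of tokens."""
    n = len(phrase)
    return any(tuple(tokens[i:i + n]) == phrase
               for i in range(len(tokens) - n + 1))

def sentence_sentiment(tokens):
    """Return 1 (positive), -1 (negative) or 0 (neutral) for one sentence."""
    tokens = [t.lower() for t in tokens]
    # Ironic phrases are treated as negative regardless of their words.
    if any(contains_phrase(tokens, p) for p in IRONIC_PHRASES):
        return -1
    score = 0
    for i, token in enumerate(tokens):
        if token in POLARITY:
            polarity = POLARITY[token]
            window = tokens[max(0, i - NEGATION_WINDOW):i]
            # A negator within the window flips the word's polarity.
            if any(w in NEGATORS for w in window):
                polarity = -polarity
            score += polarity
    return (score > 0) - (score < 0)

result = sentence_sentiment("The staff was not very helpful".split())
```

Here "not" and "helpful" are separated by another word, yet the rule still turns the sentence negative.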
Knowing the category and the category-specific sentiment renders the open-text feedback into statistical information that summarizes the voice of the customer.
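Once every comment carries a category and a sentiment, ordinary statistics apply. As a minimal sketch, the code below computes the share of negative mentions per category and month. The records, the month labels and the function name `negative_share` are invented for illustration.

```python
from collections import Counter

# Hypothetical analysed feedback: (month, category, sentiment).
records = [
    ("2024-01", "Fees", "negative"),
    ("2024-01", "Fees", "positive"),
    ("2024-01", "Staff", "positive"),
    ("2024-02", "Fees", "negative"),
    ("2024-02", "Fees", "negative"),
    ("2024-02", "Staff", "positive"),
]

def negative_share(records):
    """Return the share of negative mentions per (month, category)."""
    totals = Counter()
    negatives = Counter()
    for month, category, sentiment in records:
        totals[(month, category)] += 1
        if sentiment == "negative":
            negatives[(month, category)] += 1
    return {key: negatives[key] / totals[key] for key in totals}

shares = negative_share(records)
```

In this made-up data the negative share for "Fees" rises from one month to the next, which is the kind of trend a fixed set of categories makes visible.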