Probably the most frequent question about text analysis is “What is the accuracy?” So today we will explain what to measure in order to determine the quality of your analysis.
Let’s start by saying that what we usually call accuracy is a mix of two main indicators, called precision and recall.
In simple words, recall measures how many times a text-analysis system correctly identifies a specific topic in an unstructured text, out of the total number of times that topic is actually mentioned in the text.
Imagine a random text in which the author talks about the Sun in 10 different sentences, plus some other sentences about the Moon and some about Mars. Suppose that a text-analysis algorithm designed to identify expressions related to the Sun extracts 7 expressions from that text, of which only 5 are really related to the Sun. In that case its recall is 5 out of 10, or 50%.
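To make the recall arithmetic concrete, here is a minimal sketch that treats each mention as an ID in a set. The sentence IDs ("s1" to "s10" for the Sun sentences, "x1" and "x2" for the two wrong extractions) are hypothetical labels invented for this illustration; a real system would compare its output against a human-annotated reference.

```python
# Hypothetical IDs for the 10 sentences that really mention the Sun.
sun_sentences = {f"s{i}" for i in range(1, 11)}  # s1 .. s10

# Hypothetical IDs for the 7 expressions the algorithm extracted:
# 5 hit real Sun sentences, 2 ("x1", "x2") are not about the Sun.
extracted = {"s1", "s2", "s3", "s4", "s5", "x1", "x2"}


def recall(extracted: set[str], relevant: set[str]) -> float:
    """Correct extractions divided by all real mentions of the topic."""
    if not relevant:
        return 0.0  # nothing to find; define recall as 0 to avoid dividing by zero
    true_positives = len(extracted & relevant)  # extractions that are really about the Sun
    return true_positives / len(relevant)


print(f"recall = {recall(extracted, sun_sentences):.0%}")  # recall = 50%
```

Note that the 2 wrong extractions do not affect recall at all: only the 5 correct hits and the 10 real mentions matter.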
Precision, on the other hand, measures how many times the system has identified a topic correctly, out of all the times the system itself has identified that topic.
Take the same text, where the author talks about the Sun in 10 different sentences, plus some other sentences about the Moon and some about Mars. If the text-analysis algorithm looking for expressions related to the Sun extracts a total of 7 expressions, of which only 5 are really related to the Sun and the other 2 to the Moon, its precision is 5 out of 7, or approximately 71%.
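The same set-based sketch works for precision; only the denominator changes, from the real mentions to the system's own extractions. As before, the IDs are hypothetical ("m1" and "m2" stand for the two extractions that actually refer to the Moon).

```python
# Hypothetical IDs for the 10 sentences that really mention the Sun.
sun_sentences = {f"s{i}" for i in range(1, 11)}  # s1 .. s10

# Hypothetical IDs for the 7 extracted expressions:
# 5 are about the Sun, 2 ("m1", "m2") are actually about the Moon.
extracted = {"s1", "s2", "s3", "s4", "s5", "m1", "m2"}


def precision(extracted: set[str], relevant: set[str]) -> float:
    """Correct extractions divided by everything the system extracted."""
    if not extracted:
        return 0.0  # system extracted nothing; define precision as 0
    true_positives = len(extracted & relevant)
    return true_positives / len(extracted)


print(f"precision = {precision(extracted, sun_sentences):.1%}")  # precision = 71.4%
```

Here the 5 Sun sentences the system missed play no role; what lowers precision is the 2 Moon expressions it wrongly picked up.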
To sum it up, we can consider accuracy as a score derived from two capabilities of the machine: extracting as many of the relevant topics in a given text as possible (recall), and making sure that the topics it extracts are actually correct (precision).
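The text above does not say exactly how the two indicators are combined into one score. One widely used choice, assumed here for illustration and not necessarily what is meant above, is the F1 score: the harmonic mean of precision and recall. A minimal sketch, plugging in the figures from the Sun example:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


p = 5 / 7   # precision from the Sun example
r = 5 / 10  # recall from the Sun example
print(f"F1 = {f1_score(p, r):.1%}")  # F1 = 58.8%
```

The harmonic mean is pulled toward the lower of the two values, so a system cannot score well by maximizing only one of them (for example, extracting everything to push recall up at the cost of precision).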