In this video, our CEO Riccardo Osti explained what machine learning is as well as it’s most common applications. Today he will explain why this acclaimed technology, which, don’t get us wrong, has a huge potential for several applications, does not really work well with text analysis.
When we mention machine learning we are also referring to AI, deep learning and more generally to statistical models. Many of today’s hi-tech leaders make use of these type of statistical technologies for the most diverse applications. Some of these companies are: Tesla, IBM Watson, Google, Amazon et cetera
So why are we saying that machine learning doesn’t work if so many of these brands are using it so heavily? Well, it does not work well for text analysis, and we’ll tell you why.
Machine learning algorithms have the goal of identifying statistical patterns, and based on them, continuously improve their decision-making capabilities. The fact that machine learning is based on statistics implies that, in order to work properly, requires datasets sufficiently big to be considered statistically relevant.
In fact, the biggest variable that determines the performance of machine learning algorithms is often the amount and quality of data that is available for training purposes.
It goes without saying that the larger the dataset the better it is for these statistical models…and here comes the challenge for text analysis applications.
If you look at the domains where statistical models and machine learning are more successful, you will realize that these domains have an abundance of data, that can be used to easily identify specific trends. Good examples are self-driving cars, financial predictions or weather forecast.
Difficulties when analyzing text
If we think about texts, we hardly find cases where we can count on a decent amount of data. Especially when we want to extract detailed information from unstructured text, it becomes difficult to have sufficient data to train statistical algorithms. For example, if you would like to know why people complain about a specific problem they have with one of your products, most likely you would only have some reviews available on the internet…and that, clearly, won’t be sufficient.
Let me give you a better overview of why machine learning is not good for text analysis, due to lack of training datasets. If you put together all the books ever published by mankind, their total weight would be around 54 terabytes. I don’t want to say that you could keep all these texts in one hard drive, but that’s almost the case. On the other side, if we put together all the information recorded by the sensors of the self-driving cars currently running around the world, they would weight around 400 terabytes, each second.
What does it mean? It means that, even if it’s hard to believe, it’s easier to drive a driverless car than to make text analysis.
If you want to learn more about Machine Learning, read this other article about Machine Learning definition.