Introduction to Text Analytics Using R

I was browsing the “Getting Started” competitions on Kaggle, and the one that caught my attention was “Bag of Words Meets Bags of Popcorn”. This competition is about sentiment analysis using machine learning. Well, that is super interesting! Kaggle provides a tutorial for it in Python. Since I am still struggling with R, I thought I would first work through the concepts of Text Analytics in R.

What is my understanding of Text Analytics/Text Mining?

There is a vast amount of text data in the world. When we write a review of a product on Amazon, a restaurant on Yelp, or a hotel on TripAdvisor, we put our sentiments into words. These words contain a lot of information. Text Analytics is the process of reading such large volumes of text and deriving meaningful insights from them.

For a human being, it is both time-consuming and costly to analyze all that text and predict or find a pattern. That’s why we train computers to analyze text. This field of study is called “Natural Language Processing”. Even so, it is difficult for a computer to extract useful information from text, because people use metaphors, sarcasm, misspellings, and non-standard grammar to express their sentiments.


Text Analytics Workflow:
The Text Analytics process can broadly be carried out in the following steps:
1. Define the problem and set some distinct goals
2. Collect the Text to analyze
3. Organize and clean the Text Data
4. Build a Text Analytics Model
5. Start Analysis and reach an insight
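
The steps above can be sketched in a few lines of base R. This is only a minimal illustration, not a full workflow: the three reviews are made up, the stop-word list is hand-picked for this example, and counting word frequencies stands in for a real model.

```r
# Step 2: collect the text (three made-up reviews, purely illustrative)
reviews <- c(
  "Great knee support, very comfortable!",
  "The support is OK, but the strap broke after a week.",
  "Comfortable and great value. Great product!"
)

# Step 3: organize and clean: lower-case, replace punctuation, split into words
cleaned <- tolower(reviews)
cleaned <- gsub("[[:punct:]]", " ", cleaned)
words   <- unlist(strsplit(cleaned, "\\s+"))
words   <- words[words != ""]

# Drop a tiny hand-picked stop-word list (an assumption made for this sketch)
stop_words <- c("the", "is", "a", "and", "but", "after", "very")
words <- words[!words %in% stop_words]

# Steps 4-5: a very simple "model": count word frequencies, then look at the top terms
word_freq <- sort(table(words), decreasing = TRUE)
print(head(word_freq, 5))
```

Even this toy example shows why cleaning matters: without lower-casing, "Great" and "great" would be counted as two different words.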

Text Analytics applications:

I use the Amazon app a lot, and it is amazing. I was shopping for a knee support and went through the customer reviews. Reading reviews is easy because the most frequently used words are pulled out and shown separately. That’s so cool, right?

The number of stars can sometimes be misleading. One customer can give 3 stars and still be happy with the product, while another can give 3 stars and describe genuine problems in the review. So, to get the actual picture, reading the text is important.

Some other real-life scenarios where Text Analytics is used are as follows:

  1. Sentiment Analysis, e.g., deciding whether a Tweet about a company is positive or negative.
  2. Text Classification, e.g., classifying the emails we get as spam or ham.
  3. Text Clustering, e.g., automatic labeling of documents by topic in business libraries.
  4. Entity Extraction, e.g., identifying people, places, organizations, and other entities in documents.
  5. Document Summarization, i.e., automatically extracting the most important points of the original document. This is particularly useful for news summaries.

In R, the tm package provides a text mining framework. Many other packages are used for Natural Language Processing, and the list is here.
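
To give a feel for tm, here is a minimal sketch that builds a small corpus, cleans it, and turns it into a document-term matrix (the “bag of words” in the competition title). It assumes the tm package is already installed, and the three example sentences are invented for illustration.

```r
library(tm)  # assumes install.packages("tm") has already been run

# Three made-up documents, purely illustrative
docs <- c(
  "This movie was great, I loved it!",
  "Terrible movie. I hated the plot.",
  "Great acting, great plot."
)

# Build a corpus from the character vector
corpus <- VCorpus(VectorSource(docs))

# Standard cleaning steps: lower-case, strip punctuation,
# drop English stop words, collapse extra whitespace
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# One row per document, one column per remaining term
dtm <- DocumentTermMatrix(corpus)
inspect(dtm)

# Terms that occur at least twice across all documents
findFreqTerms(dtm, lowfreq = 2)
```

The document-term matrix is the usual starting point for the applications listed above, since it turns free text into numbers that a model can work with.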

So, let’s start our Text Analytics journey and have some fun!
