Error: inherits(x, “Source”) is not TRUE in R

Text analytics is interesting but challenging. I started with a simple goal: to create a word cloud using R. I decided to use the datasets from the Kaggle competition “Sentiment Analysis on Movie Reviews”, but I ran into problems at almost every step.

First, I got an error while loading the .tsv files. The details are here. I resolved that issue, loaded the training dataset into a data frame called movies, and then loaded the text mining library “tm”.
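The loading code itself is not reproduced in this post, so here is only a minimal sketch of what that step might look like. It assumes the training file is tab-separated with a header row and the columns PhraseId, SentenceId, Phrase and Sentiment. The temporary file and its two rows are made up so that the example runs on its own; in practice you would point read.delim() at the downloaded file instead.

```r
# Write a tiny stand-in for the Kaggle training file (hypothetical sample rows).
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c(
  "PhraseId\tSentenceId\tPhrase\tSentiment",
  "1\t1\tA series of escapades\t1",
  "2\t1\tA series\t2"
), tsv_path)

# quote = "" stops quote characters inside phrases from swallowing rows;
# stringsAsFactors = FALSE keeps Phrase as a character vector.
movies <- read.delim(tsv_path, header = TRUE, sep = "\t",
                     quote = "", stringsAsFactors = FALSE)
str(movies)
```

Keeping Phrase as a character vector matters for the next step, because that is the column that gets turned into a corpus.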
Next, I learned that I have to create a Corpus first because “The main structure for managing documents in tm is a so-called Corpus, representing a collection of text documents”. So, I used the code below and got an error.

movies_corpus <- Corpus((movies$Phrase))

The error is as below:
Error: inherits(x, “Source”) is not TRUE

I was not very clear about the concept of a Corpus to begin with, and now I had an error on top of that. Some investigation was needed!

What is Corpus?
A “Corpus” is a collection of text documents. The function Corpus() is a convenience alias to SimpleCorpus or VCorpus, depending on the arguments provided. A SimpleCorpus is kept fully in memory and supports only DataframeSource, DirSource and VectorSource.

A VCorpus is a “Volatile” corpus, which means that the corpus is stored in memory and is gone when the R object containing it is destroyed.

The syntax for creating such a corpus is as below:

VCorpus(x, readerControl)

x: a Source object which abstracts the input location. tm provides a set of predefined sources, getSources() lists the available ones, and users can create their own. VectorSource is meant for character vectors, with each element treated as one document.

readerControl: a named list with the components reader and language. tm also provides a set of predefined readers, and getReaders() lists those currently available. Each source has a default reader, which can be overridden.
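To make the two arguments concrete, here is a minimal sketch that lists the available sources and readers and then builds a small VCorpus with an explicit readerControl. The two sample phrases and the variable names are made up for illustration. readPlain is, as far as I can tell, the default reader for a VectorSource, so spelling it out here only shows where an override would go.

```r
library(tm)

# Names of the predefined sources and readers shipped with tm.
available_sources <- getSources()
available_readers <- getReaders()
print(available_sources)
print(available_readers)

# Hypothetical sample data: each element becomes one document.
phrases <- c("A series of escapades", "good for the goose")

# x is a Source object; readerControl names the reader and the language.
toy_vcorpus <- VCorpus(
  VectorSource(phrases),
  readerControl = list(reader = readPlain, language = "en")
)
print(toy_vcorpus)
```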

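Now, coming back to my error: “inherits(x, "Source") is not TRUE”. It is about the first argument. Corpus() expects a Source object, and movies$Phrase is a plain vector of character values, not a Source, so the check fails. Since I am passing character values, wrapping them in VectorSource() should work. Let me try the code below: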

movies_corpus <- Corpus(VectorSource(movies$Phrase))
movies_corpus

It worked!

So, the above code created a SimpleCorpus of 156060 documents.
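If you want to see the failure and the fix side by side without downloading the Kaggle data, the following is a small sketch on a made-up character vector. The variable names are hypothetical, and the exact wording of the error may vary between tm versions, so the code only records whether the call failed.

```r
library(tm)

# Hypothetical sample data standing in for movies$Phrase.
phrases <- c("A series of escapades", "good for the goose")

# A plain character vector is not a Source object.
is_source <- inherits(phrases, "Source")
print(is_source)

# Passing the vector directly fails; try() captures the error instead of stopping.
direct_attempt <- try(Corpus(phrases), silent = TRUE)
direct_failed <- inherits(direct_attempt, "try-error")
print(direct_failed)

# Wrapping the vector in VectorSource() gives Corpus() the Source it expects.
toy_corpus <- Corpus(VectorSource(phrases))
print(toy_corpus)
print(length(toy_corpus))
```

Each element of the vector becomes one document, which is why the real dataset produces one document per phrase.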
There is a lot more information about the tm package here.

Thank You!
