Deep Learning Model to Generate Text using Keras LSTM

Deep Learning is a topic that I had been avoiding for some time. Somehow, its potential is intimidating. I always felt that Deep Learning Models are complex and not so easy to work with on my Mac OS X. I was worried that I would not be able to fit a Model and then finally see some output. But then, I built a Deep Learning Model to Generate Text or a Story using Keras LSTM with very little trouble.

Honestly, I was only stuck at the first step: installing Keras and TensorFlow. I work in Jupyter Notebook, and I spent a few hours just trying to write “import keras” and get it to work. I was on the verge of giving up when I found a solution to install Keras and TensorFlow.
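If you hit the same wall, one common fix is to install both packages with pip from inside the notebook itself. This is just one approach that often works; your environment may need something different.

##Install TensorFlow and Keras from inside a Jupyter cell
!pip install tensorflow keras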

Once I installed and imported Keras, I realized that Keras is precisely what it says – an API that lets you go from idea to result as fast as possible. So, here is my understanding: how the algorithm actually works, that’s complex! But making it work is not complex at all. It is liberating to realize that it is okay to think of an idea and be sure that Keras will have your back!

But again, there is an inquisitive mind that keeps asking – what is a Neural Network? What is a Recurrent Neural Network? What is LSTM? How is Keras so cool? Right? I definitely need to understand how the API actually works, but that is a post for some other day.

So, today, I am going to work on Generating Text or a Story using Keras LSTM. Let’s get started!

DATA

To build a Text Generating Model, we need some text data. A good thing is that now we can download the classics for free and use them in creating generative models. Perhaps the best place to get access to free books that are no longer protected by copyright is Project Gutenberg.

I will use “Pride And Prejudice” by Jane Austen as the Dataset for this project. You can download any book you like; you just need to save it as a text (.txt) file and delete the Project Gutenberg header and footer embedded in the text (a quick way to do that is sketched below). Or, you can download the Dataset that I used from my Github.
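Gutenberg plain-text files usually wrap the book between “*** START OF …” and “*** END OF …” marker lines, so a few lines of Python can strip the header and footer. This is a minimal sketch that assumes those standard markers are present; older files may word them slightly differently.

##strip the Project Gutenberg header and footer
raw = open("PrideAndPrejudice.txt").read()
start = raw.find("*** START OF")
end = raw.find("*** END OF")
if start != -1 and end != -1:
    ##keep only the text between the markers (skip the START line itself)
    raw = raw[raw.find("\n", start) + 1:end]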

Text Generation using Pride and Prejudice Text

Now, as we have the Data ready to use, we can start our project.

Develop a Text Generating Model using Keras LSTM

We will develop a simple LSTM Network to learn sequences of characters from Pride and Prejudice. Then, we will use this model to generate new sequences of characters.

Import Libraries

First, import all the Libraries required for this project.

##Import Keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.utils import np_utils
##Import Other Libraries
import numpy as np
import pandas as pd
Load Data

Now, we need to load the text data and convert all the text to lowercase to reduce the vocabulary that the network must learn.

df_text = open("PrideAndPrejudice.txt").read()
df_text = df_text.lower()
Data Preprocessing

We cannot model the characters directly. So, instead, we will convert the characters to integers. We can do this easily by first creating a set of all the distinct characters in the book and then mapping each character to a unique integer.

characters = sorted(list(set(df_text)))
print(characters)

The output is the sorted list of unique characters that appear in the book: whitespace, punctuation, digits, and lowercase letters.
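The size of this set is the vocabulary the network must learn, and it is worth printing as a sanity check (a quick inspection I have added here):

##how much text there is, and how many distinct characters
print("Total characters:", len(df_text))
print("Vocabulary size:", len(characters))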

In order to be able to use the textual data with an RNN, we will do the below steps:

  1. Transform the text data to numeric values.
  2. Create a sequence of characters as our X value and use the following character as our Y value.
  3. Lastly, rescale our inputs and one-hot encode our target values.

Create Character-Number Mapping
char_to_n = {char: n for n, char in enumerate(characters)}
Create Training Sequences

Now, to get the data ready for training, we will split the text into subsequences of 100 characters each, where each sequence is paired with the single character that follows it. In the next step, we will rescale the inputs and one-hot encode the targets.

X = []
Y = []
length = len(df_text)
seq_length = 100
for i in range(0, length - seq_length, 1):
    sequence = df_text[i:i + seq_length]   ##100-character input window
    label = df_text[i + seq_length]        ##the character that follows it
    X.append([char_to_n[char] for char in sequence])
    Y.append(char_to_n[label])

Now, let’s take a look at the above piece of code.

X: Training array

Y: Target array

seq_length:  Length of the sequence of characters that we want to consider before predicting a particular character.

The for loop is used to iterate over the entire length of the text and create such sequences (stored in X) and their true values (stored in Y). At this point, each entry of X is a list of 100 integers.
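To see what the encoded data looks like, we can peek at the first sequence and its label (another quick inspection I have added):

##first 10 integer codes of the first sequence, and its target
print(X[0][:10])
print(Y[0])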

Next, we need to rescale the integers to the range 0-to-1 to make the patterns easier to learn for the LSTM network, whose activations (tanh, with sigmoid gates) work best with small input values.

X_modified = np.reshape(X, (len(X), seq_length, 1))   ##[samples, time steps, features]
X_modified = X_modified / float(len(characters))      ##scale to the 0-1 range
Y_modified = np_utils.to_categorical(Y)               ##one-hot encode the targets

print("X: ",X_modified[0])
print("Y: ",Y_modified[0])

Let’s take a look at our Training array, which is now ready to be fed into the LSTM model, and at how the target array looks at this point.

Build a Basic Model

Now, our data is all set! We will use the entire training dataset to learn the probability of each character in a sequence.

We will build a sequential model with two LSTM layers of 200 units each. The first layer needs to be given the input shape. For the next LSTM layer to be able to process sequences as well, we set the return_sequences parameter to True.

Also, dropout layers with a 20% dropout rate have been added to guard against over-fitting. The last layer is a Dense layer with a softmax activation, which outputs a probability for each character in the vocabulary.

So, let’s first create a very basic Model and take a look at an example.

model = Sequential()
model.add(LSTM(200, input_shape=(X_modified.shape[1], X_modified.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(200))
model.add(Dropout(0.2))
model.add(Dense(Y_modified.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
Fit the Basic Model

This may take some time! For me, it took a few hours to train the basic model.

model.fit(X_modified, Y_modified, epochs=1, batch_size=100)
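Since a single epoch can take hours, it is also worth saving the weights as training goes. Here is a minimal sketch using Keras’s ModelCheckpoint callback, which I did not use in my original run; the filename pattern is just an example:

##save the weights whenever the loss improves during training
from keras.callbacks import ModelCheckpoint
filepath = "text_generator_weights-{epoch:02d}-{loss:.4f}.h5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', save_best_only=True, mode='min')
model.fit(X_modified, Y_modified, epochs=1, batch_size=100, callbacks=[checkpoint])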
Save the Model and Load the Model

The model took some time to train. That’s why it’s better to save the model and load it back when generating some text examples.

model.save_weights('/Users/oindrilasen/WORK_AREA/Data Science/Projects/LSTM_Text_Generator/models/text_generator_basic_model.h5')

model.load_weights('/Users/oindrilasen/WORK_AREA/Data Science/Projects/LSTM_Text_Generator/models/text_generator_basic_model.h5')
Generate Text

Finally, it is time to see some results!

We will start from the first row of our X array, which is a sequence of 100 characters. After this, we target predicting the 400 characters that follow it.

Now, just as we prepared the mapping of unique characters to integers, we must also create a reverse mapping that we can use to convert the integers back to characters so that we can read the predictions.

n_to_char = dict((i, c) for i, c in enumerate(characters))

The input is reshaped and scaled as before, and at each step the character with the maximum predicted probability is chosen.

string_mapped = X[0]
full_string = [n_to_char[value] for value in string_mapped]
full_string
##generating characters
for i in range(400):
    x = np.reshape(string_mapped, (1, len(string_mapped), 1))
    x = x / float(len(characters))
    pred_index = np.argmax(model.predict(x, verbose=0))
    seq = [n_to_char[value] for value in string_mapped]   ##decoded current window
    full_string.append(n_to_char[pred_index])             ##record the prediction
    string_mapped.append(pred_index)                      ##slide the window forward
    string_mapped = string_mapped[1:len(string_mapped)]   ##drop the oldest character

seq is used to store the decoded form of the current window. After each prediction, the window is updated so that the first character is removed and the newly predicted character is appended, as the toy example below illustrates.
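Here is the same sliding-window update on a short list of made-up integer codes, just to make the mechanics concrete:

##toy version of the sliding-window update
window = [1, 2, 3, 4]
pred_index = 9                  ##pretend this is the model's prediction
window.append(pred_index)      ##window is now [1, 2, 3, 4, 9]
window = window[1:len(window)]  ##drop the oldest entry
print(window)                   ##[2, 3, 4, 9]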

Results

Okay, I must admit, I was expecting a remarkably better output. But the basic model did a rather humbling job here.

Let’s take a look at the text that Jane Austen wrote:

print(df_text[:200])

chapter 1

it is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife. however little known the feelings or views of such a



Pride and Prejudice – Jane Austen

Now, let’s take a look at what our text generator wrote:

 txt=""
 for char in full_string:
     txt = txt+char
 txt

chapter 1

it is a truth universally acknowledged, that a single man in

possession of a lere the sase the sase the sase the sas to the

she sooe the sase the sase the sase the sase the sas to the

pere the sas to the sase the sase the sase the sase the was

“i sas to the sase the sase the sase the sase the sas to the\n

pere the sas to the sase the sase the sase the sase the was

“i sas to the sase the sase the sase the sase the sas to the\n

pere the sas

Honestly, I have no idea what language it is. Let’s check one more time with a random seed.

##Load LSTM network and generate text
import sys
##pick a random seed
start = np.random.randint(0, len(X)-1)
print(start)
pattern = X[start]
n_vocab = len(characters)
print("Seed:")
print("\"", ''.join([n_to_char[value] for value in pattern]), "\"")
##generate characters
for i in range(1000):
    x = np.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    prediction = model.predict(x, verbose=0)
    index = np.argmax(prediction)
    result = n_to_char[index]           ##decoded predicted character
    seq_in = [n_to_char[value] for value in pattern]
    sys.stdout.write(result)            ##print characters as they are generated
    pattern.append(index)               ##slide the window forward
    pattern = pattern[1:len(pattern)]
print("\nDone")

Not at all impressive, right? Okay, I need a deeper/wider/larger model! How about an enormous model?

An Enormous Model to generate text using Keras LSTM
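By a bigger model, I mean something along these lines: three stacked LSTM layers with more units each, trained for more epochs. Treat the exact layer sizes and epoch count below as placeholders rather than a recipe:

##a deeper/wider network - layer sizes and epochs are illustrative
model = Sequential()
model.add(LSTM(400, input_shape=(X_modified.shape[1], X_modified.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(400, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(400))
model.add(Dropout(0.2))
model.add(Dense(Y_modified.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(X_modified, Y_modified, epochs=20, batch_size=100)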

When I started running this model, I realized I really need a new machine. It showed me an ETA of 60 hours! So, I will share the marvelous creation of my text generator 60 hours later, provided the program actually finishes.

Thank You for reading!

