Let’s imagine a scenario. You were looking for a change of job and started browsing a job portal. Suddenly, a job description caught your attention. It looked like a perfect fit, as if the role had been created just for you! You quickly uploaded your resume to apply, sure that you would get a call for an interview soon. But unfortunately, that call never came! Does that ring a bell? Has that happened to you? For me, it’s “been there, done that” (a lot!).
Well, finding a job is a complex process, with hurdles at different levels. Sadly, I never understood why I was not getting that first call until a few days ago, when I called a recruiter immediately after applying for a role. She said, “Your profile is a 36% match to this role and I will call you back!” I was not sure what she was talking about! Then I learned that our resumes are scanned by an automated NLP program before they reach a hiring manager or catch any human eye. So the name of the first obstacle is the Applicant Tracking System (ATS).
I am sure different companies use resume scanners of varying complexity. I want a simple one: my very own resume scanner. So this post is all about creating your own resume scanner, a program that shows how well your resume matches a specific job description.
Approach:
I want to create a Python program that returns the percentage match between a resume and a job description. I will also create a word cloud from the job description so that we get a clear view of all the important keywords.
Install & Import Libraries:
First, we are going to install the libraries required for this project.
Resumes do not come in a fixed file format; they can be .pdf, .doc, or .docx files. So our first challenge is to read the resume and convert it to plain text. For this, we can use two Python modules: pdfminer and docx2txt. pdfminer extracts text from .pdf files and docx2txt extracts text from .docx files (an older .doc file would need to be saved as .docx first).
pip install pdfminer
pip install docx2txt
Let’s import all the libraries required for this project.
import io
from pdfminer.converter import TextConverter
from pdfminer.pdfinterp import PDFPageInterpreter
from pdfminer.pdfinterp import PDFResourceManager
from pdfminer.pdfpage import PDFPage
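# Docx resume
import docx2txt
# Text cleaning and word cloud
import re
import operator
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
# NLTK's tokenizer and stop-word data must be downloaded once before use,
# e.g. nltk.download('punkt') and nltk.download('stopwords')
# (the exact resource names can vary with the NLTK version)
from wordcloud import WordCloud
from nltk.probability import FreqDist
import matplotlib.pyplot as plt
# Match score
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity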
Reading the Resume:
Here, I will create two functions: one to read resumes in .pdf format and another to read resumes in .docx format. Both functions return the text of the resume.
Read PDF Resume:
def read_pdf_resume(pdf_doc):
    resource_manager = PDFResourceManager()
    # in-memory text buffer that collects the extracted text
    fake_file_handle = io.StringIO()
    converter = TextConverter(resource_manager, fake_file_handle)
    page_interpreter = PDFPageInterpreter(resource_manager, converter)
    with open(pdf_doc, 'rb') as fh:
        for page in PDFPage.get_pages(fh, caching=True, check_extractable=True):
            page_interpreter.process_page(page)
        text = fake_file_handle.getvalue()
    # close open handles
    converter.close()
    fake_file_handle.close()
    if text:
        return text
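Read Word Resume:
def read_word_resume(word_doc):
    # docx2txt.process returns the document text as a string
    resume = docx2txt.process(word_doc)
    text = str(resume)
    # replace line breaks with spaces so that words on adjacent lines are not glued together
    text = text.replace("\n", " ")
    if text:
        return text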
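Since the two readers differ only in the file type they accept, the choice between them can be made from the file name itself. The following is a minimal sketch, not part of the original program: `pick_reader` and `demo_readers` are hypothetical names, and the stand-in readers exist only so that the snippet runs on its own.

```python
import os

def pick_reader(path, readers):
    """Return the reader function registered for the file's extension.

    `readers` maps a lowercase extension (e.g. ".pdf") to a function that
    takes a file path and returns text. Hypothetical helper for illustration.
    """
    extension = os.path.splitext(path)[1].lower()
    if extension not in readers:
        raise ValueError("Unsupported resume format: " + repr(extension))
    return readers[extension]

# Stand-in readers so the sketch is self-contained; in this post they would
# be read_pdf_resume and read_word_resume.
demo_readers = {
    ".pdf": lambda path: "text from pdf",
    ".docx": lambda path: "text from docx",
}

reader = pick_reader("my_resume.PDF", demo_readers)
print(reader("my_resume.PDF"))
```

Lower-casing the extension means "my_resume.PDF" and "my_resume.pdf" are treated alike, and an unsupported type fails with a clear error instead of silently falling through to the wrong reader.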
Create a Wordcloud with Keywords
How about an image that displays the keywords in the job description? I have always been a big fan of word clouds. If you are skimming a job description, you may miss a few skills that the role demands. Maybe you have some experience in those skills and did not remember to add them to your resume. A word cloud will flash those keywords for a quick review.
Clean the Job Description:
To create a word cloud, I usually clean the text first to remove repeated variants of words, punctuation, and numbers, because those don’t make much sense in a word cloud.
def clean_job_decsription(jd):
    ''' a function to clean the job description text and return a list of keyword tokens'''
    ## Clean the Text
    # Lower
    clean_jd = jd.lower()
    # remove punctuation
    clean_jd = re.sub(r'[^\w\s]', '', clean_jd)
    # remove leading and trailing spaces
    clean_jd = clean_jd.strip()
    # remove numbers
    clean_jd = re.sub('[0-9]+', '', clean_jd)
    # tokenize
    clean_jd = word_tokenize(clean_jd)
    # remove stop words
    stop = set(stopwords.words('english'))
    clean_jd = [w for w in clean_jd if w not in stop]
    return clean_jd
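To see what these cleaning steps do to a single sentence, here is a rough standard-library sketch of the same pipeline. It is an approximation, not the function above: `clean_text_sketch` and `SAMPLE_STOP_WORDS` are made-up names, the stop-word list is a tiny hand-picked one instead of NLTK's, and `str.split` stands in for `word_tokenize`.

```python
import re

# A tiny hand-picked stop-word list, an assumption for this sketch only;
# the function in this post uses NLTK's much longer English list.
SAMPLE_STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "with", "is"}

def clean_text_sketch(text):
    text = text.lower()                   # lower-case everything
    text = re.sub(r'[^\w\s]', '', text)   # drop punctuation
    text = re.sub(r'[0-9]+', '', text)    # drop numbers
    tokens = text.split()                 # whitespace tokenization
    return [t for t in tokens if t not in SAMPLE_STOP_WORDS]

sample = "5+ years of experience in Python, SQL and A/B testing."
print(clean_text_sketch(sample))
# prints ['years', 'experience', 'python', 'sql', 'ab', 'testing']
```

Note that removing punctuation turns "A/B" into the single token "ab". Terms written with slashes, plus signs, or dots are altered by this kind of cleaning, which is worth keeping in mind when reading the word cloud.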
Create a word cloud:
Now, it’s time to create the image.
def create_word_cloud(jd):
    # jd is the list of cleaned tokens from the job description
    corpus = jd
    words = ' '.join(corpus)
    words = words.split()
    # create an empty dictionary
    data = dict()
    # Get the frequency of each word, where the word is the key and the count is the value
    for word in words:
        word = word.lower()
        data[word] = data.get(word, 0) + 1
    # Sort the dictionary in reverse order so the most used terms come first
    data = dict(sorted(data.items(), key=operator.itemgetter(1), reverse=True))
    word_cloud = WordCloud(width=800, height=800,
                           background_color='white', max_words=500)
    word_cloud.generate_from_frequencies(data)
    # plot the WordCloud image
    plt.figure(figsize=(10, 8), edgecolor='k')
    plt.imshow(word_cloud, interpolation='bilinear')
    plt.axis("off")
    plt.tight_layout(pad=0)
    plt.show()
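The word cloud shows which keywords the job description stresses. A possible text companion, sketched here as an assumption and not part of the original program, is to list the frequent job-description keywords that do not appear in the resume at all. `missing_keywords` is a hypothetical helper, and the token lists are invented sample data.

```python
from collections import Counter

def missing_keywords(jd_tokens, resume_tokens, top_n=10):
    """Return the most frequent job-description tokens absent from the resume."""
    resume_vocab = set(resume_tokens)
    jd_counts = Counter(jd_tokens)
    # most_common() yields (word, count) pairs, highest count first
    absent = [(word, count) for word, count in jd_counts.most_common()
              if word not in resume_vocab]
    return absent[:top_n]

# Invented sample tokens, as they might look after cleaning
jd_tokens = ["data", "dashboards", "data", "testing", "sql", "dashboards", "data"]
resume_tokens = ["python", "sql", "data", "reporting"]

print(missing_keywords(jd_tokens, resume_tokens))
# prints [('dashboards', 2), ('testing', 1)]
```

Both token lists would need to go through the same cleaning step for the comparison to be fair.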
Get Job Description and Resume Match Score
Now, we are at the final part of our project. To get a score of how well the resume matches a specific job description, I am going to use the cosine similarity metric. Mathematically, it measures the cosine of the angle between two vectors projected in a multi-dimensional space. The smaller the angle, the higher the cosine similarity. In this context, the two vectors are arrays holding the word counts of the two documents, with one dimension per distinct word.
A commonly used approach to matching similar documents is to count the number of words the documents have in common. But there is a problem with this approach: as the size of a document increases, the number of common words tends to increase even if the documents talk about different topics.
Cosine similarity is advantageous because it depends on the direction of the vectors, not on their length. Even if two similar documents are far apart by Euclidean distance because of their size (say, the word ‘python’ appears 50 times in one document and 2 times in the other), they can still have a small angle between them. The smaller the angle, the higher the similarity.
Okay, so let’s create a function to find the match score!
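The point about document size can be checked with a few lines of plain Python. This is a small sketch with invented word counts, separate from the scikit-learn code used in this post; `cosine_from_counts` and `euclidean_from_counts` are hypothetical helper names. Cosine similarity is computed as the dot product of the two count vectors divided by the product of their lengths.

```python
import math
from collections import Counter

def cosine_from_counts(counts_a, counts_b):
    """Cosine similarity between two word-count vectors stored as Counters."""
    vocabulary = set(counts_a) | set(counts_b)
    # A Counter returns 0 for words it has not seen
    dot = sum(counts_a[w] * counts_b[w] for w in vocabulary)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def euclidean_from_counts(counts_a, counts_b):
    """Euclidean distance between the same two word-count vectors."""
    vocabulary = set(counts_a) | set(counts_b)
    return math.sqrt(sum((counts_a[w] - counts_b[w]) ** 2 for w in vocabulary))

short_doc = Counter({"python": 2, "sql": 1})
long_doc = Counter({"python": 50, "sql": 25})      # same proportions, 25x longer
other_doc = Counter({"cooking": 2, "recipes": 1})  # no words in common

print(round(cosine_from_counts(short_doc, long_doc), 4))     # same direction
print(round(euclidean_from_counts(short_doc, long_doc), 2))  # far apart in distance
print(round(cosine_from_counts(short_doc, other_doc), 4))    # no overlap
```

The short and the long document point in the same direction, so their cosine similarity is 1 even though their Euclidean distance is large, while the unrelated document scores 0.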
def get_resume_score(text):
    # text is a list of two documents: [resume, job_description]
    cv = CountVectorizer(stop_words='english')
    count_matrix = cv.fit_transform(text)
    # Print the similarity scores
    print("\nSimilarity Scores:")
    # get the match percentage
    matchPercentage = cosine_similarity(count_matrix)[0][1] * 100
    matchPercentage = round(matchPercentage, 2)  # round to two decimals
    print("Your resume matches about " + str(matchPercentage) + "% of the job description.")
Test Resume Scanner:
Finally, it is time to get a score! I am using my personal resume, which I copied into the same folder so that the program can read it. Next, I need a job description from a job portal. I took a Data Analyst job description, so let’s see how well my profile matches this specific role.
What you'll do: The role involves partnering very closely with multiple PMs, Engineers, Test Managers and Business Partner to elevate the site experience for the verticals on Walmart. Analyze click stream data to understand how customers are interacting with the site. Uncover user pain points and help in building inspirational experiences. Provide and supports the implementation of product solutions Provide data driven insights and deliver recommendations that address opportunities for product improvements Provide analytical support to Product Managers Ensure accuracy of data capture strategy A/B Test: Test variations on messaging or features. Display dashboards: Visualize data with templated or custom reports. Create effective reporting and dashboards. Measure: Measure engagement by feature A self-starter: Can drive projects with minimal guidance Strong communicator: You effectively synthesize, visualize, and communicate your ideas to others You’ll sweep us off our feet if… You’re able to use metrics to improve performance You’re excited about solving complex challenges You’re customer-centric in spirit and in execution You’re comfortable influencing others, leading teams, managing stakeholders, and communicating clearly You have a test and learn mentality and an agile way of working to improve your product
Let’s run all the functions created above and get a score!
if __name__ == '__main__':
    extn = input("Enter File Extension: ")
    #print(extn)
    if extn == "pdf":
        resume = read_pdf_resume('Resume_OindrilaSen.pdf')
    else:
        resume = read_word_resume('test_resume.docx')
    job_description = input("\nEnter the Job Description: ")
    ## Get a Keywords Cloud
    clean_jd = clean_job_decsription(job_description)
    create_word_cloud(clean_jd)
    text = [resume, job_description]
    ## Get a Match score
    get_resume_score(text)
My Goodness!
Similarity Scores:
Your resume matches about 26.82% of the job description.
Today, I got an answer to all my speculations. So, the takeaway for today is that if a job description looks like a good fit, I need to run this program and check where my resume stands. The resume scanner can tell you a story, and a real one!
I have uploaded my Jupyter Notebook for the resume scanner program to my GitHub.
Also, if you are looking for some other project ideas, take a look at my projects below:
Deep Learning Model to Generate Text using Keras LSTM
Build and deploy a multi-page Flask application on Heroku
Text Analytics on #coronavirus trends in Twitter using Python
Thank You for reading this article. I hope it’s helpful to you all! If you enjoyed this article and found it helpful please leave some claps to show your appreciation.
Thank You!