ggplot2::Scatter Plot in R using Titanic Dataset

A scatter plot is usually used to understand the nature of the relationship between two continuous variables. It takes the values of two quantitative variables in a data set and displays them as points on a Cartesian graph.

Problem:

Create a Scatter Plot in R and gradually add layers to it

Solution:

We will use the ggplot2 library and the Titanic dataset to create our first scatter plot. The data is first loaded and cleaned; the code for that step is posted here.

Now, let’s have a look at our cleaned Titanic dataset.
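The original listing for this step is not reproduced here, so the following is only a minimal sketch of how the data could be inspected. The `titanic` data frame is a small hand-typed stand-in with the three columns used in this post (Sex, Age, Fare); its values are illustrative and it is not the full cleaned dataset.

```r
# Stand-in for the cleaned Titanic data (assumption: illustrative values only).
titanic <- data.frame(
  Sex  = factor(c("male", "female", "female", "female",
                  "male", "male", "male", "female")),
  Age  = c(22, 38, 26, 35, 35, 54, 2, 27),
  Fare = c(7.25, 71.2833, 7.925, 53.1, 8.05, 51.8625, 21.075, 11.1333)
)

str(titanic)   # column names and types
head(titanic)  # first six rows
```

With the real data, the same two calls would be run on whatever object the loading and cleaning step produced.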

Now, let’s plot Age and Fare attributes to find a relationship. Before that, let’s have a look at the data for these two variables.
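A quick way to look at the two variables is `summary()`, which reports the minimum, quartiles, mean, and maximum of each column. The sketch below again uses a hypothetical stand-in `titanic` data frame, so its output will not match the ranges quoted for the full dataset.

```r
# Stand-in for the cleaned Titanic data (assumption: illustrative values only).
titanic <- data.frame(
  Sex  = factor(c("male", "female", "female", "female",
                  "male", "male", "male", "female")),
  Age  = c(22, 38, 26, 35, 35, 54, 2, 27),
  Fare = c(7.25, 71.2833, 7.925, 53.1, 8.05, 51.8625, 21.075, 11.1333)
)

# Five-number summary plus mean for the two columns of interest.
age_fare_summary <- summary(titanic[, c("Age", "Fare")])
print(age_fare_summary)

# The ranges on their own.
age_range  <- range(titanic$Age)
fare_range <- range(titanic$Fare)
```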

So, the Age of the passengers varies from 0 to 80 and the fare ranges from $0 to $512.00.
Now, let’s plot these two variables.
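A basic scatter plot in ggplot2 needs a data frame, an `aes()` mapping for the x and y axes, and a `geom_point()` layer. This is a minimal sketch on a hypothetical stand-in `titanic` data frame, not the original post's listing.

```r
library(ggplot2)

# Stand-in for the cleaned Titanic data (assumption: illustrative values only).
titanic <- data.frame(
  Sex  = factor(c("male", "female", "female", "female",
                  "male", "male", "male", "female")),
  Age  = c(22, 38, 26, 35, 35, 54, 2, 27),
  Fare = c(7.25, 71.2833, 7.925, 53.1, 8.05, 51.8625, 21.075, 11.1333)
)

# Map Age to the x axis and Fare to the y axis, then draw one point per row.
p <- ggplot(titanic, aes(x = Age, y = Fare)) +
  geom_point()

print(p)
```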

Interestingly, the plot above answers some questions and also helps us raise new ones. For example, what interesting facts can we see here?

  • There are relatively few passengers above age 60.
  • Most of the passengers paid a fare below $100, with a few exceptions.
  • Very few passengers aged between 30 and 40 paid a fare of more than $500.

Now, let’s see how the Sex of the passengers is distributed in the above plot.
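One way to show Sex on the same plot is to map it to the shape of the points inside `aes()`. The sketch below assumes a hypothetical stand-in `titanic` data frame with a categorical Sex column.

```r
library(ggplot2)

# Stand-in for the cleaned Titanic data (assumption: illustrative values only).
titanic <- data.frame(
  Sex  = factor(c("male", "female", "female", "female",
                  "male", "male", "male", "female")),
  Age  = c(22, 38, 26, 35, 35, 54, 2, 27),
  Fare = c(7.25, 71.2833, 7.925, 53.1, 8.05, 51.8625, 21.075, 11.1333)
)

# Each level of Sex gets its own point shape, and a legend is added automatically.
p_shape <- ggplot(titanic, aes(x = Age, y = Fare, shape = Sex)) +
  geom_point()

print(p_shape)
```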

Thus, one argument of aes() is the shape of the points. ggplot2 can automatically assign only a limited number of shapes (six by default), and the variable mapped to shape must be categorical. Okay, so how about some color?
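Color can be mapped in the same way as shape. A minimal sketch, again on a hypothetical stand-in `titanic` data frame:

```r
library(ggplot2)

# Stand-in for the cleaned Titanic data (assumption: illustrative values only).
titanic <- data.frame(
  Sex  = factor(c("male", "female", "female", "female",
                  "male", "male", "male", "female")),
  Age  = c(22, 38, 26, 35, 35, 54, 2, 27),
  Fare = c(7.25, 71.2833, 7.925, 53.1, 8.05, 51.8625, 21.075, 11.1333)
)

# Each level of Sex gets its own color from the default discrete palette.
p_color <- ggplot(titanic, aes(x = Age, y = Fare, color = Sex)) +
  geom_point()

print(p_color)
```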

Color is another useful argument of aes(). Here we pass a categorical variable, although color can also be mapped to a continuous one. We can also map size or alpha inside aes() to change the look and feel.

We can use the geom_smooth() function to add a smoothed line over the points. Now, let’s use geom_smooth() and add a title to our scatter plot example.
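The layers can be combined as in the sketch below, which adds a smoother and a title to the colored scatter plot. The stand-in `titanic` data frame, the title text, and the explicit `method = "loess"` are assumptions made for illustration. On a data frame this small the smoother may emit warnings, and the curve is not meaningful.

```r
library(ggplot2)

# Stand-in for the cleaned Titanic data (assumption: illustrative values only).
titanic <- data.frame(
  Sex  = factor(c("male", "female", "female", "female",
                  "male", "male", "male", "female")),
  Age  = c(22, 38, 26, 35, 35, 54, 2, 27),
  Fare = c(7.25, 71.2833, 7.925, 53.1, 8.05, 51.8625, 21.075, 11.1333)
)

p_smooth <- ggplot(titanic, aes(x = Age, y = Fare)) +
  # Color is mapped only in the point layer, so the smoother below
  # fits a single line through all passengers rather than one per Sex.
  geom_point(aes(color = Sex)) +
  # Local regression smoother with its default confidence band.
  geom_smooth(method = "loess", formula = y ~ x) +
  # Hypothetical title text.
  ggtitle("Titanic: Age vs Fare")

print(p_smooth)
```

Moving `color = Sex` into the top-level `aes()` would instead fit a separate smoothed line for each Sex.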
