ggplot2::Bar Plot in R using the Titanic Dataset

A bar plot is a graphical representation of categorical data using rectangular bars whose heights are proportional to the values they represent. Bar plots are useful for representing two different things:

The count of cases for each group: the height of each bar represents the number of cases. This is done with stat = "count", which is the default for geom_bar() in current ggplot2 releases (releases before 2.0.0 used stat = "bin"). It is incompatible with mapping a variable to the y aesthetic.

The value of a column in the dataset: the height of each bar represents the value in a column of the data frame. This is done with stat = "identity" (geom_col() is a shorthand for the same thing).

Problem:
Create a Barplot in R using the Titanic Dataset
Solution:
We will use the ggplot2 library and the Titanic dataset to create our bar plot. The data is first loaded and cleaned; the code for that step is posted here.
Now, let's have a look at our cleaned Titanic dataset.
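As a rough stand-in for the cleaned data frame, the sketch below expands R's built-in Titanic contingency table into one row per person. This is an assumption made so the example runs without any file: the built-in table includes crew and has the columns Class, Sex, Age and Survived, so it is not identical to the cleaned dataset used in this post. The name people is illustrative.

```r
library(ggplot2)

# Assumption: use the built-in `Titanic` table instead of the article's own
# cleaned data frame, so that the example is self-contained.
counts <- as.data.frame(Titanic)

# Repeat each row `Freq` times to get one row per person.
people <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                 c("Class", "Sex", "Age", "Survived")]

str(people)   # structure: four factor columns
head(people)  # first few rows
```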
Next, let's plot the count of passengers who survived the Titanic disaster. To create bar plots with ggplot2, we use the geom_bar() function.
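A minimal sketch of such a count plot follows. It assumes a data frame with a Survived column; here that frame is built from R's built-in Titanic table (which also includes crew), not from the cleaned dataset of this post.

```r
library(ggplot2)

# Assumption: one row per person, derived from the built-in `Titanic` table.
counts <- as.data.frame(Titanic)
people <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                 c("Class", "Sex", "Age", "Survived")]

# Only x is mapped; geom_bar() counts the rows in each group itself.
p <- ggplot(people, aes(x = Survived)) +
  geom_bar()
print(p)
```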

We will use ggtitle() to add a title to the bar plot. xlab() and ylab() modify the labels on the X and Y axes respectively, and coord_flip() swaps the X and Y axes.
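The labelling helpers can be combined as in the sketch below. The title and axis texts are illustrative choices, and the data frame is again an assumed stand-in built from the built-in Titanic table.

```r
library(ggplot2)

# Assumption: one row per person, derived from the built-in `Titanic` table.
counts <- as.data.frame(Titanic)
people <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                 c("Class", "Sex", "Age", "Survived")]

p <- ggplot(people, aes(x = Survived)) +
  geom_bar() +
  ggtitle("Survival on the Titanic") +  # plot title
  xlab("Survived") +                    # label for the x variable
  ylab("Number of people") +            # label for the counts
  coord_flip()                          # draw the bars horizontally
print(p)
```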

Bars are automatically stacked when multiple bars are placed at the same location. We can instead pass position = "dodge" to place them side by side.
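To illustrate the difference, the sketch below fills the bars by Survived and dodges them. The choice of Class on the x axis is an assumption for the example, as is the data frame built from the built-in Titanic table.

```r
library(ggplot2)

# Assumption: one row per person, derived from the built-in `Titanic` table.
counts <- as.data.frame(Titanic)
people <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                 c("Class", "Sex", "Age", "Survived")]

# Without `position`, the two Survived groups would be stacked in one bar per
# class; "dodge" draws them next to each other instead.
p <- ggplot(people, aes(x = Class, fill = Survived)) +
  geom_bar(position = "dodge")
print(p)
```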

If we map a variable to the y aesthetic as well, ggplot2 throws an error like the one below:
Error:
stat_count() must not be used with a y aesthetic.

That means that when we plot the count of a variable, we should not map anything to the y aesthetic, because geom_bar() uses stat = "count" by default.
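The failure can be reproduced with the sketch below, which maps both x and y and then builds the plot inside tryCatch(). The data frame is an assumed stand-in built from the built-in Titanic table, and the exact wording of the message depends on the installed ggplot2 version.

```r
library(ggplot2)

# Assumption: one row per person, derived from the built-in `Titanic` table.
counts <- as.data.frame(Titanic)
people <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                 c("Class", "Sex", "Age", "Survived")]

# Mapping both x and y while keeping the default counting stat is invalid.
bad <- ggplot(people, aes(x = Survived, y = Class)) +
  geom_bar()

# The error is raised when the plot is built (or printed), not when it is defined.
msg <- tryCatch({
  ggplot_build(bad)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```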

Now, let's use a bar plot to represent the value of a variable. In this case, we need to map both the x and y aesthetics and pass stat = "identity" to geom_bar().

Here we plot the ticket Fare that passengers paid in the different classes; the graph below is the result.
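A minimal sketch of a value-based bar plot is shown here. The fares data frame and its numbers are made up purely for illustration and are not real Titanic fares; the column names Pclass and Fare are assumptions about the cleaned dataset.

```r
library(ggplot2)

# Hypothetical toy data: one made-up fare value per class (NOT real data).
fares <- data.frame(
  Pclass = factor(c("1", "2", "3")),
  Fare   = c(80, 20, 10)
)

# Both x and y are mapped; stat = "identity" uses the y values as bar heights.
p <- ggplot(fares, aes(x = Pclass, y = Fare)) +
  geom_bar(stat = "identity")
print(p)
```

If several rows share the same x value, stat = "identity" with the default stacking adds their y values together, so each bar then shows a total rather than a typical value.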

The graph suggests that passengers in the 1st class, who paid higher fares, had a better survival rate than passengers in the 2nd and 3rd classes. This is consistent with our initial hypothesis, although a bar plot on its own does not prove it.

Now, let's plot two categorical variables: the "Sex" of the passengers and their "Survival" ratio.
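One way to show a ratio rather than raw counts is position = "fill", which scales every bar to a height of 1. The sketch below is an assumption about how such a plot could be drawn, not necessarily the approach used for the original figure, and it uses a data frame built from the built-in Titanic table.

```r
library(ggplot2)

# Assumption: one row per person, derived from the built-in `Titanic` table.
counts <- as.data.frame(Titanic)
people <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                 c("Class", "Sex", "Age", "Survived")]

# Each bar is normalised to 1, so the fill colours show the share of
# survivors and non-survivors within each sex.
p <- ggplot(people, aes(x = Sex, fill = Survived)) +
  geom_bar(position = "fill") +
  ylab("Proportion")
print(p)
```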

From the above plot, it looks like female passengers had a better survival ratio. The data below supports this.
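The same comparison can be checked numerically with a row-wise proportion table. This sketch assumes the built-in Titanic table (which includes crew) as the data source, so the exact figures may differ from those of the cleaned dataset in this post.

```r
# Assumption: one row per person, derived from the built-in `Titanic` table.
counts <- as.data.frame(Titanic)
people <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                 c("Class", "Sex", "Age", "Survived")]

# Cross-tabulate Sex against Survived, then convert each row to proportions.
tab   <- table(people$Sex, people$Survived)
rates <- prop.table(tab, margin = 1)
print(round(rates, 3))
```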

Thank You!