Assignment on Data Manipulation in R

Let us take a sample dataset and clean the data with the concepts that we learned in Dplyr and Tidyr package.

Problem:

Use Dplyr and Tidyr package to clean the NYC 2013 Flights dataset.
Solution:
For this assignment, we can use the data sets related to flights that departed from NYC in 2013.

Flights

Below is the code that I have used to clean the dataset:

library(nycflights13)
library(tidyr)
library(dplyr)
# Check the Dataset flights
View(flights)
#-----------------------------------------------------------------
# Use Unite() to merge month, day and year columns into single column called fly_date
clean_flights <- unite(flights,fly_date, month,day,year, sep = "-")
#-----------------------------------------------------------------
# Use arrange() to order data by origin and destination
clean_flights <- arrange(clean_flights,origin, dest)
#-----------------------------------------------------------------
# Use mutate() to remove the columns which are not required
clean_flights <- mutate(clean_flights,dep_delay = NULL,
arr_delay= NULL,
air_time = NULL,
time_hour = NULL)
# Use mutate() to add a new column "Total_Time"
clean_flights <- mutate(clean_flights,total_time = hour* 60 + minute)
# Use mutate() to add a new column "Speed"
clean_flights <- mutate(clean_flights, speed = distance/total_time * 60)
#-----------------------------------------------------------------
# Use filter() to select flights from EWR to SFO which took less than 10 hours
clean_flights <- filter(clean_flights,origin == "EWR", dest == "SFO", hour < 10)
# Order Data in ascending order of Total_time
clean_flights <- arrange(clean_flights,total_time)
View(clean_flights)

Below is the output dataset:

Clean_Flights
0

Leave a Reply

Your email address will not be published. Required fields are marked *