Requirements:
This assignment uses airline on-time performance data from the Research and Innovative Technology Administration (RITA), Bureau of Transportation Statistics, to determine the number of cancelled flights, grouped by cancellation reason.
The data is available for download and analysis at http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=OnTime
You can download lookup data for cancellation codes and descriptions at http://www.transtats.bts.gov/Download_Lookup.asp?Lookup=L_CANCELLATION
Tasks:
Create a job to select the cancelled flights, that is, the records where Cancelled is 1
Create a job to group the data by the cancellation code, and count the number of flights in each group
Create a job to sort the data set by Year, then Carrier, and then FlightNum, using Month to partition the records across different reducers
Create a job to join the cancellation codes lookup data with the cancelled flights data set
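The four tasks above can be sketched in plain Python to show what each job is expected to produce before writing it for a cluster. This is only an illustrative in-memory sketch, not a MapReduce implementation: the sample rows, the `codes` lookup values, the `Reason` output field, and all function names are assumptions made up for the example. The field names Cancelled, Year, Carrier, FlightNum, and Month come from the task list, while CancellationCode is an assumed column name. The sort is applied to the cancelled rows here, although the task could equally be read as sorting the full data set.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows; the real input is the downloaded on-time data.
flights = [
    {"Year": 2012, "Month": 1, "Carrier": "AA", "FlightNum": 12, "Cancelled": 1, "CancellationCode": "B"},
    {"Year": 2012, "Month": 1, "Carrier": "AA", "FlightNum": 7, "Cancelled": 0, "CancellationCode": ""},
    {"Year": 2012, "Month": 2, "Carrier": "DL", "FlightNum": 40, "Cancelled": 1, "CancellationCode": "A"},
    {"Year": 2011, "Month": 2, "Carrier": "UA", "FlightNum": 3, "Cancelled": 1, "CancellationCode": "B"},
]

# Hypothetical lookup rows (code -> description); load the real lookup file instead.
codes = {"A": "Carrier", "B": "Weather"}


def cancelled_flights(rows):
    """Task 1: keep only rows where Cancelled is 1 (a map-only filter)."""
    return [r for r in rows if r["Cancelled"] == 1]


def count_by_code(rows):
    """Task 2: group by cancellation code and count the flights per group."""
    return Counter(r["CancellationCode"] for r in rows)


def partition_and_sort(rows):
    """Task 3: partition by Month (one partition stands in for one reducer),
    then sort each partition by Year, Carrier, FlightNum."""
    partitions = defaultdict(list)
    for r in rows:
        partitions[r["Month"]].append(r)
    return {
        month: sorted(part, key=lambda r: (r["Year"], r["Carrier"], r["FlightNum"]))
        for month, part in partitions.items()
    }


def join_with_codes(rows, lookup):
    """Task 4: attach the description matching each row's cancellation code."""
    return [dict(r, Reason=lookup.get(r["CancellationCode"], "Unknown")) for r in rows]


cancelled = cancelled_flights(flights)
counts = count_by_code(cancelled)
by_month = partition_and_sort(cancelled)
joined = join_with_codes(cancelled, codes)
```

In an actual job, the filter would typically live in a mapper, the count in a reducer keyed by cancellation code, the Month-based routing in a custom partitioner, and the join in either a map-side or reduce-side join, depending on the size of the lookup data.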