Data Description
The Airline Dataset contains 15 features describing passenger personal details, flight details, and flight performance. Each feature is described below.
Feature Variables
-
Passenger ID: int64
- Unique identifier for passengers.
-
First Name: string
- The first name of the passenger.
-
Last Name: string
- The last name of the passenger.
-
Gender: string
- Values are categorical (Male, Female).
- Transformation: May be binary encoded (e.g., Male = 0, Female = 1) or one-hot encoded for analysis.
-
Age: int64
- Age of the passenger.
- Range: [1, 90]
- Min: 1, Max: 90, Mean: 46.25
-
Nationality: string
- The country of origin for the passenger.
-
Airport Name: string
- Name of the originating or destination airport.
-
Airport Country Code: string
- Country code of the airport (e.g., USA, CANADA, FRANCE, etc.).
-
Country Name: string
- Full country name corresponding to the airport.
-
Airport Continent: string
- Values include NAM (North America), EU (Europe), etc.
- Transformation: May be encoded as categorical numerical values or one-hot encoded.
-
Continents: string
- Generalized continent grouping.
- Transformation: Similar to Airport Continent, encode numerically if necessary.
-
Departure Date: string
- The date of the flight's departure.
- Transformation: Convert to a datetime format for time-based analysis.
-
Arrival Airport: string
- Code of the destination airport.
-
Pilot Name: string
- Name of the pilot for the flight.
-
Flight Status: string
- Categorical values (On Time, Delayed).
- Transformation: Encode numerically for machine learning (e.g., On Time = 0, Delayed = 1).
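The categorical encodings suggested for Gender and Airport Continent can be sketched with pandas as follows. This is a minimal illustration, not the dataset's actual preprocessing: the sample rows are made up, and the new column names (`Gender_encoded`, the `Continent_` prefix) are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the feature list above.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female"],
    "Airport Continent": ["NAM", "EU", "NAM"],
})

# Binary encoding for Gender (Male = 0, Female = 1).
# Values missing from the mapping would become NaN.
df["Gender_encoded"] = df["Gender"].map({"Male": 0, "Female": 1})

# One-hot encoding for Airport Continent: one 0/1 indicator column per code.
continent_dummies = pd.get_dummies(
    df["Airport Continent"], prefix="Continent", dtype=int
)
df = pd.concat([df, continent_dummies], axis=1)

print(df)
```

One-hot encoding avoids implying an order between continents, which a single numeric code would do; binary encoding is enough for a two-valued feature like Gender.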
Target Variables
In the Airline Dataset, the target variable is Flight Status.
Flight Status: Categorical -> Binary
- The primary target variable, indicating whether the flight was on time or delayed.
- It helps to understand how well the airline operates and what causes delays.
- Transformation: Convert to binary values for classification models, which expect only two possible outcomes (e.g., On Time = 0, Delayed = 1).
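A minimal sketch of this binary target encoding, assuming pandas and a few made-up status values. The names `status_mapping`, `target`, and `unmapped` are illustrative only.

```python
import pandas as pd

# Hypothetical sample of the Flight Status column.
status = pd.Series(["On Time", "Delayed", "On Time"], name="Flight Status")

# Mapping taken from the description above: On Time = 0, Delayed = 1.
status_mapping = {"On Time": 0, "Delayed": 1}
target = status.map(status_mapping)

# Any status outside the mapping becomes NaN, so list such values
# before training a classifier on the target.
unmapped = status[target.isna()].unique()

print(target.tolist())
print(list(unmapped))
```

Checking `unmapped` is a cheap guard: if the real column holds a status other than the two listed here, it shows up in that list instead of silently turning into a missing target.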
Application
- Operational Analysis: Helps identify patterns in delays based on flight details like departure time, airport, or pilot name.
- Predictive Modelling: Makes it possible to build models that predict delays, which helps improve planning, customer comfort, and the overall customer experience.
Factors that can influence flight status include airport congestion (busier airports tend to see more delays), flights during peak hours, and weather conditions.
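As an illustration of the operational analysis described above, the share of delayed flights per airport could be computed along these lines. The rows and airport names are invented for the example, and `is_delayed` and `delay_rate` are hypothetical helper names.

```python
import pandas as pd

# Hypothetical flights; airport names are placeholders.
flights = pd.DataFrame({
    "Airport Name": ["A", "A", "B", "B", "B"],
    "Flight Status": ["Delayed", "On Time", "Delayed", "Delayed", "On Time"],
})

# Flag delayed flights as 1 and everything else as 0.
flights["is_delayed"] = (flights["Flight Status"] == "Delayed").astype(int)

# The mean of a 0/1 flag per group is the share of delayed flights.
delay_rate = (
    flights.groupby("Airport Name")["is_delayed"]
    .mean()
    .sort_values(ascending=False)
)

print(delay_rate)
```

The same grouping could be applied to other flight details, such as Pilot Name or the month of Departure Date, to look for patterns in delays.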
Data Transformation
No transformations appear to have been performed on the dataset.
However, there are several opportunities for data transformation. Here are some suggestions:
- Departure Date: Convert to a datetime format, then extract components such as year, month, and day of the week for analysis.
- Flight Status: Encode as numerical values (e.g., On Time = 0, Delayed = 1, Cancelled = 2 if that status occurs).
- First Name and Last Name: Could be dropped unless you are performing text or name-based analysis.
- Airport Name and Airport Country Code: Merge into a single feature for uniqueness.
These are suggestions only; apply them where they suit the analysis of the Airline Dataset.
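The date, drop, and merge suggestions could look roughly like this in pandas. This is a sketch under assumptions: the sample rows are invented, the ISO date format (`%Y-%m-%d`) is assumed and may not match the real file, and the new column names (`Departure Year`, `Departure Month`, `Departure Day of Week`, `Airport Key`) are hypothetical.

```python
import pandas as pd

# Hypothetical rows; values are placeholders, not taken from the dataset.
df = pd.DataFrame({
    "First Name": ["Ana", "Ben"],
    "Last Name": ["Lopez", "Kim"],
    "Airport Name": ["Example Field", "Sample International"],
    "Airport Country Code": ["US", "CA"],
    "Departure Date": ["2022-06-28", "2022-12-26"],
})

# Parse the date strings (format assumed), then extract components.
df["Departure Date"] = pd.to_datetime(df["Departure Date"], format="%Y-%m-%d")
df["Departure Year"] = df["Departure Date"].dt.year
df["Departure Month"] = df["Departure Date"].dt.month
df["Departure Day of Week"] = df["Departure Date"].dt.day_name()

# Drop name columns when no name-based analysis is planned.
df = df.drop(columns=["First Name", "Last Name"])

# Combine airport name and country code into a single key column.
df["Airport Key"] = df["Airport Name"] + " (" + df["Airport Country Code"] + ")"

print(df)
```

If the real Departure Date column mixes several date formats, the `format` argument would need adjusting before the component columns can be trusted.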