Data Description
The Airline Dataset contains 15 features describing passenger personal details, flight details, and flight performance. Each feature is described in detail below.
Feature Variables
-
Passenger ID: int64
- Unique identifier for passengers.
-
First Name: string
- The first name of the passenger.
-
Last Name: string
- The last name of the passenger.
-
Gender: string
- Values are categorical (Male, Female).
- Transformation: May be encoded as binary (e.g., Male = 0, Female = 1) or one-hot encoding for analysis.
-
Age: int64
- Age of the passenger.
- Range: [1, 90]
- Min: 1, Max: 90, Mean: 46.25
-
Nationality: string
- The country of origin for the passenger.
-
Airport Name: string
- Name of the originating or destination airport.
-
Airport Country Code: string
- Country code of the airport (e.g., USA, CANADA, FRANCE).
-
Country Name: string
- Full country name corresponding to the airport.
-
Airport Continent: string
- Values include NAM (North America), EU (Europe), etc.
- Transformation: May be encoded as categorical numerical values or one-hot encoded.
-
Continents: string
- Generalized continent grouping.
- Transformation: Similar to Airport Continent, encode numerically if necessary.
-
Departure Date: string
- The date of the flight's departure.
- Transformation: Convert to a datetime format for time-based analysis.
-
Arrival Airport: string
- Code of the destination airport.
-
Pilot Name: string
- Name of the pilot for the flight.
-
Flight Status: string
- Categorical values (On Time, Delayed).
- Transformation: Encode numerically for machine learning (e.g., On Time = 0, Delayed = 1).
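The feature transformations listed above could be sketched with pandas roughly as follows. This is a minimal illustration, not the dataset's official preprocessing: the sample rows are made up, the ISO date format is an assumption about how Departure Date is stored, and the derived "Departure Month" column and the "Continent" prefix are hypothetical names introduced here.

```python
import pandas as pd

# Tiny made-up sample; only the column names come from the feature list.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female"],
    "Airport Continent": ["NAM", "EU", "NAM"],
    "Departure Date": ["2022-06-28", "2022-12-26", "2022-01-18"],
})

# Binary encoding for Gender (Male = 0, Female = 1).
df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})

# One-hot encoding for Airport Continent; yields Continent_EU, Continent_NAM.
df = pd.get_dummies(df, columns=["Airport Continent"], prefix="Continent")

# Parse Departure Date. The "%Y-%m-%d" format is an assumption; rows that
# do not match it become NaT because of errors="coerce".
df["Departure Date"] = pd.to_datetime(
    df["Departure Date"], format="%Y-%m-%d", errors="coerce"
)

# Hypothetical derived feature for time-based analysis.
df["Departure Month"] = df["Departure Date"].dt.month

print(df)
```

If the real file stores dates in a different or mixed format, the `format` argument would need to be adjusted accordingly.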
Target Variables
In the Airline dataset, the target variable is Flight Status.
Flight Status: Categorical -> Binary
- The primary target variable, indicating whether the flight was on time or delayed.
- It helps to understand how well the airline operates and what causes delays.
- Transformation: Convert to binary values for classification models with only two possible outcomes (e.g., On Time = 0, Delayed = 1).
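As a hedged sketch of the target conversion, the mapping On Time = 0, Delayed = 1 can be applied with pandas and then checked for unmapped labels and class balance. The sample values are invented for illustration, and the variable names `target`, `unmapped`, and `class_share` are assumptions made for this example.

```python
import pandas as pd

# Made-up sample values for illustration only.
status = pd.Series(
    ["On Time", "Delayed", "On Time", "Delayed", "On Time"],
    name="Flight Status",
)

# On Time = 0, Delayed = 1, as described above.
target = status.map({"On Time": 0, "Delayed": 1})

# Any label outside the mapping becomes NaN, so count those before training.
unmapped = int(target.isna().sum())

# Share of each class, useful for spotting class imbalance.
class_share = target.value_counts(normalize=True)

print(unmapped)
print(class_share)
```

A non-zero `unmapped` count would indicate status labels other than the two listed here, which would have to be handled before fitting a binary classifier.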
Application
- Operational Analysis: Helps identify patterns in delays based on flight details such as departure date, airport, or pilot name.
- Predictive Modelling: Makes it possible to build models that predict delays, which helps improve planning, customer comfort, and the overall passenger experience.
Factors that can influence flight status include airport traffic (busier airports can see more delays), flights during peak hours, and weather conditions.
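One way to look for the delay patterns described in this section is to compare each airport's flight count with its share of delayed flights. The sketch below uses invented rows and hypothetical airport names; the `Delayed`, `n_flights`, and `delay_rate` columns are names chosen for this example, not fields of the dataset.

```python
import pandas as pd

# Invented rows; "Alpha Intl" and "Beta Field" are hypothetical airports.
flights = pd.DataFrame({
    "Airport Name": ["Alpha Intl", "Alpha Intl", "Alpha Intl",
                     "Beta Field", "Beta Field"],
    "Flight Status": ["Delayed", "Delayed", "On Time",
                      "On Time", "Delayed"],
})

# Binary delay indicator (On Time = 0, Delayed = 1).
flights["Delayed"] = (flights["Flight Status"] == "Delayed").astype(int)

# Per-airport flight count and delay rate, busiest delay rates first.
summary = (
    flights.groupby("Airport Name")["Delayed"]
    .agg(n_flights="count", delay_rate="mean")
    .sort_values("delay_rate", ascending=False)
)

print(summary)
```

The same grouping could be applied to pilot name or to a month derived from the departure date. Weather and time of day are not among the 15 features, so they would need an external data source.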