---
title: Data
---

# Data Description

The Airline Dataset contains 15 features describing passenger personal details, flight details, and flight performance. Each feature is described below.

# Feature Variables

1. **Passenger ID**: _int64_
    * Unique identifier for passengers.
2. **First Name**: _string_
    * The first name of the passenger.
3. **Last Name**: _string_
    * The last name of the passenger.
4. **Gender**: _string_
    * Values are categorical (Male, Female).
    * Transformation: May be encoded as binary (e.g., Male = 0, Female = 1) or one-hot encoded for analysis.
5. **Age**: _int64_
    * Age of the passenger, in years.
    * Range: \[1, 90\]
    * Min: 1, Max: 90, Mean: 46.25
6. **Nationality**: _string_
    * The country of origin of the passenger.
7. **Airport Name**: _string_
    * Name of the originating or destination airport.
8. **Airport Country Code**: _string_
    * Country code of the airport (e.g., USA, CANADA, FRANCE).
9. **Country Name**: _string_
    * Full country name corresponding to the airport.
10. **Airport Continent**: _string_
    * Continent code of the airport, e.g., NAM (North America), EU (Europe).
    * Transformation: May be encoded as categorical numerical values or one-hot encoded.
11. **Continents**: _string_
    * Generalized continent grouping.
    * Transformation: As with **Airport Continent**, encode numerically if necessary.
12. **Departure Date**: _string_
    * The date of the flight's departure.
    * Transformation: Convert to a `datetime` format for time-based analysis.
13. **Arrival Airport**: _string_
    * Code of the destination airport.
14. **Pilot Name**: _string_
    * Name of the pilot for the flight.
15. **Flight Status**: _string_
    * Categorical values (On Time, Delayed).
    * Transformation: Encode numerically for machine learning (e.g., On Time = 0, Delayed = 1).
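
The categorical transformations suggested in the list above (binary codes for **Gender**, one-hot columns for **Airport Continent**) can be sketched in plain Python. This is a minimal illustration only: the sample rows, the `one_hot` and `encode_rows` helpers, and the output column names are assumptions made for the example, not part of the dataset or of any particular library.

```python
# Hypothetical sample rows; the keys follow the feature names listed above.
rows = [
    {"Gender": "Male", "Airport Continent": "NAM"},
    {"Gender": "Female", "Airport Continent": "EU"},
    {"Gender": "Female", "Airport Continent": "NAM"},
]

# Binary codes taken from the example in the feature description.
GENDER_CODES = {"Male": 0, "Female": 1}


def one_hot(value, categories, prefix):
    """Return one 0/1 indicator per category, keyed as '<prefix>_<category>'."""
    return {f"{prefix}_{c}": int(value == c) for c in categories}


def encode_rows(records):
    """Encode Gender as 0/1 and Airport Continent as one-hot indicators."""
    # Use only the categories present in the data, in a stable sorted order.
    continents = sorted({r["Airport Continent"] for r in records})
    encoded = []
    for r in records:
        out = {"Gender": GENDER_CODES[r["Gender"]]}
        out.update(one_hot(r["Airport Continent"], continents, "Airport Continent"))
        encoded.append(out)
    return encoded


encoded = encode_rows(rows)
print(encoded[0])
# {'Gender': 0, 'Airport Continent_EU': 0, 'Airport Continent_NAM': 1}
```

One-hot indicators avoid implying an order between continents, which a single integer code would do.
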
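
The `datetime` conversion suggested for **Departure Date** might look like the sketch below. The storage format of the dates is not stated here, so the month/day/year pattern in `DATE_FORMAT`, the sample value, and the derived feature names are all assumptions to be adjusted against the real data.

```python
from datetime import datetime

# Assumption: dates are stored as month/day/year text; change this to match the data.
DATE_FORMAT = "%m/%d/%Y"


def parse_departure(text):
    """Parse a Departure Date string into a datetime.date."""
    return datetime.strptime(text, DATE_FORMAT).date()


departure = parse_departure("06/28/2022")  # hypothetical value

# Simple time-based features derived from the parsed date.
features = {
    "month": departure.month,
    "weekday": departure.weekday(),  # Monday = 0 ... Sunday = 6
    "is_weekend": departure.weekday() >= 5,
}
print(features)
# {'month': 6, 'weekday': 1, 'is_weekend': False}
```
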
# Target Variables

In the Airline dataset, the target variable is Flight Status.

**Flight Status**: _Categorical -\> Binary_

* The primary target variable, indicating whether the flight was on time or delayed.
* It indicates how well the airline operates and supports analysis of what causes delays.
* Transformation: Convert to binary values for two-class classification models (e.g., On Time = 0, Delayed = 1).
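
A minimal sketch of the binary conversion described above, together with a quick check of how balanced the two classes are. The label list and the `encode_status` helper are hypothetical; only the On Time = 0, Delayed = 1 mapping comes from the text.

```python
from collections import Counter

# Mapping taken from the description above.
STATUS_CODES = {"On Time": 0, "Delayed": 1}

statuses = ["On Time", "Delayed", "On Time", "On Time"]  # hypothetical labels


def encode_status(labels):
    """Map each label to 0/1 and fail loudly on labels outside the mapping."""
    unknown = set(labels) - set(STATUS_CODES)
    if unknown:
        raise ValueError(f"Unexpected Flight Status values: {sorted(unknown)}")
    return [STATUS_CODES[s] for s in labels]


y = encode_status(statuses)
counts = Counter(y)
delayed_share = counts[1] / len(y)  # fraction of flights labelled Delayed
print(y, delayed_share)
# [0, 1, 0, 0] 0.25
```

Raising on unknown labels is a deliberate choice in this sketch: a status outside the two listed values would otherwise be dropped or miscoded without notice.
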
##### Application

* **Operational Analysis:** Helps identify patterns in delays based on flight details such as departure date, airport, or pilot name.
* **Predictive Modelling:** Makes it possible to build models that predict delays, which helps improve planning and the customer experience.

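
As one way to approach the operational analysis above, the share of delayed flights can be compared across groups such as airports or pilots. The records and the `delay_rate_by_group` helper below are hypothetical and assume Flight Status has already been encoded as 0/1.

```python
from collections import defaultdict

# Hypothetical records: (Airport Name, Flight Status encoded as 0 = On Time, 1 = Delayed).
flights = [
    ("Airport A", 1),
    ("Airport A", 0),
    ("Airport B", 1),
    ("Airport B", 1),
    ("Airport A", 0),
]


def delay_rate_by_group(records):
    """Return {group: share of delayed flights} for (group, delayed) pairs."""
    totals = defaultdict(int)
    delayed = defaultdict(int)
    for group, is_delayed in records:
        totals[group] += 1
        delayed[group] += is_delayed
    return {g: delayed[g] / totals[g] for g in totals}


rates = delay_rate_by_group(flights)

# List groups from the highest delay rate to the lowest.
for airport, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{airport}: {rate:.2f}")
```

The same function works for any grouping column (for example Pilot Name) by changing the first element of each pair.
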

Factors that can influence flight status include airport traffic (busier airports may see more delays), flights during peak hours, and weather conditions.
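
Of these factors, only airport traffic can be approximated from the features listed earlier, since weather and hour of day do not appear among them. The sketch below derives a simple "busyness" value, the number of records per airport, which could then be compared against delay rates. The sample values and the `add_busyness` helper are hypothetical.

```python
from collections import Counter

# Hypothetical Airport Name values, one per flight record.
airports = ["Airport A", "Airport B", "Airport A", "Airport C", "Airport A"]


def add_busyness(airport_names):
    """Pair each record with the total number of flights seen at its airport."""
    flights_per_airport = Counter(airport_names)
    return [(name, flights_per_airport[name]) for name in airport_names]


with_busyness = add_busyness(airports)
print(with_busyness[:2])
# [('Airport A', 3), ('Airport B', 1)]
```
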

# Data Transformation