Passenger Prediction: How many passengers can we expect in a given month?

We predict the number of airline passengers in a given month. Use cases:

  • Increase the advertising budget for that month.
  • Send more custom promotional messages to frequent flyers in that month.
  • Offer discounts and packages to attract more travellers.

In [119]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LinearRegression
import seaborn as sns
import os
In [12]:
# Loading dataset
df = sns.load_dataset("flights")
print ("Total entries: ", len(df))
df.head()
Total entries:  144
Out[12]:
   year month  passengers
0  1949   Jan         112
1  1949   Feb         118
2  1949   Mar         132
3  1949   Apr         129
4  1949   May         121
  • The dataset has three columns: year, month, and passengers.
  • The target variable to predict is passengers.

Exploring the data

In [19]:
df.groupby(["year", "passengers"]).first()
Out[19]:
                 month
year passengers
1949 104           Nov
     112           Jan
     118           Feb
     119           Oct
     121           May
...                ...
1960 472           May
     508           Sep
     535           Jun
     606           Aug
     622           Jul

135 rows × 1 columns

  • Each (year, passenger count) pair with the month in which it occurred.
In [28]:
df.groupby("month")['passengers'].sum().plot(kind="bar")
Out[28]:
<AxesSubplot:xlabel='month'>
  • Total passenger count by month, summed over all years.
In [34]:
df.groupby("month")['passengers'].max().plot(kind="bar")
print ("Max passenger count: ", max(df.groupby("month")['passengers'].max()))
Max passenger count:  622
  • The maximum passenger counts occur in Jul and Aug.
  • This suggests that summer vacations play a role in air travel.
In [54]:
df.groupby("month")['passengers'].sum().plot(kind="bar")
Out[54]:
<AxesSubplot:xlabel='month'>
  • A broader view of the monthly data: totals above, minimums below.
In [55]:
df.groupby("month")['passengers'].min().plot(kind="bar")
Out[55]:
<AxesSubplot:xlabel='month'>
In [65]:
df.groupby(["year"])['passengers'].sum().plot(kind="bar")
Out[65]:
<AxesSubplot:xlabel='year'>
  • Total number of passengers by year.
In [110]:
df.plot(x="month", y="passengers", kind="line")
Out[110]:
<AxesSubplot:xlabel='month'>
  • A line graph showing how passenger counts fluctuate over time.
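The plot above puts the month label on the x-axis, so the same twelve labels repeat for every year. One possible alternative, sketched here under the assumption that a single date column is acceptable, is to combine year and month into a datetime index before plotting. The names `sample`, `date`, and `series` are illustrative and not part of the notebook, and only a few rows from the table are copied in to keep the example self-contained.

```python
import pandas as pd

# A few rows copied from the dataset shown earlier; the notebook's full df
# could be used in place of this hypothetical `sample` frame.
sample = pd.DataFrame({
    "year": [1949, 1949, 1949, 1960, 1960],
    "month": ["Jan", "Feb", "Mar", "Nov", "Dec"],
    "passengers": [112, 118, 132, 390, 432],
})

# Build strings such as "1949-Jan" and parse them as year plus abbreviated month.
# astype(str) also covers the case where month is stored as a categorical.
sample["date"] = pd.to_datetime(
    sample["year"].astype(str) + "-" + sample["month"].astype(str),
    format="%Y-%b",
)

# A passenger series indexed by date, in chronological order.
series = sample.set_index("date")["passengers"].sort_index()
```

Calling `series.plot(kind="line")` on such a series should then draw one continuous time axis instead of repeated month labels.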

Passenger prediction

In [118]:
df
Out[118]:
     year month  passengers
0    1949   Jan         112
1    1949   Feb         118
2    1949   Mar         132
3    1949   Apr         129
4    1949   May         121
..    ...   ...         ...
139  1960   Aug         606
140  1960   Sep         508
141  1960   Oct         461
142  1960   Nov         390
143  1960   Dec         432

144 rows × 3 columns

  • Convert month to a numeric value with LabelEncoder. The codes follow alphabetical order, so Jan becomes 4 rather than 1.
In [120]:
le = LabelEncoder()
df.month = le.fit_transform(df['month'])
In [121]:
df.head()
Out[121]:
   year  month  passengers
0  1949      4         112
1  1949      3         118
2  1949      7         132
3  1949      0         129
4  1949      8         121
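The codes in this output (Jan = 4, Feb = 3, Apr = 0) carry no calendar ordering, which matters for a linear model that treats month as a number. A minimal sketch of a calendar-order mapping, compared with the alphabetical one, is given below. The names `MONTHS`, `calendar_code`, and `alphabetical_code` are illustrative and do not appear in the notebook.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Calendar order: Jan = 1, ..., Dec = 12.
calendar_code = {name: i for i, name in enumerate(MONTHS, start=1)}

# Alphabetical order, which is how LabelEncoder numbers these strings
# (it assigns codes to the sorted unique labels, starting at 0).
alphabetical_code = {name: i for i, name in enumerate(sorted(MONTHS))}

for name in ["Jan", "Feb", "Mar", "Apr", "May"]:
    print(name, calendar_code[name], alphabetical_code[name])
```

Applied to the DataFrame, something like `df["month"].astype(str).map(calendar_code)` would be one way to get calendar-ordered month numbers; this is a suggestion, not what the notebook does.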
In [122]:
features = df.drop(columns=['passengers']).values
labels = df.passengers.values
In [123]:
lm = LinearRegression()
In [124]:
lm.fit(features, labels)
Out[124]:
LinearRegression()
In [125]:
lm.score(features, labels)
Out[125]:
0.8509730709742956
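The value returned by `lm.score` is the coefficient of determination, R², and here it is computed on the same rows the model was fitted on, so it describes fit rather than predictive accuracy on unseen months. As a rough illustration of what the number measures, R² can be written out by hand. The helper `r_squared` and the variables `actual`, `perfect`, and `baseline` are hypothetical names for this sketch.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# First five passenger counts from the table above.
actual = [112, 118, 132, 129, 121]

# Predicting every value exactly gives R² = 1.
perfect = r_squared(actual, actual)

# Always predicting the mean gives R² = 0.
mean_value = sum(actual) / len(actual)
baseline = r_squared(actual, [mean_value] * len(actual))
```

An R² of about 0.85 therefore means the fitted line removes roughly 85% of the squared error that a constant mean prediction would leave on the training data.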
In [131]:
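# Month code 1 is Aug under the LabelEncoder mapping above (Jan is encoded as 4).
print("Predicted passenger count for Aug, 2020 (month code 1): ",lm.predict(np.array([[2020, 1]])))
Predicted passenger count for Aug, 2020 (month code 1):  [2376.74703768]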
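A linear regression prediction is simply an intercept plus one weight per feature, so a query for 2020 extends the 1949 to 1960 trend sixty years past the data. The sketch below shows that arithmetic with a NumPy least-squares fit on the ten rows printed earlier. It assumes calendar month numbers (Jan = 1) rather than the notebook's LabelEncoder codes, and the names `rows`, `design`, `coef`, and `predict` are illustrative, so its coefficients will differ from those of `lm`.

```python
import numpy as np

# Rows copied from the table printed above (first five and last five):
# year, calendar month number (an assumption of this sketch), passengers.
rows = np.array([
    [1949, 1, 112], [1949, 2, 118], [1949, 3, 132], [1949, 4, 129], [1949, 5, 121],
    [1960, 8, 606], [1960, 9, 508], [1960, 10, 461], [1960, 11, 390], [1960, 12, 432],
], dtype=float)

X = rows[:, :2]
y = rows[:, 2]

# Prepend a column of ones so the first coefficient acts as the intercept.
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, w_year, w_month = coef

def predict(year, month):
    """Linear prediction: intercept + w_year * year + w_month * month."""
    return intercept + w_year * year + w_month * month

# Each extra year adds the same amount, however far outside the data it is.
yearly_step = predict(2020, 1) - predict(2019, 1)
```

Because `yearly_step` equals `w_year` for any pair of consecutive years, the 2020 figure is best read as a straight-line extrapolation rather than a forecast.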