Skip to content

Instantly share code, notes, and snippets.

@23ccozad
Last active April 13, 2025 02:11
Show Gist options
  • Select an option

  • Save 23ccozad/04cac05ea183993054acde237b5a590d to your computer and use it in GitHub Desktop.

Select an option

Save 23ccozad/04cac05ea183993054acde237b5a590d to your computer and use it in GitHub Desktop.
flight-delay-data-collection.ipynb
Display the source blob
Display the rendered blob
Raw
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"gpuType": "V28",
"mount_file_id": "1X7jKjrCHmcJsiz2ZZOybU-cvZgKObuPq",
"authorship_tag": "ABX9TyP9OfqdzzT0B8rWib6M6N/F",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "TPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/23ccozad/04cac05ea183993054acde237b5a590d/flight-delay-data-collection.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# Improving Predictions of Commercial Flight Delays Using Weather Data\n",
"\n",
"#### DTSA 5506: Data Mining Project\n",
"\n",
"In this first notebook, we'll be focused on **data collection**. We will gather the flight and weather data, and combine them into a single DataFrame. In the next notebook, we'll explore the data further and train models."
],
"metadata": {
"id": "qlZIiKEoH1ZS"
}
},
{
"cell_type": "markdown",
"source": [
"First, let's import all the libraries we need."
],
"metadata": {
"id": "6Cg5OuGmIrvq"
}
},
{
"cell_type": "code",
"source": [
"import pandas as pd\n",
"from zoneinfo import ZoneInfo\n",
"import pytz"
],
"metadata": {
"id": "5HejOJhdzwkH"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#### Import flight data"
],
"metadata": {
"id": "-dNlCDfHIyyC"
}
},
{
"cell_type": "markdown",
"source": [
"Our flight data contains domestic US flights from 2015. This dataset comes from the U.S. Department of Transportation, and was downloaded from [Kaggle](https://www.kaggle.com/datasets/usdot/flight-delays/data?select=flights.csv) before being imported here."
],
"metadata": {
"id": "qzQ_KCXivl7Y"
}
},
{
"cell_type": "code",
"source": [
"flights_data_path = '/content/drive/MyDrive/CU Boulder Data Science/DTSA 5506 - Data Mining Project/data/flights.csv'\n",
"dtypes = {7: str, 8: str, 9: str, 10: str, 13: str, 18: str, 20: str, 21: str} # Some columns need to be specified to enter as strings, not integers\n",
"flights = pd.read_csv(flights_data_path, dtype=dtypes)"
],
"metadata": {
"id": "zoGFBZ8d02qo"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"flights"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 444
},
"id": "eMvoQaAt1Sgq",
"outputId": "ca44aed4-3d58-4589-d209-dbb6905c0c1f"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" YEAR MONTH DAY DAY_OF_WEEK AIRLINE FLIGHT_NUMBER TAIL_NUMBER \\\n",
"0 2015 1 1 4 AS 98 N407AS \n",
"1 2015 1 1 4 AA 2336 N3KUAA \n",
"2 2015 1 1 4 US 840 N171US \n",
"3 2015 1 1 4 AA 258 N3HYAA \n",
"4 2015 1 1 4 AS 135 N527AS \n",
"... ... ... ... ... ... ... ... \n",
"5819074 2015 12 31 4 B6 688 N657JB \n",
"5819075 2015 12 31 4 B6 745 N828JB \n",
"5819076 2015 12 31 4 B6 1503 N913JB \n",
"5819077 2015 12 31 4 B6 333 N527JB \n",
"5819078 2015 12 31 4 B6 839 N534JB \n",
"\n",
" ORIGIN_AIRPORT DESTINATION_AIRPORT SCHEDULED_DEPARTURE ... \\\n",
"0 ANC SEA 0005 ... \n",
"1 LAX PBI 0010 ... \n",
"2 SFO CLT 0020 ... \n",
"3 LAX MIA 0020 ... \n",
"4 SEA ANC 0025 ... \n",
"... ... ... ... ... \n",
"5819074 LAX BOS 2359 ... \n",
"5819075 JFK PSE 2359 ... \n",
"5819076 JFK SJU 2359 ... \n",
"5819077 MCO SJU 2359 ... \n",
"5819078 JFK BQN 2359 ... \n",
"\n",
" ARRIVAL_TIME ARRIVAL_DELAY DIVERTED CANCELLED CANCELLATION_REASON \\\n",
"0 0408 -22.0 0 0 NaN \n",
"1 0741 -9.0 0 0 NaN \n",
"2 0811 5.0 0 0 NaN \n",
"3 0756 -9.0 0 0 NaN \n",
"4 0259 -21.0 0 0 NaN \n",
"... ... ... ... ... ... \n",
"5819074 0753 -26.0 0 0 NaN \n",
"5819075 0430 -16.0 0 0 NaN \n",
"5819076 0432 -8.0 0 0 NaN \n",
"5819077 0330 -10.0 0 0 NaN \n",
"5819078 0442 2.0 0 0 NaN \n",
"\n",
" AIR_SYSTEM_DELAY SECURITY_DELAY AIRLINE_DELAY LATE_AIRCRAFT_DELAY \\\n",
"0 NaN NaN NaN NaN \n",
"1 NaN NaN NaN NaN \n",
"2 NaN NaN NaN NaN \n",
"3 NaN NaN NaN NaN \n",
"4 NaN NaN NaN NaN \n",
"... ... ... ... ... \n",
"5819074 NaN NaN NaN NaN \n",
"5819075 NaN NaN NaN NaN \n",
"5819076 NaN NaN NaN NaN \n",
"5819077 NaN NaN NaN NaN \n",
"5819078 NaN NaN NaN NaN \n",
"\n",
" WEATHER_DELAY \n",
"0 NaN \n",
"1 NaN \n",
"2 NaN \n",
"3 NaN \n",
"4 NaN \n",
"... ... \n",
"5819074 NaN \n",
"5819075 NaN \n",
"5819076 NaN \n",
"5819077 NaN \n",
"5819078 NaN \n",
"\n",
"[5819079 rows x 31 columns]"
],
"text/html": [
"\n",
" <div id=\"df-0c1ff911-5f60-4e29-9205-240e7e591a04\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>YEAR</th>\n",
" <th>MONTH</th>\n",
" <th>DAY</th>\n",
" <th>DAY_OF_WEEK</th>\n",
" <th>AIRLINE</th>\n",
" <th>FLIGHT_NUMBER</th>\n",
" <th>TAIL_NUMBER</th>\n",
" <th>ORIGIN_AIRPORT</th>\n",
" <th>DESTINATION_AIRPORT</th>\n",
" <th>SCHEDULED_DEPARTURE</th>\n",
" <th>...</th>\n",
" <th>ARRIVAL_TIME</th>\n",
" <th>ARRIVAL_DELAY</th>\n",
" <th>DIVERTED</th>\n",
" <th>CANCELLED</th>\n",
" <th>CANCELLATION_REASON</th>\n",
" <th>AIR_SYSTEM_DELAY</th>\n",
" <th>SECURITY_DELAY</th>\n",
" <th>AIRLINE_DELAY</th>\n",
" <th>LATE_AIRCRAFT_DELAY</th>\n",
" <th>WEATHER_DELAY</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>AS</td>\n",
" <td>98</td>\n",
" <td>N407AS</td>\n",
" <td>ANC</td>\n",
" <td>SEA</td>\n",
" <td>0005</td>\n",
" <td>...</td>\n",
" <td>0408</td>\n",
" <td>-22.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>AA</td>\n",
" <td>2336</td>\n",
" <td>N3KUAA</td>\n",
" <td>LAX</td>\n",
" <td>PBI</td>\n",
" <td>0010</td>\n",
" <td>...</td>\n",
" <td>0741</td>\n",
" <td>-9.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>US</td>\n",
" <td>840</td>\n",
" <td>N171US</td>\n",
" <td>SFO</td>\n",
" <td>CLT</td>\n",
" <td>0020</td>\n",
" <td>...</td>\n",
" <td>0811</td>\n",
" <td>5.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>AA</td>\n",
" <td>258</td>\n",
" <td>N3HYAA</td>\n",
" <td>LAX</td>\n",
" <td>MIA</td>\n",
" <td>0020</td>\n",
" <td>...</td>\n",
" <td>0756</td>\n",
" <td>-9.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>AS</td>\n",
" <td>135</td>\n",
" <td>N527AS</td>\n",
" <td>SEA</td>\n",
" <td>ANC</td>\n",
" <td>0025</td>\n",
" <td>...</td>\n",
" <td>0259</td>\n",
" <td>-21.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5819074</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>B6</td>\n",
" <td>688</td>\n",
" <td>N657JB</td>\n",
" <td>LAX</td>\n",
" <td>BOS</td>\n",
" <td>2359</td>\n",
" <td>...</td>\n",
" <td>0753</td>\n",
" <td>-26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5819075</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>B6</td>\n",
" <td>745</td>\n",
" <td>N828JB</td>\n",
" <td>JFK</td>\n",
" <td>PSE</td>\n",
" <td>2359</td>\n",
" <td>...</td>\n",
" <td>0430</td>\n",
" <td>-16.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5819076</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>B6</td>\n",
" <td>1503</td>\n",
" <td>N913JB</td>\n",
" <td>JFK</td>\n",
" <td>SJU</td>\n",
" <td>2359</td>\n",
" <td>...</td>\n",
" <td>0432</td>\n",
" <td>-8.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5819077</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>B6</td>\n",
" <td>333</td>\n",
" <td>N527JB</td>\n",
" <td>MCO</td>\n",
" <td>SJU</td>\n",
" <td>2359</td>\n",
" <td>...</td>\n",
" <td>0330</td>\n",
" <td>-10.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5819078</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>B6</td>\n",
" <td>839</td>\n",
" <td>N534JB</td>\n",
" <td>JFK</td>\n",
" <td>BQN</td>\n",
" <td>2359</td>\n",
" <td>...</td>\n",
" <td>0442</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5819079 rows × 31 columns</p>\n",
"</div>\n",
" <div class=\"colab-df-buttons\">\n",
"\n",
" <div class=\"colab-df-container\">\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-0c1ff911-5f60-4e29-9205-240e7e591a04')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
" <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" .colab-df-buttons div {\n",
" margin-bottom: 4px;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-0c1ff911-5f60-4e29-9205-240e7e591a04 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-0c1ff911-5f60-4e29-9205-240e7e591a04');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
"\n",
"\n",
"<div id=\"df-49139448-201a-4b8c-a141-08cb1702443a\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-49139448-201a-4b8c-a141-08cb1702443a')\"\n",
" title=\"Suggest charts\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" --bg-color: #E8F0FE;\n",
" --fill-color: #1967D2;\n",
" --hover-bg-color: #E2EBFA;\n",
" --hover-fill-color: #174EA6;\n",
" --disabled-fill-color: #AAA;\n",
" --disabled-bg-color: #DDD;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" --bg-color: #3B4455;\n",
" --fill-color: #D2E3FC;\n",
" --hover-bg-color: #434B5C;\n",
" --hover-fill-color: #FFFFFF;\n",
" --disabled-bg-color: #3B4455;\n",
" --disabled-fill-color: #666;\n",
" }\n",
"\n",
" .colab-df-quickchart {\n",
" background-color: var(--bg-color);\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: var(--fill-color);\n",
" height: 32px;\n",
" padding: 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: var(--hover-bg-color);\n",
" box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: var(--button-hover-fill-color);\n",
" }\n",
"\n",
" .colab-df-quickchart-complete:disabled,\n",
" .colab-df-quickchart-complete:disabled:hover {\n",
" background-color: var(--disabled-bg-color);\n",
" fill: var(--disabled-fill-color);\n",
" box-shadow: none;\n",
" }\n",
"\n",
" .colab-df-spinner {\n",
" border: 2px solid var(--fill-color);\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" animation:\n",
" spin 1s steps(1) infinite;\n",
" }\n",
"\n",
" @keyframes spin {\n",
" 0% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" border-left-color: var(--fill-color);\n",
" }\n",
" 20% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 30% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 40% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 60% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 80% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" 90% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const quickchartButtonEl =\n",
" document.querySelector('#' + key + ' button');\n",
" quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n",
" quickchartButtonEl.classList.add('colab-df-spinner');\n",
" try {\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" } catch (error) {\n",
" console.error('Error during call to suggestCharts:', error);\n",
" }\n",
" quickchartButtonEl.classList.remove('colab-df-spinner');\n",
" quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
" }\n",
" (() => {\n",
" let quickchartButtonEl =\n",
" document.querySelector('#df-49139448-201a-4b8c-a141-08cb1702443a button');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
" })();\n",
" </script>\n",
"</div>\n",
"\n",
" <div id=\"id_6b469f81-fa5a-44e8-b14e-4eaca137da32\">\n",
" <style>\n",
" .colab-df-generate {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-generate:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-generate {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-generate:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
" <button class=\"colab-df-generate\" onclick=\"generateWithVariable('flights')\"\n",
" title=\"Generate code using this dataframe.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z\"/>\n",
" </svg>\n",
" </button>\n",
" <script>\n",
" (() => {\n",
" const buttonEl =\n",
" document.querySelector('#id_6b469f81-fa5a-44e8-b14e-4eaca137da32 button.colab-df-generate');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" buttonEl.onclick = () => {\n",
" google.colab.notebook.generateWithVariable('flights');\n",
" }\n",
" })();\n",
" </script>\n",
" </div>\n",
"\n",
" </div>\n",
" </div>\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "flights"
}
},
"metadata": {},
"execution_count": 3
}
]
},
{
"cell_type": "markdown",
"source": [
"I noticed that some flights have an arrival time of 2400, which is not recognized as a valid time in Pandas. We need to convert those to 0000."
],
"metadata": {
"id": "ztDV-KytI6mS"
}
},
{
"cell_type": "code",
"source": [
"flights[flights['SCHEDULED_ARRIVAL'] == '2400']"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 161
},
"id": "-sSEbd2ourKk",
"outputId": "b08aa56d-47e6-48b9-9522-1a9eb6c13702"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" YEAR MONTH DAY DAY_OF_WEEK AIRLINE FLIGHT_NUMBER TAIL_NUMBER \\\n",
"1124523 2015 3 14 6 F9 2014 N923FR \n",
"1482599 2015 4 5 7 OO 6492 N765SK \n",
"\n",
" ORIGIN_AIRPORT DESTINATION_AIRPORT SCHEDULED_DEPARTURE ... \\\n",
"1124523 TPA PHL 2135 ... \n",
"1482599 BIL COD 2149 ... \n",
"\n",
" ARRIVAL_TIME ARRIVAL_DELAY DIVERTED CANCELLED CANCELLATION_REASON \\\n",
"1124523 0124 84.0 0 0 NaN \n",
"1482599 NaN NaN 1 0 NaN \n",
"\n",
" AIR_SYSTEM_DELAY SECURITY_DELAY AIRLINE_DELAY LATE_AIRCRAFT_DELAY \\\n",
"1124523 0.0 0.0 0.0 84.0 \n",
"1482599 NaN NaN NaN NaN \n",
"\n",
" WEATHER_DELAY \n",
"1124523 0.0 \n",
"1482599 NaN \n",
"\n",
"[2 rows x 31 columns]"
],
"text/html": [
"\n",
" <div id=\"df-c3093baa-c266-40be-8cbd-c158a548a441\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>YEAR</th>\n",
" <th>MONTH</th>\n",
" <th>DAY</th>\n",
" <th>DAY_OF_WEEK</th>\n",
" <th>AIRLINE</th>\n",
" <th>FLIGHT_NUMBER</th>\n",
" <th>TAIL_NUMBER</th>\n",
" <th>ORIGIN_AIRPORT</th>\n",
" <th>DESTINATION_AIRPORT</th>\n",
" <th>SCHEDULED_DEPARTURE</th>\n",
" <th>...</th>\n",
" <th>ARRIVAL_TIME</th>\n",
" <th>ARRIVAL_DELAY</th>\n",
" <th>DIVERTED</th>\n",
" <th>CANCELLED</th>\n",
" <th>CANCELLATION_REASON</th>\n",
" <th>AIR_SYSTEM_DELAY</th>\n",
" <th>SECURITY_DELAY</th>\n",
" <th>AIRLINE_DELAY</th>\n",
" <th>LATE_AIRCRAFT_DELAY</th>\n",
" <th>WEATHER_DELAY</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1124523</th>\n",
" <td>2015</td>\n",
" <td>3</td>\n",
" <td>14</td>\n",
" <td>6</td>\n",
" <td>F9</td>\n",
" <td>2014</td>\n",
" <td>N923FR</td>\n",
" <td>TPA</td>\n",
" <td>PHL</td>\n",
" <td>2135</td>\n",
" <td>...</td>\n",
" <td>0124</td>\n",
" <td>84.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>84.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1482599</th>\n",
" <td>2015</td>\n",
" <td>4</td>\n",
" <td>5</td>\n",
" <td>7</td>\n",
" <td>OO</td>\n",
" <td>6492</td>\n",
" <td>N765SK</td>\n",
" <td>BIL</td>\n",
" <td>COD</td>\n",
" <td>2149</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>2 rows × 31 columns</p>\n",
"</div>\n",
" <div class=\"colab-df-buttons\">\n",
"\n",
" <div class=\"colab-df-container\">\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-c3093baa-c266-40be-8cbd-c158a548a441')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
" <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" .colab-df-buttons div {\n",
" margin-bottom: 4px;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-c3093baa-c266-40be-8cbd-c158a548a441 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-c3093baa-c266-40be-8cbd-c158a548a441');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
"\n",
"\n",
"<div id=\"df-2b96a02a-1b85-4fc0-8674-3afc6065183d\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-2b96a02a-1b85-4fc0-8674-3afc6065183d')\"\n",
" title=\"Suggest charts\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" --bg-color: #E8F0FE;\n",
" --fill-color: #1967D2;\n",
" --hover-bg-color: #E2EBFA;\n",
" --hover-fill-color: #174EA6;\n",
" --disabled-fill-color: #AAA;\n",
" --disabled-bg-color: #DDD;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" --bg-color: #3B4455;\n",
" --fill-color: #D2E3FC;\n",
" --hover-bg-color: #434B5C;\n",
" --hover-fill-color: #FFFFFF;\n",
" --disabled-bg-color: #3B4455;\n",
" --disabled-fill-color: #666;\n",
" }\n",
"\n",
" .colab-df-quickchart {\n",
" background-color: var(--bg-color);\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: var(--fill-color);\n",
" height: 32px;\n",
" padding: 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: var(--hover-bg-color);\n",
" box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: var(--button-hover-fill-color);\n",
" }\n",
"\n",
" .colab-df-quickchart-complete:disabled,\n",
" .colab-df-quickchart-complete:disabled:hover {\n",
" background-color: var(--disabled-bg-color);\n",
" fill: var(--disabled-fill-color);\n",
" box-shadow: none;\n",
" }\n",
"\n",
" .colab-df-spinner {\n",
" border: 2px solid var(--fill-color);\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" animation:\n",
" spin 1s steps(1) infinite;\n",
" }\n",
"\n",
" @keyframes spin {\n",
" 0% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" border-left-color: var(--fill-color);\n",
" }\n",
" 20% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 30% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 40% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 60% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 80% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" 90% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const quickchartButtonEl =\n",
" document.querySelector('#' + key + ' button');\n",
" quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n",
" quickchartButtonEl.classList.add('colab-df-spinner');\n",
" try {\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" } catch (error) {\n",
" console.error('Error during call to suggestCharts:', error);\n",
" }\n",
" quickchartButtonEl.classList.remove('colab-df-spinner');\n",
" quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
" }\n",
" (() => {\n",
" let quickchartButtonEl =\n",
" document.querySelector('#df-2b96a02a-1b85-4fc0-8674-3afc6065183d button');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
" })();\n",
" </script>\n",
"</div>\n",
"\n",
" </div>\n",
" </div>\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe"
}
},
"metadata": {},
"execution_count": 4
}
]
},
{
"cell_type": "code",
"source": [
"flights['SCHEDULED_ARRIVAL'] = flights['SCHEDULED_ARRIVAL'].replace('2400', '0000')"
],
"metadata": {
"id": "0mvmVKTdv1yF"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Later on, we want to add some weather observation columns at the scheduled departured and scheduled arrival time for each flight. In order perform the join, we need the depature of arrival time columns in datetime format."
],
"metadata": {
"id": "-iVzXpl4yJsx"
}
},
{
"cell_type": "code",
"source": [
"# Convert the scheduled departure and arrival times to datetime columns\n",
"flights['SCHEDULED_DEPARTURE'] = pd.to_datetime(flights[['YEAR', 'MONTH', 'DAY']].astype(str).agg('-'.join, axis=1) + ' ' + flights['SCHEDULED_DEPARTURE'])\n",
"flights['SCHEDULED_ARRIVAL'] = pd.to_datetime(flights[['YEAR', 'MONTH', 'DAY']].astype(str).agg('-'.join, axis=1) + ' ' + flights['SCHEDULED_ARRIVAL'])"
],
"metadata": {
"id": "ptocNpVosVar"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# There is a single year, month, and day column, but what if a flight takes off and then lands after midnight?\n",
"# In that case, we need to add one day to the scheudled arrival time\n",
"flights.loc[flights['SCHEDULED_ARRIVAL'] < flights['SCHEDULED_DEPARTURE'], 'SCHEDULED_ARRIVAL'] += pd.Timedelta(days=1)"
],
"metadata": {
"id": "KRIif_nmyIkf"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Most flights in the dataset have their origin and destination airports listed with the airport's three-letter identifier (e.g. JFK). However, some flights are listed with a five-digit code (e.g. 14747) and I can't find a table that will convert those five-digit codes to three-letter identifiers. Therefore, we'll drop flights that use five-digit codes for the airports."
],
"metadata": {
"id": "O4uxSLGnzJ7x"
}
},
{
"cell_type": "code",
"source": [
"flights = flights[\n",
" flights['ORIGIN_AIRPORT'].astype(str).str.fullmatch(r'[A-Za-z]{3}') &\n",
" flights['DESTINATION_AIRPORT'].astype(str).str.fullmatch(r'[A-Za-z]{3}')\n",
"]"
],
"metadata": {
"id": "Ze2StFQ89Ezk"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We also need to consider that the departure and arrival times in the dataset are in local time, meaning all times are in the airport's timezone. We need to make our datetime objects aware of the timezone they live in.\n",
"\n",
"I'm importing a new DataFrame that includes the timezone for each airport, and we'll join it with the flights DataFrame in order to attach timezones to our departure and arrival times."
],
"metadata": {
"id": "u_k0tevlzoER"
}
},
{
"cell_type": "code",
"source": [
"airport_tz_path = '/content/drive/MyDrive/CU Boulder Data Science/DTSA 5506 - Data Mining Project/data/airport_timezones.csv'\n",
"airport_timezones = pd.read_csv(airport_tz_path)"
],
"metadata": {
"id": "HgorMpBr1QYE"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"airport_timezones.head()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 293
},
"id": "_V_aKgsH3VuM",
"outputId": "7fd7f885-fc88-469c-f70e-7559f6fd4ff1"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" code icao name latitude longitude \\\n",
"0 AAA NTGA Anaa -17.350665 -145.511120 \n",
"1 AAB YARY Arrabury Airport -26.696783 141.049092 \n",
"2 AAC HEAR El Arish International Airport 31.074284 33.829172 \n",
"3 AAD HCAD Adado Airport 6.096286 46.637708 \n",
"4 AAE DABB Les Salines Airport 36.821392 7.811857 \n",
"\n",
" elevation url time_zone city_code country city \\\n",
"0 36 NaN Pacific/Tahiti AAA PF NaN \n",
"1 328 NaN Australia/Brisbane AAB AU Tanbar \n",
"2 85 NaN Africa/Cairo AAC EG Arish \n",
"3 980 NaN Africa/Khartoum AAD SO Adado \n",
"4 36 NaN Africa/Algiers AAE DZ El Hadjar \n",
"\n",
" state county type \n",
"0 NaN NaN AP \n",
"1 Queensland Barcoo Shire AP \n",
"2 Muhafazat Shamal Sina' NaN AP \n",
"3 NaN NaN AP \n",
"4 Annaba NaN AP "
],
"text/html": [
"\n",
" <div id=\"df-43f9c597-9e5d-4fce-8ac0-8225a34f3235\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>code</th>\n",
" <th>icao</th>\n",
" <th>name</th>\n",
" <th>latitude</th>\n",
" <th>longitude</th>\n",
" <th>elevation</th>\n",
" <th>url</th>\n",
" <th>time_zone</th>\n",
" <th>city_code</th>\n",
" <th>country</th>\n",
" <th>city</th>\n",
" <th>state</th>\n",
" <th>county</th>\n",
" <th>type</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>AAA</td>\n",
" <td>NTGA</td>\n",
" <td>Anaa</td>\n",
" <td>-17.350665</td>\n",
" <td>-145.511120</td>\n",
" <td>36</td>\n",
" <td>NaN</td>\n",
" <td>Pacific/Tahiti</td>\n",
" <td>AAA</td>\n",
" <td>PF</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>AP</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>AAB</td>\n",
" <td>YARY</td>\n",
" <td>Arrabury Airport</td>\n",
" <td>-26.696783</td>\n",
" <td>141.049092</td>\n",
" <td>328</td>\n",
" <td>NaN</td>\n",
" <td>Australia/Brisbane</td>\n",
" <td>AAB</td>\n",
" <td>AU</td>\n",
" <td>Tanbar</td>\n",
" <td>Queensland</td>\n",
" <td>Barcoo Shire</td>\n",
" <td>AP</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>AAC</td>\n",
" <td>HEAR</td>\n",
" <td>El Arish International Airport</td>\n",
" <td>31.074284</td>\n",
" <td>33.829172</td>\n",
" <td>85</td>\n",
" <td>NaN</td>\n",
" <td>Africa/Cairo</td>\n",
" <td>AAC</td>\n",
" <td>EG</td>\n",
" <td>Arish</td>\n",
" <td>Muhafazat Shamal Sina'</td>\n",
" <td>NaN</td>\n",
" <td>AP</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>AAD</td>\n",
" <td>HCAD</td>\n",
" <td>Adado Airport</td>\n",
" <td>6.096286</td>\n",
" <td>46.637708</td>\n",
" <td>980</td>\n",
" <td>NaN</td>\n",
" <td>Africa/Khartoum</td>\n",
" <td>AAD</td>\n",
" <td>SO</td>\n",
" <td>Adado</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>AP</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>AAE</td>\n",
" <td>DABB</td>\n",
" <td>Les Salines Airport</td>\n",
" <td>36.821392</td>\n",
" <td>7.811857</td>\n",
" <td>36</td>\n",
" <td>NaN</td>\n",
" <td>Africa/Algiers</td>\n",
" <td>AAE</td>\n",
" <td>DZ</td>\n",
" <td>El Hadjar</td>\n",
" <td>Annaba</td>\n",
" <td>NaN</td>\n",
" <td>AP</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <div class=\"colab-df-buttons\">\n",
"\n",
" <div class=\"colab-df-container\">\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-43f9c597-9e5d-4fce-8ac0-8225a34f3235')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
" <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" .colab-df-buttons div {\n",
" margin-bottom: 4px;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-43f9c597-9e5d-4fce-8ac0-8225a34f3235 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-43f9c597-9e5d-4fce-8ac0-8225a34f3235');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
"\n",
"\n",
"<div id=\"df-f5ef216a-6e36-4064-8d37-0248b89283d5\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-f5ef216a-6e36-4064-8d37-0248b89283d5')\"\n",
" title=\"Suggest charts\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" --bg-color: #E8F0FE;\n",
" --fill-color: #1967D2;\n",
" --hover-bg-color: #E2EBFA;\n",
" --hover-fill-color: #174EA6;\n",
" --disabled-fill-color: #AAA;\n",
" --disabled-bg-color: #DDD;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" --bg-color: #3B4455;\n",
" --fill-color: #D2E3FC;\n",
" --hover-bg-color: #434B5C;\n",
" --hover-fill-color: #FFFFFF;\n",
" --disabled-bg-color: #3B4455;\n",
" --disabled-fill-color: #666;\n",
" }\n",
"\n",
" .colab-df-quickchart {\n",
" background-color: var(--bg-color);\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: var(--fill-color);\n",
" height: 32px;\n",
" padding: 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: var(--hover-bg-color);\n",
" box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: var(--button-hover-fill-color);\n",
" }\n",
"\n",
" .colab-df-quickchart-complete:disabled,\n",
" .colab-df-quickchart-complete:disabled:hover {\n",
" background-color: var(--disabled-bg-color);\n",
" fill: var(--disabled-fill-color);\n",
" box-shadow: none;\n",
" }\n",
"\n",
" .colab-df-spinner {\n",
" border: 2px solid var(--fill-color);\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" animation:\n",
" spin 1s steps(1) infinite;\n",
" }\n",
"\n",
" @keyframes spin {\n",
" 0% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" border-left-color: var(--fill-color);\n",
" }\n",
" 20% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 30% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 40% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 60% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 80% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" 90% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const quickchartButtonEl =\n",
" document.querySelector('#' + key + ' button');\n",
" quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n",
" quickchartButtonEl.classList.add('colab-df-spinner');\n",
" try {\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" } catch (error) {\n",
" console.error('Error during call to suggestCharts:', error);\n",
" }\n",
" quickchartButtonEl.classList.remove('colab-df-spinner');\n",
" quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
" }\n",
" (() => {\n",
" let quickchartButtonEl =\n",
" document.querySelector('#df-f5ef216a-6e36-4064-8d37-0248b89283d5 button');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
" })();\n",
" </script>\n",
"</div>\n",
"\n",
" </div>\n",
" </div>\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "airport_timezones",
"summary": "{\n \"name\": \"airport_timezones\",\n \"rows\": 9803,\n \"fields\": [\n {\n \"column\": \"code\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 9803,\n \"samples\": [\n \"HHH\",\n \"EFD\",\n \"WKI\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"icao\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 8346,\n \"samples\": [\n \"VRBK\",\n \"KGPT\",\n \"NTGO\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 9466,\n \"samples\": [\n \"Port Williams SPB\",\n \"Williamson County Airport\",\n \"Mount Buffalo\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"latitude\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 28.572950224219454,\n \"min\": -79.777778,\n \"max\": 83.382225,\n \"num_unique_values\": 8821,\n \"samples\": [\n 41.5086358,\n 37.42361,\n 64.66665714999999\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"longitude\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 96.26983624598257,\n \"min\": -179.34202519218303,\n \"max\": 179.9493282,\n \"num_unique_values\": 9284,\n \"samples\": [\n -97.31836461280574,\n -79.21667,\n 67.81759460556913\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"elevation\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1756,\n \"min\": -1299,\n \"max\": 17011,\n \"num_unique_values\": 2353,\n \"samples\": [\n 5216,\n 1234,\n 3120\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"url\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1703,\n \"samples\": [\n \"http://www.thelandingsgrandlake.com/airport.htm\",\n \"https://avinor.no/en/airport/kirkenes-airport/\",\n \"https://www.flylax.com/\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"time_zone\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 389,\n \"samples\": [\n \"Asia/Yerevan\",\n \"America/Matamoros\",\n \"Asia/Kathmandu\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"city_code\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 9250,\n \"samples\": [\n \"BBS\",\n \"FRZ\",\n \"SOK\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"country\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 237,\n \"samples\": [\n \"TL\",\n \"RU\",\n \"LS\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"city\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 6846,\n \"samples\": [\n \"Zaporizhzhya\",\n \"Semarang\",\n \"Hanapepe Heights\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"state\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 2064,\n \"samples\": [\n \"Malaita\",\n \"Brjansk\",\n \"Qazvin\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"county\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 2702,\n \"samples\": [\n \"Valley County\",\n \"Manjimup\",\n \"Mason County\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"type\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"AP\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
}
},
"metadata": {},
"execution_count": 10
}
]
},
{
"cell_type": "markdown",
"source": [
"We need to join the timezone table twice. Once for the departure airports/times and once for the arrival airports/times."
],
"metadata": {
"id": "dHXrza0m0HVJ"
}
},
{
"cell_type": "code",
"source": [
"# Adding a prefix to all the column names from each table will help us differentiate between the columns after the two joins\n",
"origin_airport_timezones = airport_timezones.add_prefix('origin_')\n",
"destination_airport_timezones = airport_timezones.add_prefix('destination_')"
],
"metadata": {
"id": "RTiZ8X6X4uV2"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"flights = pd.merge(flights,\n",
" origin_airport_timezones,\n",
" left_on='ORIGIN_AIRPORT',\n",
" right_on='origin_code',\n",
" how='left')\n",
"flights = pd.merge(flights,\n",
" destination_airport_timezones,\n",
" left_on='DESTINATION_AIRPORT',\n",
" right_on='destination_code',\n",
" how='left')"
],
"metadata": {
"id": "FWjrtbqA3pLr"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# A preview of the dataset after both joins\n",
"flights"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 444
},
"id": "M1CV3w0o6JFI",
"outputId": "8256d53f-80f7-4463-96e0-c6c93a340039"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" YEAR MONTH DAY DAY_OF_WEEK AIRLINE FLIGHT_NUMBER TAIL_NUMBER \\\n",
"0 2015 1 1 4 AS 98 N407AS \n",
"1 2015 1 1 4 AA 2336 N3KUAA \n",
"2 2015 1 1 4 US 840 N171US \n",
"3 2015 1 1 4 AA 258 N3HYAA \n",
"4 2015 1 1 4 AS 135 N527AS \n",
"... ... ... ... ... ... ... ... \n",
"5332909 2015 12 31 4 B6 688 N657JB \n",
"5332910 2015 12 31 4 B6 745 N828JB \n",
"5332911 2015 12 31 4 B6 1503 N913JB \n",
"5332912 2015 12 31 4 B6 333 N527JB \n",
"5332913 2015 12 31 4 B6 839 N534JB \n",
"\n",
" ORIGIN_AIRPORT DESTINATION_AIRPORT SCHEDULED_DEPARTURE ... \\\n",
"0 ANC SEA 2015-01-01 00:05:00 ... \n",
"1 LAX PBI 2015-01-01 00:10:00 ... \n",
"2 SFO CLT 2015-01-01 00:20:00 ... \n",
"3 LAX MIA 2015-01-01 00:20:00 ... \n",
"4 SEA ANC 2015-01-01 00:25:00 ... \n",
"... ... ... ... ... \n",
"5332909 LAX BOS 2015-12-31 23:59:00 ... \n",
"5332910 JFK PSE 2015-12-31 23:59:00 ... \n",
"5332911 JFK SJU 2015-12-31 23:59:00 ... \n",
"5332912 MCO SJU 2015-12-31 23:59:00 ... \n",
"5332913 JFK BQN 2015-12-31 23:59:00 ... \n",
"\n",
" destination_longitude destination_elevation \\\n",
"0 -122.308016 206 \n",
"1 -80.092487 32 \n",
"2 -80.945744 685 \n",
"3 -80.286723 45 \n",
"4 -149.997892 62 \n",
"... ... ... \n",
"5332909 -71.013507 29 \n",
"5332910 -66.563350 36 \n",
"5332911 -65.997430 36 \n",
"5332912 -65.997430 36 \n",
"5332913 -67.134216 223 \n",
"\n",
" destination_url destination_time_zone \\\n",
"0 http://www.portseattle.org/seatac/ America/Los_Angeles \n",
"1 http://www.pbia.org America/New_York \n",
"2 http://www.charlotteairport.com/ America/New_York \n",
"3 http://www.miami-airport.com/ America/New_York \n",
"4 NaN America/Anchorage \n",
"... ... ... \n",
"5332909 http://www.massport.com/logan/ America/New_York \n",
"5332910 NaN America/Puerto_Rico \n",
"5332911 NaN America/Puerto_Rico \n",
"5332912 NaN America/Puerto_Rico \n",
"5332913 NaN America/Puerto_Rico \n",
"\n",
" destination_city_code destination_country destination_city \\\n",
"0 SEA US SeaTac \n",
"1 PBI US Haverhill \n",
"2 CLT US NaN \n",
"3 MIA US Virginia Gardens \n",
"4 ANC US NaN \n",
"... ... ... ... \n",
"5332909 BOS US Chelsea \n",
"5332910 PSE PR Aguilita \n",
"5332911 SJU PR Carolina \n",
"5332912 SJU PR Carolina \n",
"5332913 BQN PR San Antonio \n",
"\n",
" destination_state destination_county destination_type \n",
"0 Washington King County AP \n",
"1 Florida Palm Beach County AP \n",
"2 NaN NaN AP \n",
"3 Florida Miami-Dade County AP \n",
"4 NaN NaN AP \n",
"... ... ... ... \n",
"5332909 Massachusetts Suffolk County AP \n",
"5332910 Juana Diaz NaN AP \n",
"5332911 Carolina NaN AP \n",
"5332912 Carolina NaN AP \n",
"5332913 Aguadilla NaN AP \n",
"\n",
"[5332914 rows x 59 columns]"
],
"text/html": [
"\n",
" <div id=\"df-b9805236-6f8d-4b5b-8983-f5f2a5b78646\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>YEAR</th>\n",
" <th>MONTH</th>\n",
" <th>DAY</th>\n",
" <th>DAY_OF_WEEK</th>\n",
" <th>AIRLINE</th>\n",
" <th>FLIGHT_NUMBER</th>\n",
" <th>TAIL_NUMBER</th>\n",
" <th>ORIGIN_AIRPORT</th>\n",
" <th>DESTINATION_AIRPORT</th>\n",
" <th>SCHEDULED_DEPARTURE</th>\n",
" <th>...</th>\n",
" <th>destination_longitude</th>\n",
" <th>destination_elevation</th>\n",
" <th>destination_url</th>\n",
" <th>destination_time_zone</th>\n",
" <th>destination_city_code</th>\n",
" <th>destination_country</th>\n",
" <th>destination_city</th>\n",
" <th>destination_state</th>\n",
" <th>destination_county</th>\n",
" <th>destination_type</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>AS</td>\n",
" <td>98</td>\n",
" <td>N407AS</td>\n",
" <td>ANC</td>\n",
" <td>SEA</td>\n",
" <td>2015-01-01 00:05:00</td>\n",
" <td>...</td>\n",
" <td>-122.308016</td>\n",
" <td>206</td>\n",
" <td>http://www.portseattle.org/seatac/</td>\n",
" <td>America/Los_Angeles</td>\n",
" <td>SEA</td>\n",
" <td>US</td>\n",
" <td>SeaTac</td>\n",
" <td>Washington</td>\n",
" <td>King County</td>\n",
" <td>AP</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>AA</td>\n",
" <td>2336</td>\n",
" <td>N3KUAA</td>\n",
" <td>LAX</td>\n",
" <td>PBI</td>\n",
" <td>2015-01-01 00:10:00</td>\n",
" <td>...</td>\n",
" <td>-80.092487</td>\n",
" <td>32</td>\n",
" <td>http://www.pbia.org</td>\n",
" <td>America/New_York</td>\n",
" <td>PBI</td>\n",
" <td>US</td>\n",
" <td>Haverhill</td>\n",
" <td>Florida</td>\n",
" <td>Palm Beach County</td>\n",
" <td>AP</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>US</td>\n",
" <td>840</td>\n",
" <td>N171US</td>\n",
" <td>SFO</td>\n",
" <td>CLT</td>\n",
" <td>2015-01-01 00:20:00</td>\n",
" <td>...</td>\n",
" <td>-80.945744</td>\n",
" <td>685</td>\n",
" <td>http://www.charlotteairport.com/</td>\n",
" <td>America/New_York</td>\n",
" <td>CLT</td>\n",
" <td>US</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>AP</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>AA</td>\n",
" <td>258</td>\n",
" <td>N3HYAA</td>\n",
" <td>LAX</td>\n",
" <td>MIA</td>\n",
" <td>2015-01-01 00:20:00</td>\n",
" <td>...</td>\n",
" <td>-80.286723</td>\n",
" <td>45</td>\n",
" <td>http://www.miami-airport.com/</td>\n",
" <td>America/New_York</td>\n",
" <td>MIA</td>\n",
" <td>US</td>\n",
" <td>Virginia Gardens</td>\n",
" <td>Florida</td>\n",
" <td>Miami-Dade County</td>\n",
" <td>AP</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>AS</td>\n",
" <td>135</td>\n",
" <td>N527AS</td>\n",
" <td>SEA</td>\n",
" <td>ANC</td>\n",
" <td>2015-01-01 00:25:00</td>\n",
" <td>...</td>\n",
" <td>-149.997892</td>\n",
" <td>62</td>\n",
" <td>NaN</td>\n",
" <td>America/Anchorage</td>\n",
" <td>ANC</td>\n",
" <td>US</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>AP</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5332909</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>B6</td>\n",
" <td>688</td>\n",
" <td>N657JB</td>\n",
" <td>LAX</td>\n",
" <td>BOS</td>\n",
" <td>2015-12-31 23:59:00</td>\n",
" <td>...</td>\n",
" <td>-71.013507</td>\n",
" <td>29</td>\n",
" <td>http://www.massport.com/logan/</td>\n",
" <td>America/New_York</td>\n",
" <td>BOS</td>\n",
" <td>US</td>\n",
" <td>Chelsea</td>\n",
" <td>Massachusetts</td>\n",
" <td>Suffolk County</td>\n",
" <td>AP</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5332910</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>B6</td>\n",
" <td>745</td>\n",
" <td>N828JB</td>\n",
" <td>JFK</td>\n",
" <td>PSE</td>\n",
" <td>2015-12-31 23:59:00</td>\n",
" <td>...</td>\n",
" <td>-66.563350</td>\n",
" <td>36</td>\n",
" <td>NaN</td>\n",
" <td>America/Puerto_Rico</td>\n",
" <td>PSE</td>\n",
" <td>PR</td>\n",
" <td>Aguilita</td>\n",
" <td>Juana Diaz</td>\n",
" <td>NaN</td>\n",
" <td>AP</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5332911</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>B6</td>\n",
" <td>1503</td>\n",
" <td>N913JB</td>\n",
" <td>JFK</td>\n",
" <td>SJU</td>\n",
" <td>2015-12-31 23:59:00</td>\n",
" <td>...</td>\n",
" <td>-65.997430</td>\n",
" <td>36</td>\n",
" <td>NaN</td>\n",
" <td>America/Puerto_Rico</td>\n",
" <td>SJU</td>\n",
" <td>PR</td>\n",
" <td>Carolina</td>\n",
" <td>Carolina</td>\n",
" <td>NaN</td>\n",
" <td>AP</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5332912</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>B6</td>\n",
" <td>333</td>\n",
" <td>N527JB</td>\n",
" <td>MCO</td>\n",
" <td>SJU</td>\n",
" <td>2015-12-31 23:59:00</td>\n",
" <td>...</td>\n",
" <td>-65.997430</td>\n",
" <td>36</td>\n",
" <td>NaN</td>\n",
" <td>America/Puerto_Rico</td>\n",
" <td>SJU</td>\n",
" <td>PR</td>\n",
" <td>Carolina</td>\n",
" <td>Carolina</td>\n",
" <td>NaN</td>\n",
" <td>AP</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5332913</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>B6</td>\n",
" <td>839</td>\n",
" <td>N534JB</td>\n",
" <td>JFK</td>\n",
" <td>BQN</td>\n",
" <td>2015-12-31 23:59:00</td>\n",
" <td>...</td>\n",
" <td>-67.134216</td>\n",
" <td>223</td>\n",
" <td>NaN</td>\n",
" <td>America/Puerto_Rico</td>\n",
" <td>BQN</td>\n",
" <td>PR</td>\n",
" <td>San Antonio</td>\n",
" <td>Aguadilla</td>\n",
" <td>NaN</td>\n",
" <td>AP</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5332914 rows × 59 columns</p>\n",
"</div>\n",
" <div class=\"colab-df-buttons\">\n",
"\n",
" <div class=\"colab-df-container\">\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-b9805236-6f8d-4b5b-8983-f5f2a5b78646')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
" <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" .colab-df-buttons div {\n",
" margin-bottom: 4px;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-b9805236-6f8d-4b5b-8983-f5f2a5b78646 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-b9805236-6f8d-4b5b-8983-f5f2a5b78646');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
"\n",
"\n",
"<div id=\"df-98062ffd-15b2-40a6-8078-a0a09394d30d\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-98062ffd-15b2-40a6-8078-a0a09394d30d')\"\n",
" title=\"Suggest charts\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" --bg-color: #E8F0FE;\n",
" --fill-color: #1967D2;\n",
" --hover-bg-color: #E2EBFA;\n",
" --hover-fill-color: #174EA6;\n",
" --disabled-fill-color: #AAA;\n",
" --disabled-bg-color: #DDD;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" --bg-color: #3B4455;\n",
" --fill-color: #D2E3FC;\n",
" --hover-bg-color: #434B5C;\n",
" --hover-fill-color: #FFFFFF;\n",
" --disabled-bg-color: #3B4455;\n",
" --disabled-fill-color: #666;\n",
" }\n",
"\n",
" .colab-df-quickchart {\n",
" background-color: var(--bg-color);\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: var(--fill-color);\n",
" height: 32px;\n",
" padding: 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: var(--hover-bg-color);\n",
" box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: var(--button-hover-fill-color);\n",
" }\n",
"\n",
" .colab-df-quickchart-complete:disabled,\n",
" .colab-df-quickchart-complete:disabled:hover {\n",
" background-color: var(--disabled-bg-color);\n",
" fill: var(--disabled-fill-color);\n",
" box-shadow: none;\n",
" }\n",
"\n",
" .colab-df-spinner {\n",
" border: 2px solid var(--fill-color);\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" animation:\n",
" spin 1s steps(1) infinite;\n",
" }\n",
"\n",
" @keyframes spin {\n",
" 0% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" border-left-color: var(--fill-color);\n",
" }\n",
" 20% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 30% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 40% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 60% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 80% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" 90% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const quickchartButtonEl =\n",
" document.querySelector('#' + key + ' button');\n",
" quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n",
" quickchartButtonEl.classList.add('colab-df-spinner');\n",
" try {\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" } catch (error) {\n",
" console.error('Error during call to suggestCharts:', error);\n",
" }\n",
" quickchartButtonEl.classList.remove('colab-df-spinner');\n",
" quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
" }\n",
" (() => {\n",
" let quickchartButtonEl =\n",
" document.querySelector('#df-98062ffd-15b2-40a6-8078-a0a09394d30d button');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
" })();\n",
" </script>\n",
"</div>\n",
"\n",
" <div id=\"id_16cce5dc-375a-40c2-bbd5-563ad0bc451d\">\n",
" <style>\n",
" .colab-df-generate {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-generate:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-generate {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-generate:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
" <button class=\"colab-df-generate\" onclick=\"generateWithVariable('flights')\"\n",
" title=\"Generate code using this dataframe.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z\"/>\n",
" </svg>\n",
" </button>\n",
" <script>\n",
" (() => {\n",
" const buttonEl =\n",
" document.querySelector('#id_16cce5dc-375a-40c2-bbd5-563ad0bc451d button.colab-df-generate');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" buttonEl.onclick = () => {\n",
" google.colab.notebook.generateWithVariable('flights');\n",
" }\n",
" })();\n",
" </script>\n",
" </div>\n",
"\n",
" </div>\n",
" </div>\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "flights"
}
},
"metadata": {},
"execution_count": 13
}
]
},
{
"cell_type": "markdown",
"source": [
"Now that we've joined the timezone information, we need to attach the timezone to the datetime objects in the departure and arrival time columns.\n",
"\n",
"This is a memory-intensive task, so we perform the task in batches for each timezone. We do this for both the departure and arrival time columns. Print statements within the loop show us how things are progressing while the code runs.\n",
"\n",
"In some cases, making a datetime object timezone-aware is not possible, specifically for times between 1am and 2am Nov 1, 2015. This is the \"fall back\" hour of Daylight Savings Time. Because we fall back, that hour \"happens\" twice, and formally attaching a timezone creates ambiguity. We'll drop the few rows where that is a problem."
],
"metadata": {
"id": "OtEGDF0H0q0h"
}
},
{
"cell_type": "code",
"source": [
"flights['SCHEDULED_DEPARTURE_tz_aware'] = pd.Series([pd.NaT] * len(flights), dtype='object')\n",
"\n",
"# Group by timezone and localize in batches\n",
"for tz, group_idx in flights.groupby('origin_time_zone').groups.items():\n",
" print(f'Assigning timezones to flight times in {tz} timezone...')\n",
" try:\n",
" flights.loc[group_idx, 'SCHEDULED_DEPARTURE_tz_aware'] = flights.loc[group_idx, 'SCHEDULED_DEPARTURE'].dt.tz_localize(ZoneInfo(tz), ambiguous='infer')\n",
" except pytz.AmbiguousTimeError:\n",
" flights.loc[group_idx, 'SCHEDULED_DEPARTURE_tz_aware'] = flights.loc[group_idx, 'SCHEDULED_DEPARTURE'].dt.tz_localize(ZoneInfo(tz), ambiguous='NaT')\n",
" print(f\"AmbiguousTimeError encountered for timezone: {tz}. Using 'NaT' instead.\")\n",
" print(f'{tz} Complete.')"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"collapsed": true,
"id": "Ctuouv5jFW86",
"outputId": "5e22c2fe-279f-47b6-f603-4ed40d60319d"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Assigning timezones to flight times in America/Adak timezone...\n",
"America/Adak Complete.\n",
"Assigning timezones to flight times in America/Anchorage timezone...\n",
"AmbiguousTimeError encountered for timezone: America/Anchorage. Using 'NaT' instead.\n",
"America/Anchorage Complete.\n",
"Assigning timezones to flight times in America/Boise timezone...\n",
"America/Boise Complete.\n",
"Assigning timezones to flight times in America/Chicago timezone...\n",
"America/Chicago Complete.\n",
"Assigning timezones to flight times in America/Denver timezone...\n",
"America/Denver Complete.\n",
"Assigning timezones to flight times in America/Detroit timezone...\n",
"America/Detroit Complete.\n",
"Assigning timezones to flight times in America/Indiana/Indianapolis timezone...\n",
"America/Indiana/Indianapolis Complete.\n",
"Assigning timezones to flight times in America/Juneau timezone...\n",
"America/Juneau Complete.\n",
"Assigning timezones to flight times in America/Kentucky/Louisville timezone...\n",
"America/Kentucky/Louisville Complete.\n",
"Assigning timezones to flight times in America/Los_Angeles timezone...\n",
"America/Los_Angeles Complete.\n",
"Assigning timezones to flight times in America/Menominee timezone...\n",
"America/Menominee Complete.\n",
"Assigning timezones to flight times in America/New_York timezone...\n",
"America/New_York Complete.\n",
"Assigning timezones to flight times in America/Nome timezone...\n",
"America/Nome Complete.\n",
"Assigning timezones to flight times in America/Phoenix timezone...\n",
"America/Phoenix Complete.\n",
"Assigning timezones to flight times in America/Puerto_Rico timezone...\n",
"America/Puerto_Rico Complete.\n",
"Assigning timezones to flight times in America/Sitka timezone...\n",
"America/Sitka Complete.\n",
"Assigning timezones to flight times in America/St_Thomas timezone...\n",
"America/St_Thomas Complete.\n",
"Assigning timezones to flight times in America/Yakutat timezone...\n",
"America/Yakutat Complete.\n",
"Assigning timezones to flight times in Pacific/Guam timezone...\n",
"Pacific/Guam Complete.\n",
"Assigning timezones to flight times in Pacific/Honolulu timezone...\n",
"Pacific/Honolulu Complete.\n",
"Assigning timezones to flight times in Pacific/Pago_Pago timezone...\n",
"Pacific/Pago_Pago Complete.\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"flights['SCHEDULED_ARRIVAL_tz_aware'] = pd.Series([pd.NaT] * len(flights), dtype='object')\n",
"\n",
"# Group by timezone and localize in batches\n",
"for tz, group_idx in flights.groupby('destination_time_zone').groups.items():\n",
" print(f'Assigning timezones to flight times in {tz} timezone...')\n",
" try:\n",
" flights.loc[group_idx, 'SCHEDULED_ARRIVAL_tz_aware'] = flights.loc[group_idx, 'SCHEDULED_ARRIVAL'].dt.tz_localize(ZoneInfo(tz), ambiguous='infer')\n",
" except pytz.AmbiguousTimeError:\n",
" flights.loc[group_idx, 'SCHEDULED_ARRIVAL_tz_aware'] = flights.loc[group_idx, 'SCHEDULED_ARRIVAL'].dt.tz_localize(ZoneInfo(tz), ambiguous='NaT')\n",
" print(f\"AmbiguousTimeError encountered for timezone: {tz}. Using 'NaT' instead.\")\n",
" print(f'{tz} Complete.')"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "noJ1BL89KSJg",
"outputId": "261d916a-9de3-4d85-9896-a8f77425c065",
"collapsed": true
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Assigning timezones to flight times in America/Adak timezone...\n",
"America/Adak Complete.\n",
"Assigning timezones to flight times in America/Anchorage timezone...\n",
"America/Anchorage Complete.\n",
"Assigning timezones to flight times in America/Boise timezone...\n",
"America/Boise Complete.\n",
"Assigning timezones to flight times in America/Chicago timezone...\n",
"America/Chicago Complete.\n",
"Assigning timezones to flight times in America/Denver timezone...\n",
"America/Denver Complete.\n",
"Assigning timezones to flight times in America/Detroit timezone...\n",
"America/Detroit Complete.\n",
"Assigning timezones to flight times in America/Indiana/Indianapolis timezone...\n",
"America/Indiana/Indianapolis Complete.\n",
"Assigning timezones to flight times in America/Juneau timezone...\n",
"America/Juneau Complete.\n",
"Assigning timezones to flight times in America/Kentucky/Louisville timezone...\n",
"America/Kentucky/Louisville Complete.\n",
"Assigning timezones to flight times in America/Los_Angeles timezone...\n",
"America/Los_Angeles Complete.\n",
"Assigning timezones to flight times in America/Menominee timezone...\n",
"America/Menominee Complete.\n",
"Assigning timezones to flight times in America/New_York timezone...\n",
"America/New_York Complete.\n",
"Assigning timezones to flight times in America/Nome timezone...\n",
"America/Nome Complete.\n",
"Assigning timezones to flight times in America/Phoenix timezone...\n",
"America/Phoenix Complete.\n",
"Assigning timezones to flight times in America/Puerto_Rico timezone...\n",
"America/Puerto_Rico Complete.\n",
"Assigning timezones to flight times in America/Sitka timezone...\n",
"America/Sitka Complete.\n",
"Assigning timezones to flight times in America/St_Thomas timezone...\n",
"America/St_Thomas Complete.\n",
"Assigning timezones to flight times in America/Yakutat timezone...\n",
"America/Yakutat Complete.\n",
"Assigning timezones to flight times in Pacific/Guam timezone...\n",
"Pacific/Guam Complete.\n",
"Assigning timezones to flight times in Pacific/Honolulu timezone...\n",
"Pacific/Honolulu Complete.\n",
"Assigning timezones to flight times in Pacific/Pago_Pago timezone...\n",
"Pacific/Pago_Pago Complete.\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"# A small number of flights have times that could not be made timezone aware because the arrival or departure time occured during the \"fall back\" of daylight savings time\n",
"flights = flights.dropna(subset=['SCHEDULED_DEPARTURE_tz_aware', 'SCHEDULED_ARRIVAL_tz_aware'])"
],
"metadata": {
"id": "JcHVlLAq_vV5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"flights"
],
"metadata": {
"id": "FkK1M6NsLAzf",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 444
},
"outputId": "97f64305-51e2-41e1-bc50-d150d8aa4cc5"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" YEAR MONTH DAY DAY_OF_WEEK AIRLINE FLIGHT_NUMBER TAIL_NUMBER \\\n",
"0 2015 1 1 4 AS 98 N407AS \n",
"1 2015 1 1 4 AA 2336 N3KUAA \n",
"2 2015 1 1 4 US 840 N171US \n",
"3 2015 1 1 4 AA 258 N3HYAA \n",
"4 2015 1 1 4 AS 135 N527AS \n",
"... ... ... ... ... ... ... ... \n",
"5332909 2015 12 31 4 B6 688 N657JB \n",
"5332910 2015 12 31 4 B6 745 N828JB \n",
"5332911 2015 12 31 4 B6 1503 N913JB \n",
"5332912 2015 12 31 4 B6 333 N527JB \n",
"5332913 2015 12 31 4 B6 839 N534JB \n",
"\n",
" ORIGIN_AIRPORT DESTINATION_AIRPORT SCHEDULED_DEPARTURE ... \\\n",
"0 ANC SEA 2015-01-01 00:05:00 ... \n",
"1 LAX PBI 2015-01-01 00:10:00 ... \n",
"2 SFO CLT 2015-01-01 00:20:00 ... \n",
"3 LAX MIA 2015-01-01 00:20:00 ... \n",
"4 SEA ANC 2015-01-01 00:25:00 ... \n",
"... ... ... ... ... \n",
"5332909 LAX BOS 2015-12-31 23:59:00 ... \n",
"5332910 JFK PSE 2015-12-31 23:59:00 ... \n",
"5332911 JFK SJU 2015-12-31 23:59:00 ... \n",
"5332912 MCO SJU 2015-12-31 23:59:00 ... \n",
"5332913 JFK BQN 2015-12-31 23:59:00 ... \n",
"\n",
" destination_url destination_time_zone \\\n",
"0 http://www.portseattle.org/seatac/ America/Los_Angeles \n",
"1 http://www.pbia.org America/New_York \n",
"2 http://www.charlotteairport.com/ America/New_York \n",
"3 http://www.miami-airport.com/ America/New_York \n",
"4 NaN America/Anchorage \n",
"... ... ... \n",
"5332909 http://www.massport.com/logan/ America/New_York \n",
"5332910 NaN America/Puerto_Rico \n",
"5332911 NaN America/Puerto_Rico \n",
"5332912 NaN America/Puerto_Rico \n",
"5332913 NaN America/Puerto_Rico \n",
"\n",
" destination_city_code destination_country destination_city \\\n",
"0 SEA US SeaTac \n",
"1 PBI US Haverhill \n",
"2 CLT US NaN \n",
"3 MIA US Virginia Gardens \n",
"4 ANC US NaN \n",
"... ... ... ... \n",
"5332909 BOS US Chelsea \n",
"5332910 PSE PR Aguilita \n",
"5332911 SJU PR Carolina \n",
"5332912 SJU PR Carolina \n",
"5332913 BQN PR San Antonio \n",
"\n",
" destination_state destination_county destination_type \\\n",
"0 Washington King County AP \n",
"1 Florida Palm Beach County AP \n",
"2 NaN NaN AP \n",
"3 Florida Miami-Dade County AP \n",
"4 NaN NaN AP \n",
"... ... ... ... \n",
"5332909 Massachusetts Suffolk County AP \n",
"5332910 Juana Diaz NaN AP \n",
"5332911 Carolina NaN AP \n",
"5332912 Carolina NaN AP \n",
"5332913 Aguadilla NaN AP \n",
"\n",
" SCHEDULED_DEPARTURE_tz_aware SCHEDULED_ARRIVAL_tz_aware \n",
"0 2015-01-01 00:05:00-09:00 2015-01-01 04:30:00-08:00 \n",
"1 2015-01-01 00:10:00-08:00 2015-01-01 07:50:00-05:00 \n",
"2 2015-01-01 00:20:00-08:00 2015-01-01 08:06:00-05:00 \n",
"3 2015-01-01 00:20:00-08:00 2015-01-01 08:05:00-05:00 \n",
"4 2015-01-01 00:25:00-08:00 2015-01-01 03:20:00-09:00 \n",
"... ... ... \n",
"5332909 2015-12-31 23:59:00-08:00 2016-01-01 08:19:00-05:00 \n",
"5332910 2015-12-31 23:59:00-05:00 2016-01-01 04:46:00-04:00 \n",
"5332911 2015-12-31 23:59:00-05:00 2016-01-01 04:40:00-04:00 \n",
"5332912 2015-12-31 23:59:00-05:00 2016-01-01 03:40:00-04:00 \n",
"5332913 2015-12-31 23:59:00-05:00 2016-01-01 04:40:00-04:00 \n",
"\n",
"[5332911 rows x 61 columns]"
],
"text/html": [
"\n",
" <div id=\"df-9de7a4a3-81d7-4bbe-85c9-399367b26011\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>YEAR</th>\n",
" <th>MONTH</th>\n",
" <th>DAY</th>\n",
" <th>DAY_OF_WEEK</th>\n",
" <th>AIRLINE</th>\n",
" <th>FLIGHT_NUMBER</th>\n",
" <th>TAIL_NUMBER</th>\n",
" <th>ORIGIN_AIRPORT</th>\n",
" <th>DESTINATION_AIRPORT</th>\n",
" <th>SCHEDULED_DEPARTURE</th>\n",
" <th>...</th>\n",
" <th>destination_url</th>\n",
" <th>destination_time_zone</th>\n",
" <th>destination_city_code</th>\n",
" <th>destination_country</th>\n",
" <th>destination_city</th>\n",
" <th>destination_state</th>\n",
" <th>destination_county</th>\n",
" <th>destination_type</th>\n",
" <th>SCHEDULED_DEPARTURE_tz_aware</th>\n",
" <th>SCHEDULED_ARRIVAL_tz_aware</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>AS</td>\n",
" <td>98</td>\n",
" <td>N407AS</td>\n",
" <td>ANC</td>\n",
" <td>SEA</td>\n",
" <td>2015-01-01 00:05:00</td>\n",
" <td>...</td>\n",
" <td>http://www.portseattle.org/seatac/</td>\n",
" <td>America/Los_Angeles</td>\n",
" <td>SEA</td>\n",
" <td>US</td>\n",
" <td>SeaTac</td>\n",
" <td>Washington</td>\n",
" <td>King County</td>\n",
" <td>AP</td>\n",
" <td>2015-01-01 00:05:00-09:00</td>\n",
" <td>2015-01-01 04:30:00-08:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>AA</td>\n",
" <td>2336</td>\n",
" <td>N3KUAA</td>\n",
" <td>LAX</td>\n",
" <td>PBI</td>\n",
" <td>2015-01-01 00:10:00</td>\n",
" <td>...</td>\n",
" <td>http://www.pbia.org</td>\n",
" <td>America/New_York</td>\n",
" <td>PBI</td>\n",
" <td>US</td>\n",
" <td>Haverhill</td>\n",
" <td>Florida</td>\n",
" <td>Palm Beach County</td>\n",
" <td>AP</td>\n",
" <td>2015-01-01 00:10:00-08:00</td>\n",
" <td>2015-01-01 07:50:00-05:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>US</td>\n",
" <td>840</td>\n",
" <td>N171US</td>\n",
" <td>SFO</td>\n",
" <td>CLT</td>\n",
" <td>2015-01-01 00:20:00</td>\n",
" <td>...</td>\n",
" <td>http://www.charlotteairport.com/</td>\n",
" <td>America/New_York</td>\n",
" <td>CLT</td>\n",
" <td>US</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>AP</td>\n",
" <td>2015-01-01 00:20:00-08:00</td>\n",
" <td>2015-01-01 08:06:00-05:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>AA</td>\n",
" <td>258</td>\n",
" <td>N3HYAA</td>\n",
" <td>LAX</td>\n",
" <td>MIA</td>\n",
" <td>2015-01-01 00:20:00</td>\n",
" <td>...</td>\n",
" <td>http://www.miami-airport.com/</td>\n",
" <td>America/New_York</td>\n",
" <td>MIA</td>\n",
" <td>US</td>\n",
" <td>Virginia Gardens</td>\n",
" <td>Florida</td>\n",
" <td>Miami-Dade County</td>\n",
" <td>AP</td>\n",
" <td>2015-01-01 00:20:00-08:00</td>\n",
" <td>2015-01-01 08:05:00-05:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>AS</td>\n",
" <td>135</td>\n",
" <td>N527AS</td>\n",
" <td>SEA</td>\n",
" <td>ANC</td>\n",
" <td>2015-01-01 00:25:00</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>America/Anchorage</td>\n",
" <td>ANC</td>\n",
" <td>US</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>AP</td>\n",
" <td>2015-01-01 00:25:00-08:00</td>\n",
" <td>2015-01-01 03:20:00-09:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5332909</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>B6</td>\n",
" <td>688</td>\n",
" <td>N657JB</td>\n",
" <td>LAX</td>\n",
" <td>BOS</td>\n",
" <td>2015-12-31 23:59:00</td>\n",
" <td>...</td>\n",
" <td>http://www.massport.com/logan/</td>\n",
" <td>America/New_York</td>\n",
" <td>BOS</td>\n",
" <td>US</td>\n",
" <td>Chelsea</td>\n",
" <td>Massachusetts</td>\n",
" <td>Suffolk County</td>\n",
" <td>AP</td>\n",
" <td>2015-12-31 23:59:00-08:00</td>\n",
" <td>2016-01-01 08:19:00-05:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5332910</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>B6</td>\n",
" <td>745</td>\n",
" <td>N828JB</td>\n",
" <td>JFK</td>\n",
" <td>PSE</td>\n",
" <td>2015-12-31 23:59:00</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>America/Puerto_Rico</td>\n",
" <td>PSE</td>\n",
" <td>PR</td>\n",
" <td>Aguilita</td>\n",
" <td>Juana Diaz</td>\n",
" <td>NaN</td>\n",
" <td>AP</td>\n",
" <td>2015-12-31 23:59:00-05:00</td>\n",
" <td>2016-01-01 04:46:00-04:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5332911</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>B6</td>\n",
" <td>1503</td>\n",
" <td>N913JB</td>\n",
" <td>JFK</td>\n",
" <td>SJU</td>\n",
" <td>2015-12-31 23:59:00</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>America/Puerto_Rico</td>\n",
" <td>SJU</td>\n",
" <td>PR</td>\n",
" <td>Carolina</td>\n",
" <td>Carolina</td>\n",
" <td>NaN</td>\n",
" <td>AP</td>\n",
" <td>2015-12-31 23:59:00-05:00</td>\n",
" <td>2016-01-01 04:40:00-04:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5332912</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>B6</td>\n",
" <td>333</td>\n",
" <td>N527JB</td>\n",
" <td>MCO</td>\n",
" <td>SJU</td>\n",
" <td>2015-12-31 23:59:00</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>America/Puerto_Rico</td>\n",
" <td>SJU</td>\n",
" <td>PR</td>\n",
" <td>Carolina</td>\n",
" <td>Carolina</td>\n",
" <td>NaN</td>\n",
" <td>AP</td>\n",
" <td>2015-12-31 23:59:00-05:00</td>\n",
" <td>2016-01-01 03:40:00-04:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5332913</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>B6</td>\n",
" <td>839</td>\n",
" <td>N534JB</td>\n",
" <td>JFK</td>\n",
" <td>BQN</td>\n",
" <td>2015-12-31 23:59:00</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>America/Puerto_Rico</td>\n",
" <td>BQN</td>\n",
" <td>PR</td>\n",
" <td>San Antonio</td>\n",
" <td>Aguadilla</td>\n",
" <td>NaN</td>\n",
" <td>AP</td>\n",
" <td>2015-12-31 23:59:00-05:00</td>\n",
" <td>2016-01-01 04:40:00-04:00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5332911 rows × 61 columns</p>\n",
"</div>\n",
" <div class=\"colab-df-buttons\">\n",
"\n",
" <div class=\"colab-df-container\">\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-9de7a4a3-81d7-4bbe-85c9-399367b26011')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
" <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" .colab-df-buttons div {\n",
" margin-bottom: 4px;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-9de7a4a3-81d7-4bbe-85c9-399367b26011 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-9de7a4a3-81d7-4bbe-85c9-399367b26011');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
"\n",
"\n",
"<div id=\"df-8ff45eec-c3c4-4e70-b4c2-4c56b312999b\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-8ff45eec-c3c4-4e70-b4c2-4c56b312999b')\"\n",
" title=\"Suggest charts\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" --bg-color: #E8F0FE;\n",
" --fill-color: #1967D2;\n",
" --hover-bg-color: #E2EBFA;\n",
" --hover-fill-color: #174EA6;\n",
" --disabled-fill-color: #AAA;\n",
" --disabled-bg-color: #DDD;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" --bg-color: #3B4455;\n",
" --fill-color: #D2E3FC;\n",
" --hover-bg-color: #434B5C;\n",
" --hover-fill-color: #FFFFFF;\n",
" --disabled-bg-color: #3B4455;\n",
" --disabled-fill-color: #666;\n",
" }\n",
"\n",
" .colab-df-quickchart {\n",
" background-color: var(--bg-color);\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: var(--fill-color);\n",
" height: 32px;\n",
" padding: 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: var(--hover-bg-color);\n",
" box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: var(--button-hover-fill-color);\n",
" }\n",
"\n",
" .colab-df-quickchart-complete:disabled,\n",
" .colab-df-quickchart-complete:disabled:hover {\n",
" background-color: var(--disabled-bg-color);\n",
" fill: var(--disabled-fill-color);\n",
" box-shadow: none;\n",
" }\n",
"\n",
" .colab-df-spinner {\n",
" border: 2px solid var(--fill-color);\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" animation:\n",
" spin 1s steps(1) infinite;\n",
" }\n",
"\n",
" @keyframes spin {\n",
" 0% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" border-left-color: var(--fill-color);\n",
" }\n",
" 20% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 30% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 40% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 60% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 80% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" 90% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const quickchartButtonEl =\n",
" document.querySelector('#' + key + ' button');\n",
" quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n",
" quickchartButtonEl.classList.add('colab-df-spinner');\n",
" try {\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" } catch (error) {\n",
" console.error('Error during call to suggestCharts:', error);\n",
" }\n",
" quickchartButtonEl.classList.remove('colab-df-spinner');\n",
" quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
" }\n",
" (() => {\n",
" let quickchartButtonEl =\n",
" document.querySelector('#df-8ff45eec-c3c4-4e70-b4c2-4c56b312999b button');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
" })();\n",
" </script>\n",
"</div>\n",
"\n",
" <div id=\"id_892417f6-5569-48bd-98af-54c5f0d04743\">\n",
" <style>\n",
" .colab-df-generate {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-generate:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-generate {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-generate:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
" <button class=\"colab-df-generate\" onclick=\"generateWithVariable('flights')\"\n",
" title=\"Generate code using this dataframe.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z\"/>\n",
" </svg>\n",
" </button>\n",
" <script>\n",
" (() => {\n",
" const buttonEl =\n",
" document.querySelector('#id_892417f6-5569-48bd-98af-54c5f0d04743 button.colab-df-generate');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" buttonEl.onclick = () => {\n",
" google.colab.notebook.generateWithVariable('flights');\n",
" }\n",
" })();\n",
" </script>\n",
" </div>\n",
"\n",
" </div>\n",
" </div>\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "flights"
}
},
"metadata": {},
"execution_count": 17
}
]
},
{
"cell_type": "markdown",
"source": [
"#### Import weather data"
],
"metadata": {
"id": "GQ2nmdgBJMoZ"
}
},
{
"cell_type": "markdown",
"source": [
"We want to know what the weather was observed to be at the scheduled departure and arrival time for each flight. The idea is that this will help us predict flight delays with more skill."
],
"metadata": {
"id": "yJiBdfzc14fq"
}
},
{
"cell_type": "markdown",
"source": [
"We're going to use an API to download the weather data we want to combine with the flight data. In order to do that, we need to know what airports we're going to ask the API to give us data for. Let's create that list:"
],
"metadata": {
"id": "pm9YnMW5JOrq"
}
},
{
"cell_type": "code",
"source": [
"# Get all the airport IDs in the dataset\n",
"airport_ids = pd.unique(flights[['ORIGIN_AIRPORT', 'DESTINATION_AIRPORT']].values.ravel())\n",
"airport_ids"
],
"metadata": {
"id": "y7dmLnQOxZNI",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "61f23c9d-f363-4f96-a7a0-df11685038b6",
"collapsed": true
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array(['ANC', 'SEA', 'LAX', 'PBI', 'SFO', 'CLT', 'MIA', 'MSP', 'LAS',\n",
" 'DFW', 'ATL', 'DEN', 'SLC', 'IAH', 'PDX', 'MCI', 'FAI', 'FLL',\n",
" 'PHX', 'ORD', 'HNL', 'SJU', 'EWR', 'JFK', 'PBG', 'IAG', 'PSE',\n",
" 'MCO', 'BQN', 'BOS', 'BDL', 'GEG', 'ITO', 'ONT', 'KOA', 'OGG',\n",
" 'MYR', 'HIB', 'ABR', 'MAF', 'MKE', 'BNA', 'DTW', 'BRO', 'VPS',\n",
" 'BOI', 'BJI', 'LIH', 'SGF', 'PHL', 'SBN', 'RDD', 'EUG', 'IAD',\n",
" 'BUF', 'PWM', 'CRP', 'PIA', 'FAT', 'SMF', 'AUS', 'BWI', 'JAX',\n",
" 'MFR', 'IDA', 'MSN', 'DCA', 'SAT', 'CHS', 'SBA', 'SMX', 'IND',\n",
" 'CLE', 'GSP', 'ABI', 'RIC', 'BFL', 'OMA', 'RDM', 'CID', 'TPA',\n",
" 'SYR', 'ROC', 'TYR', 'LAN', 'XNA', 'GSO', 'LGA', 'RSW', 'OAK',\n",
" 'PVD', 'RNO', 'PIT', 'ABQ', 'HOU', 'TUL', 'LIT', 'MSY', 'OKC',\n",
" 'SJC', 'LGB', 'ATW', 'PNS', 'MEM', 'TYS', 'MHT', 'SAV', 'GRB',\n",
" 'ABE', 'JAN', 'OAJ', 'FAR', 'ERI', 'LEX', 'CWA', 'MSO', 'TTN',\n",
" 'AMA', 'CLL', 'BTR', 'JLN', 'MLI', 'RDU', 'CVG', 'MHK', 'MOB',\n",
" 'TLH', 'BHM', 'CAE', 'TXK', 'ACY', 'RAP', 'TUS', 'EAU', 'DLH',\n",
" 'FSD', 'INL', 'BRD', 'CMX', 'SPI', 'CLD', 'COD', 'CMH', 'LRD',\n",
" 'PSC', 'CPR', 'ACV', 'DAL', 'PAH', 'MRY', 'ESC', 'ISN', 'PSP',\n",
" 'STL', 'MFE', 'BTV', 'FSM', 'AEX', 'SPS', 'ACT', 'SJT', 'MTJ',\n",
" 'GCC', 'RKS', 'MBS', 'GUC', 'ORF', 'MOT', 'MLU', 'SNA', 'SAN',\n",
" 'LAW', 'PIB', 'MEI', 'MGM', 'SBP', 'COS', 'LAR', 'DRO', 'BIS',\n",
" 'MDW', 'GRI', 'HLN', 'BZN', 'MDT', 'SCE', 'TWF', 'OTZ', 'BPT',\n",
" 'GPT', 'STC', 'HPN', 'MLB', 'PLN', 'CIU', 'CAK', 'DSM', 'BLI',\n",
" 'SHV', 'ROW', 'FWA', 'BUR', 'ALB', 'HOB', 'LNK', 'CMI', 'COU',\n",
" 'GTF', 'EKO', 'AVL', 'HSV', 'SAF', 'GRR', 'SUX', 'LFT', 'HYS',\n",
" 'ELP', 'DVL', 'JMS', 'ISP', 'DAB', 'DAY', 'GRK', 'GJT', 'BMI',\n",
" 'LBE', 'ASE', 'GUM', 'SCC', 'TVC', 'ALO', 'STT', 'IMT', 'RHI',\n",
" 'JNU', 'LCH', 'KTN', 'JAC', 'DBQ', 'GCK', 'GNV', 'DIK', 'SDF',\n",
" 'LBB', 'AVP', 'BTM', 'SRQ', 'ELM', 'PIH', 'ICT', 'SUN', 'LWS',\n",
" 'VEL', 'PUB', 'HRL', 'HDN', 'ORH', 'YUM', 'FLG', 'FCA', 'BIL',\n",
" 'ROA', 'CHA', 'EYW', 'CRW', 'MQT', 'CHO', 'CDC', 'EGE', 'APN',\n",
" 'ECP', 'MMH', 'CEC', 'EVV', 'GFK', 'TOL', 'AGS', 'STX', 'ILM',\n",
" 'WRG', 'FNT', 'CNY', 'BRW', 'MKG', 'OME', 'GGG', 'LSE', 'SIT',\n",
" 'FAY', 'DHN', 'PSG', 'SGU', 'PHF', 'BET', 'AZO', 'RST', 'TRI',\n",
" 'VLD', 'BQK', 'SWF', 'CSG', 'EWN', 'GTR', 'ILG', 'ABY', 'ADK',\n",
" 'UST', 'YAK', 'CDV', 'OTH', 'ADQ', 'PPG', 'BGM', 'BGR', 'ITH',\n",
" 'ACK', 'MVY', 'WYS', 'DLG', 'AKN', 'GST', 'HYA'], dtype=object)"
]
},
"metadata": {},
"execution_count": 18
}
]
},
{
"cell_type": "code",
"source": [
"# Remember, some airport IDs were five-digit identifiers, and we want to drop those\n",
"airport_ids = [id for id in list(airport_ids) if not any(char.isdigit() for char in id)]\n",
"print(airport_ids)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "v6NMNVTQ4OC7",
"outputId": "c6a120d7-4028-4500-9389-b87bb3b4521a"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"['ANC', 'SEA', 'LAX', 'PBI', 'SFO', 'CLT', 'MIA', 'MSP', 'LAS', 'DFW', 'ATL', 'DEN', 'SLC', 'IAH', 'PDX', 'MCI', 'FAI', 'FLL', 'PHX', 'ORD', 'HNL', 'SJU', 'EWR', 'JFK', 'PBG', 'IAG', 'PSE', 'MCO', 'BQN', 'BOS', 'BDL', 'GEG', 'ITO', 'ONT', 'KOA', 'OGG', 'MYR', 'HIB', 'ABR', 'MAF', 'MKE', 'BNA', 'DTW', 'BRO', 'VPS', 'BOI', 'BJI', 'LIH', 'SGF', 'PHL', 'SBN', 'RDD', 'EUG', 'IAD', 'BUF', 'PWM', 'CRP', 'PIA', 'FAT', 'SMF', 'AUS', 'BWI', 'JAX', 'MFR', 'IDA', 'MSN', 'DCA', 'SAT', 'CHS', 'SBA', 'SMX', 'IND', 'CLE', 'GSP', 'ABI', 'RIC', 'BFL', 'OMA', 'RDM', 'CID', 'TPA', 'SYR', 'ROC', 'TYR', 'LAN', 'XNA', 'GSO', 'LGA', 'RSW', 'OAK', 'PVD', 'RNO', 'PIT', 'ABQ', 'HOU', 'TUL', 'LIT', 'MSY', 'OKC', 'SJC', 'LGB', 'ATW', 'PNS', 'MEM', 'TYS', 'MHT', 'SAV', 'GRB', 'ABE', 'JAN', 'OAJ', 'FAR', 'ERI', 'LEX', 'CWA', 'MSO', 'TTN', 'AMA', 'CLL', 'BTR', 'JLN', 'MLI', 'RDU', 'CVG', 'MHK', 'MOB', 'TLH', 'BHM', 'CAE', 'TXK', 'ACY', 'RAP', 'TUS', 'EAU', 'DLH', 'FSD', 'INL', 'BRD', 'CMX', 'SPI', 'CLD', 'COD', 'CMH', 'LRD', 'PSC', 'CPR', 'ACV', 'DAL', 'PAH', 'MRY', 'ESC', 'ISN', 'PSP', 'STL', 'MFE', 'BTV', 'FSM', 'AEX', 'SPS', 'ACT', 'SJT', 'MTJ', 'GCC', 'RKS', 'MBS', 'GUC', 'ORF', 'MOT', 'MLU', 'SNA', 'SAN', 'LAW', 'PIB', 'MEI', 'MGM', 'SBP', 'COS', 'LAR', 'DRO', 'BIS', 'MDW', 'GRI', 'HLN', 'BZN', 'MDT', 'SCE', 'TWF', 'OTZ', 'BPT', 'GPT', 'STC', 'HPN', 'MLB', 'PLN', 'CIU', 'CAK', 'DSM', 'BLI', 'SHV', 'ROW', 'FWA', 'BUR', 'ALB', 'HOB', 'LNK', 'CMI', 'COU', 'GTF', 'EKO', 'AVL', 'HSV', 'SAF', 'GRR', 'SUX', 'LFT', 'HYS', 'ELP', 'DVL', 'JMS', 'ISP', 'DAB', 'DAY', 'GRK', 'GJT', 'BMI', 'LBE', 'ASE', 'GUM', 'SCC', 'TVC', 'ALO', 'STT', 'IMT', 'RHI', 'JNU', 'LCH', 'KTN', 'JAC', 'DBQ', 'GCK', 'GNV', 'DIK', 'SDF', 'LBB', 'AVP', 'BTM', 'SRQ', 'ELM', 'PIH', 'ICT', 'SUN', 'LWS', 'VEL', 'PUB', 'HRL', 'HDN', 'ORH', 'YUM', 'FLG', 'FCA', 'BIL', 'ROA', 'CHA', 'EYW', 'CRW', 'MQT', 'CHO', 'CDC', 'EGE', 'APN', 'ECP', 'MMH', 'CEC', 'EVV', 'GFK', 'TOL', 'AGS', 'STX', 'ILM', 'WRG', 'FNT', 'CNY', 'BRW', 'MKG', 'OME', 'GGG', 'LSE', 'SIT', 'FAY', 'DHN', 'PSG', 'SGU', 'PHF', 'BET', 'AZO', 'RST', 'TRI', 'VLD', 'BQK', 'SWF', 'CSG', 'EWN', 'GTR', 'ILG', 'ABY', 'ADK', 'UST', 'YAK', 'CDV', 'OTH', 'ADQ', 'PPG', 'BGM', 'BGR', 'ITH', 'ACK', 'MVY', 'WYS', 'DLG', 'AKN', 'GST', 'HYA']\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"We'll be asking the API for all weather observations from 2015 across 322 airports!"
],
"metadata": {
"id": "1bS0hxNtKwwE"
}
},
{
"cell_type": "code",
"source": [
"len(airport_ids)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "StQaJyuQ99Rh",
"outputId": "be72604b-8667-4def-ce71-35656afe610c"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"322"
]
},
"metadata": {},
"execution_count": 20
}
]
},
{
"cell_type": "markdown",
"source": [
"Now we'll create the string of airports to send to the API."
],
"metadata": {
"id": "U9zGVPVEK4j0"
}
},
{
"cell_type": "code",
"source": [
"station_str = ''\n",
"for id in airport_ids:\n",
" station_str += f'station={id}&'\n",
"print(station_str)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "NbLw3NCY-klK",
"outputId": "e55214c2-9bb5-4674-ec31-22d7f9b29caf"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"station=ANC&station=SEA&station=LAX&station=PBI&station=SFO&station=CLT&station=MIA&station=MSP&station=LAS&station=DFW&station=ATL&station=DEN&station=SLC&station=IAH&station=PDX&station=MCI&station=FAI&station=FLL&station=PHX&station=ORD&station=HNL&station=SJU&station=EWR&station=JFK&station=PBG&station=IAG&station=PSE&station=MCO&station=BQN&station=BOS&station=BDL&station=GEG&station=ITO&station=ONT&station=KOA&station=OGG&station=MYR&station=HIB&station=ABR&station=MAF&station=MKE&station=BNA&station=DTW&station=BRO&station=VPS&station=BOI&station=BJI&station=LIH&station=SGF&station=PHL&station=SBN&station=RDD&station=EUG&station=IAD&station=BUF&station=PWM&station=CRP&station=PIA&station=FAT&station=SMF&station=AUS&station=BWI&station=JAX&station=MFR&station=IDA&station=MSN&station=DCA&station=SAT&station=CHS&station=SBA&station=SMX&station=IND&station=CLE&station=GSP&station=ABI&station=RIC&station=BFL&station=OMA&station=RDM&station=CID&station=TPA&station=SYR&station=ROC&station=TYR&station=LAN&station=XNA&station=GSO&station=LGA&station=RSW&station=OAK&station=PVD&station=RNO&station=PIT&station=ABQ&station=HOU&station=TUL&station=LIT&station=MSY&station=OKC&station=SJC&station=LGB&station=ATW&station=PNS&station=MEM&station=TYS&station=MHT&station=SAV&station=GRB&station=ABE&station=JAN&station=OAJ&station=FAR&station=ERI&station=LEX&station=CWA&station=MSO&station=TTN&station=AMA&station=CLL&station=BTR&station=JLN&station=MLI&station=RDU&station=CVG&station=MHK&station=MOB&station=TLH&station=BHM&station=CAE&station=TXK&station=ACY&station=RAP&station=TUS&station=EAU&station=DLH&station=FSD&station=INL&station=BRD&station=CMX&station=SPI&station=CLD&station=COD&station=CMH&station=LRD&station=PSC&station=CPR&station=ACV&station=DAL&station=PAH&station=MRY&station=ESC&station=ISN&station=PSP&station=STL&station=MFE&station=BTV&station=FSM&station=AEX&station=SPS&station=ACT&station=SJT&station=MTJ&station=GCC&station=RKS&station=MBS&station=GUC&station=ORF&station=MOT&station=MLU&station=SNA&station=SAN&station=LAW&station=PIB&station=MEI&station=MGM&station=SBP&station=COS&station=LAR&station=DRO&station=BIS&station=MDW&station=GRI&station=HLN&station=BZN&station=MDT&station=SCE&station=TWF&station=OTZ&station=BPT&station=GPT&station=STC&station=HPN&station=MLB&station=PLN&station=CIU&station=CAK&station=DSM&station=BLI&station=SHV&station=ROW&station=FWA&station=BUR&station=ALB&station=HOB&station=LNK&station=CMI&station=COU&station=GTF&station=EKO&station=AVL&station=HSV&station=SAF&station=GRR&station=SUX&station=LFT&station=HYS&station=ELP&station=DVL&station=JMS&station=ISP&station=DAB&station=DAY&station=GRK&station=GJT&station=BMI&station=LBE&station=ASE&station=GUM&station=SCC&station=TVC&station=ALO&station=STT&station=IMT&station=RHI&station=JNU&station=LCH&station=KTN&station=JAC&station=DBQ&station=GCK&station=GNV&station=DIK&station=SDF&station=LBB&station=AVP&station=BTM&station=SRQ&station=ELM&station=PIH&station=ICT&station=SUN&station=LWS&station=VEL&station=PUB&station=HRL&station=HDN&station=ORH&station=YUM&station=FLG&station=FCA&station=BIL&station=ROA&station=CHA&station=EYW&station=CRW&station=MQT&station=CHO&station=CDC&station=EGE&station=APN&station=ECP&station=MMH&station=CEC&station=EVV&station=GFK&station=TOL&station=AGS&station=STX&station=ILM&station=WRG&station=FNT&station=CNY&station=BRW&station=MKG&station=OME&station=GGG&station=LSE&station=SIT&station=FAY&station=DHN&station=PSG&station=SGU&station=PHF&station=BET&station=AZO&station=RST&station=TRI&station=VLD&station=BQK&station=SWF&station=CSG&station=EWN&station=GTR&station=ILG&station=ABY&station=ADK&station=UST&station=YAK&station=CDV&station=OTH&station=ADQ&station=PPG&station=BGM&station=BGR&station=ITH&station=ACK&station=MVY&station=WYS&station=DLG&station=AKN&station=GST&station=HYA&\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"Our API call will return a CSV that will be read into Pandas."
],
"metadata": {
"id": "k11WGpS_K-b0"
}
},
{
"cell_type": "code",
"source": [
"weather_data_path = f'https://mesonet.agron.iastate.edu/cgi-bin/request/asos.py?{station_str}missing=null&year1=2015&month1=1&day1=1&year2=2016&month2=1&day2=1'\n",
"weather = pd.read_csv(weather_data_path)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "F4t-ljDY9dbA",
"outputId": "0c636c61-7c0d-453e-9049-f5797b83130c"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"<ipython-input-22-5e2ea01c8cb1>:2: DtypeWarning: Columns (26) have mixed types. Specify dtype option on import or set low_memory=False.\n",
" weather = pd.read_csv(weather_data_path)\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"weather.head()"
],
"metadata": {
"id": "KyEjdph7Ajei",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 863
},
"outputId": "4655e7c1-8095-42e7-aa2b-dacb6b146bd4"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" station valid tmpf dwpf relh drct sknt p01i alti \\\n",
"0 GRI 2015-01-01 00:00 9.0 0.0 66.16 240.0 14.00 NaN NaN \n",
"1 LWS 2015-01-01 00:00 22.0 8.0 54.28 0.0 0.00 NaN NaN \n",
"2 MIA 2015-01-01 00:00 75.0 66.0 73.59 330.0 5.00 NaN NaN \n",
"3 STL 2015-01-01 00:00 25.0 5.0 41.76 250.0 4.08 NaN NaN \n",
"4 DFW 2015-01-01 00:00 33.0 23.0 66.32 10.0 11.00 NaN NaN \n",
"\n",
" mslp ... wxcodes ice_accretion_1hr ice_accretion_3hr \\\n",
"0 1030.3 ... NaN NaN NaN \n",
"1 1038.7 ... NaN NaN NaN \n",
"2 1021.3 ... NaN NaN NaN \n",
"3 1032.3 ... NaN NaN NaN \n",
"4 1031.7 ... NaN NaN NaN \n",
"\n",
" ice_accretion_6hr peak_wind_gust peak_wind_drct peak_wind_time feel \\\n",
"0 NaN NaN NaN NaN -8.48 \n",
"1 NaN NaN NaN NaN 22.00 \n",
"2 NaN NaN NaN NaN 75.00 \n",
"3 NaN NaN NaN NaN 19.14 \n",
"4 NaN NaN NaN NaN 23.73 \n",
"\n",
" metar snowdepth \n",
"0 METAR KGRI 010000Z AUTO 24014KT M13/M18 RMK AO... NaN \n",
"1 METAR KLWS 010000Z AUTO 00000KT M06/M13 RMK AO... NaN \n",
"2 METAR KMIA 010000Z AUTO 33005KT 24/19 RMK AO2 ... NaN \n",
"3 METAR KSTL 010000Z AUTO 25004KT M04/M15 RMK AO... NaN \n",
"4 METAR KDFW 010000Z AUTO 01011KT 01/M05 RMK AO2... NaN \n",
"\n",
"[5 rows x 30 columns]"
],
"text/html": [
"\n",
" <div id=\"df-e1ea6bba-2442-49ce-ac18-b6248d5476db\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>station</th>\n",
" <th>valid</th>\n",
" <th>tmpf</th>\n",
" <th>dwpf</th>\n",
" <th>relh</th>\n",
" <th>drct</th>\n",
" <th>sknt</th>\n",
" <th>p01i</th>\n",
" <th>alti</th>\n",
" <th>mslp</th>\n",
" <th>...</th>\n",
" <th>wxcodes</th>\n",
" <th>ice_accretion_1hr</th>\n",
" <th>ice_accretion_3hr</th>\n",
" <th>ice_accretion_6hr</th>\n",
" <th>peak_wind_gust</th>\n",
" <th>peak_wind_drct</th>\n",
" <th>peak_wind_time</th>\n",
" <th>feel</th>\n",
" <th>metar</th>\n",
" <th>snowdepth</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>GRI</td>\n",
" <td>2015-01-01 00:00</td>\n",
" <td>9.0</td>\n",
" <td>0.0</td>\n",
" <td>66.16</td>\n",
" <td>240.0</td>\n",
" <td>14.00</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1030.3</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>-8.48</td>\n",
" <td>METAR KGRI 010000Z AUTO 24014KT M13/M18 RMK AO...</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>LWS</td>\n",
" <td>2015-01-01 00:00</td>\n",
" <td>22.0</td>\n",
" <td>8.0</td>\n",
" <td>54.28</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1038.7</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>22.00</td>\n",
" <td>METAR KLWS 010000Z AUTO 00000KT M06/M13 RMK AO...</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>MIA</td>\n",
" <td>2015-01-01 00:00</td>\n",
" <td>75.0</td>\n",
" <td>66.0</td>\n",
" <td>73.59</td>\n",
" <td>330.0</td>\n",
" <td>5.00</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1021.3</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>75.00</td>\n",
" <td>METAR KMIA 010000Z AUTO 33005KT 24/19 RMK AO2 ...</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>STL</td>\n",
" <td>2015-01-01 00:00</td>\n",
" <td>25.0</td>\n",
" <td>5.0</td>\n",
" <td>41.76</td>\n",
" <td>250.0</td>\n",
" <td>4.08</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1032.3</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>19.14</td>\n",
" <td>METAR KSTL 010000Z AUTO 25004KT M04/M15 RMK AO...</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>DFW</td>\n",
" <td>2015-01-01 00:00</td>\n",
" <td>33.0</td>\n",
" <td>23.0</td>\n",
" <td>66.32</td>\n",
" <td>10.0</td>\n",
" <td>11.00</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1031.7</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>23.73</td>\n",
" <td>METAR KDFW 010000Z AUTO 01011KT 01/M05 RMK AO2...</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 30 columns</p>\n",
"</div>\n",
" <div class=\"colab-df-buttons\">\n",
"\n",
" <div class=\"colab-df-container\">\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-e1ea6bba-2442-49ce-ac18-b6248d5476db')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
" <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" .colab-df-buttons div {\n",
" margin-bottom: 4px;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-e1ea6bba-2442-49ce-ac18-b6248d5476db button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-e1ea6bba-2442-49ce-ac18-b6248d5476db');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
"\n",
"\n",
"<div id=\"df-00432a67-a94b-4a8c-8fbb-ed8032c583b9\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-00432a67-a94b-4a8c-8fbb-ed8032c583b9')\"\n",
" title=\"Suggest charts\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" --bg-color: #E8F0FE;\n",
" --fill-color: #1967D2;\n",
" --hover-bg-color: #E2EBFA;\n",
" --hover-fill-color: #174EA6;\n",
" --disabled-fill-color: #AAA;\n",
" --disabled-bg-color: #DDD;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" --bg-color: #3B4455;\n",
" --fill-color: #D2E3FC;\n",
" --hover-bg-color: #434B5C;\n",
" --hover-fill-color: #FFFFFF;\n",
" --disabled-bg-color: #3B4455;\n",
" --disabled-fill-color: #666;\n",
" }\n",
"\n",
" .colab-df-quickchart {\n",
" background-color: var(--bg-color);\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: var(--fill-color);\n",
" height: 32px;\n",
" padding: 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: var(--hover-bg-color);\n",
" box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: var(--button-hover-fill-color);\n",
" }\n",
"\n",
" .colab-df-quickchart-complete:disabled,\n",
" .colab-df-quickchart-complete:disabled:hover {\n",
" background-color: var(--disabled-bg-color);\n",
" fill: var(--disabled-fill-color);\n",
" box-shadow: none;\n",
" }\n",
"\n",
" .colab-df-spinner {\n",
" border: 2px solid var(--fill-color);\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" animation:\n",
" spin 1s steps(1) infinite;\n",
" }\n",
"\n",
" @keyframes spin {\n",
" 0% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" border-left-color: var(--fill-color);\n",
" }\n",
" 20% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 30% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 40% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 60% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 80% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" 90% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const quickchartButtonEl =\n",
" document.querySelector('#' + key + ' button');\n",
" quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n",
" quickchartButtonEl.classList.add('colab-df-spinner');\n",
" try {\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" } catch (error) {\n",
" console.error('Error during call to suggestCharts:', error);\n",
" }\n",
" quickchartButtonEl.classList.remove('colab-df-spinner');\n",
" quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
" }\n",
" (() => {\n",
" let quickchartButtonEl =\n",
" document.querySelector('#df-00432a67-a94b-4a8c-8fbb-ed8032c583b9 button');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
" })();\n",
" </script>\n",
"</div>\n",
"\n",
" </div>\n",
" </div>\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "weather"
}
},
"metadata": {},
"execution_count": 23
}
]
},
{
"cell_type": "markdown",
"source": [
"Now, we'll convert the valid time for the weather observations into datetime objects. All of these times are in UTC, which is much easier to work with than local timezones!"
],
"metadata": {
"id": "8AoTP_0X20Aj"
}
},
{
"cell_type": "code",
"source": [
"weather['valid'] = pd.to_datetime(weather['valid'], utc=True)\n",
"weather['valid']"
],
"metadata": {
"id": "zb7qB3P-DIc1",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 458
},
"outputId": "2aeba4b2-5e46-4443-c2de-663c99af51ba"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0 2015-01-01 00:00:00+00:00\n",
"1 2015-01-01 00:00:00+00:00\n",
"2 2015-01-01 00:00:00+00:00\n",
"3 2015-01-01 00:00:00+00:00\n",
"4 2015-01-01 00:00:00+00:00\n",
" ... \n",
"3351718 2015-12-31 23:56:00+00:00\n",
"3351719 2015-12-31 23:57:00+00:00\n",
"3351720 2015-12-31 23:58:00+00:00\n",
"3351721 2015-12-31 23:58:00+00:00\n",
"3351722 2015-12-31 23:59:00+00:00\n",
"Name: valid, Length: 3351723, dtype: datetime64[ns, UTC]"
],
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>valid</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2015-01-01 00:00:00+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2015-01-01 00:00:00+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2015-01-01 00:00:00+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2015-01-01 00:00:00+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015-01-01 00:00:00+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3351718</th>\n",
" <td>2015-12-31 23:56:00+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3351719</th>\n",
" <td>2015-12-31 23:57:00+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3351720</th>\n",
" <td>2015-12-31 23:58:00+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3351721</th>\n",
" <td>2015-12-31 23:58:00+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3351722</th>\n",
" <td>2015-12-31 23:59:00+00:00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>3351723 rows × 1 columns</p>\n",
"</div><br><label><b>dtype:</b> datetime64[ns, UTC]</label>"
]
},
"metadata": {},
"execution_count": 24
}
]
},
{
"cell_type": "markdown",
"source": [
"#### Join flight and weather tables together"
],
"metadata": {
"id": "X9t9fL7tvRmX"
}
},
{
"cell_type": "markdown",
"source": [
"Ok, now it's time to merge the flight data with our weather observations. Remember our weather data is in UTC, but our flight departure and arrival times are in local timezones. We need them all in the same timezone, so we'll convert all our local times to UTC for all flights."
],
"metadata": {
"id": "X_ODtdBb3Mqc"
}
},
{
"cell_type": "code",
"source": [
"flights['SCHEDULED_DEPARTURE_UTC'] = pd.to_datetime(flights['SCHEDULED_DEPARTURE_tz_aware'], utc=True)\n",
"flights['SCHEDULED_ARRIVAL_UTC'] = pd.to_datetime(flights['SCHEDULED_ARRIVAL_tz_aware'], utc=True)"
],
"metadata": {
"id": "hsTDcBc_COTS",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "49809705-7111-4f17-fcfd-ec602a2026a1",
"collapsed": true
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"<ipython-input-25-e3afed8e7977>:2: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" flights['SCHEDULED_DEPARTURE_UTC'] = pd.to_datetime(flights['SCHEDULED_DEPARTURE_tz_aware'], utc=True)\n",
"<ipython-input-25-e3afed8e7977>:3: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" flights['SCHEDULED_ARRIVAL_UTC'] = pd.to_datetime(flights['SCHEDULED_ARRIVAL_tz_aware'], utc=True)\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"Similar to when we merged timezone data into our flights, we need to merge weather data twice, once for departure and once for arrival."
],
"metadata": {
"id": "bIUpUa4C3lxU"
}
},
{
"cell_type": "code",
"source": [
"# Adding a prefix to all the column names from each table will help us differentiate between the columns after the two joins\n",
"origin_weather = weather.add_prefix('origin_')\n",
"destination_weather = weather.add_prefix('destination_')"
],
"metadata": {
"id": "5uyRUgk4y0RO"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Using merge_asof required the tables be sorted\n",
"flights = flights.sort_values('SCHEDULED_DEPARTURE_UTC')\n",
"origin_weather = origin_weather.sort_values('origin_valid')\n",
"\n",
"# Join weather conditions for departure time with all flights\n",
"flights = pd.merge_asof(\n",
" flights,\n",
" origin_weather,\n",
" left_on='SCHEDULED_DEPARTURE_UTC',\n",
" right_on='origin_valid',\n",
" left_by='ORIGIN_AIRPORT',\n",
" right_by='origin_station',\n",
" direction='nearest', # The weather observation closest to scheduled departure is joined\n",
" tolerance=pd.Timedelta('1h') # unless there is no observation within +/- 1 hour of scheduled departure\n",
")"
],
"metadata": {
"id": "J2rCSjZ3DJw0"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Using merge_asof required the tables be sorted\n",
"flights = flights.sort_values('SCHEDULED_ARRIVAL_UTC')\n",
"destination_weather = destination_weather.sort_values('destination_valid')\n",
"\n",
"# Join weather conditions for arrival time with all flights\n",
"flights = pd.merge_asof(\n",
" flights,\n",
" destination_weather,\n",
" left_on='SCHEDULED_ARRIVAL_UTC',\n",
" right_on='destination_valid',\n",
" left_by='DESTINATION_AIRPORT',\n",
" right_by='destination_station',\n",
" direction='nearest', # The weather observation closest to scheduled arrival is joined\n",
" tolerance=pd.Timedelta('1h') # unless there is no observation within +/- 1 hour of scheduled arrival\n",
")"
],
"metadata": {
"id": "wcFeyGkfF8gw"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Great, our join is complete! We ended up with some null values, where some flights could not be paired with a weather observation.\n",
"\n",
"It looks like this occured for airports that don't share the same three-letter identifier as their weather station, which is more common in Alaska, Puerto Rico, and the US Pacific islands. Also, if no weather observation was available within +/- 1 hour of scheduled departure or arrival time, no weather observation was added."
],
"metadata": {
"id": "cHo7avaV4QsU"
}
},
{
"cell_type": "code",
"source": [
"flights[flights['origin_metar'].isnull() | flights['destination_metar'].isnull()]"
],
"metadata": {
"id": "Bwy8keiTBOFs",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 652
},
"outputId": "70fd5f19-b6f4-4244-bd1a-99709e627fcb"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" YEAR MONTH DAY DAY_OF_WEEK AIRLINE FLIGHT_NUMBER TAIL_NUMBER \\\n",
"0 2015 1 1 4 UA 201 N772UA \n",
"1 2015 1 1 4 B6 304 N607JB \n",
"2 2015 1 1 4 B6 668 N653JB \n",
"3 2015 1 1 4 B6 1030 N239JB \n",
"5 2015 1 1 4 UA 1528 N76519 \n",
"... ... ... ... ... ... ... ... \n",
"5332906 2015 12 31 4 EV 5089 N135EV \n",
"5332907 2015 12 31 4 OO 7388 N453SW \n",
"5332908 2015 12 31 4 EV 5308 N872AS \n",
"5332909 2015 12 31 4 DL 1672 N717JL \n",
"5332910 2015 12 31 4 DL 1224 N338NB \n",
"\n",
" ORIGIN_AIRPORT DESTINATION_AIRPORT SCHEDULED_DEPARTURE ... \\\n",
"0 HNL GUM 2015-01-01 14:50:00 ... \n",
"1 SJU JFK 2015-01-01 01:55:00 ... \n",
"2 PSE MCO 2015-01-01 02:55:00 ... \n",
"3 BQN MCO 2015-01-01 03:07:00 ... \n",
"5 SJU EWR 2015-01-01 01:54:00 ... \n",
"... ... ... ... ... \n",
"5332906 ATL BHM 2015-12-31 18:12:00 ... \n",
"5332907 SLC EKO 2015-12-31 17:00:00 ... \n",
"5332908 ATL MGM 2015-12-31 21:09:00 ... \n",
"5332909 ATL HSV 2015-12-31 21:15:00 ... \n",
"5332910 ATL BHM 2015-12-31 22:12:00 ... \n",
"\n",
" destination_wxcodes destination_ice_accretion_1hr \\\n",
"0 NaN NaN \n",
"1 NaN NaN \n",
"2 -DZ BR NaN \n",
"3 -DZ BR NaN \n",
"5 NaN NaN \n",
"... ... ... \n",
"5332906 NaN NaN \n",
"5332907 NaN NaN \n",
"5332908 NaN NaN \n",
"5332909 NaN NaN \n",
"5332910 NaN NaN \n",
"\n",
" destination_ice_accretion_3hr destination_ice_accretion_6hr \\\n",
"0 NaN NaN \n",
"1 NaN NaN \n",
"2 NaN NaN \n",
"3 NaN NaN \n",
"5 NaN NaN \n",
"... ... ... \n",
"5332906 NaN NaN \n",
"5332907 NaN NaN \n",
"5332908 NaN NaN \n",
"5332909 NaN NaN \n",
"5332910 NaN NaN \n",
"\n",
" destination_peak_wind_gust destination_peak_wind_drct \\\n",
"0 NaN NaN \n",
"1 NaN NaN \n",
"2 NaN NaN \n",
"3 NaN NaN \n",
"5 NaN NaN \n",
"... ... ... \n",
"5332906 NaN NaN \n",
"5332907 NaN NaN \n",
"5332908 NaN NaN \n",
"5332909 NaN NaN \n",
"5332910 NaN NaN \n",
"\n",
" destination_peak_wind_time destination_feel \\\n",
"0 NaN NaN \n",
"1 NaN 18.11 \n",
"2 NaN 60.08 \n",
"3 NaN 60.08 \n",
"5 NaN 24.08 \n",
"... ... ... \n",
"5332906 NaN NaN \n",
"5332907 NaN NaN \n",
"5332908 NaN NaN \n",
"5332909 NaN NaN \n",
"5332910 NaN NaN \n",
"\n",
" destination_metar \\\n",
"0 NaN \n",
"1 KJFK 010951Z 27012G20KT 10SM CLR M02/M14 A3016... \n",
"2 KMCO 010953Z 35009KT 5SM -DZ BR OVC011 16/14 A... \n",
"3 KMCO 010953Z 35009KT 5SM -DZ BR OVC011 16/14 A... \n",
"5 KEWR 010951Z 00000KT 10SM CLR M04/M13 A3015 RM... \n",
"... ... \n",
"5332906 NaN \n",
"5332907 NaN \n",
"5332908 NaN \n",
"5332909 NaN \n",
"5332910 NaN \n",
"\n",
" destination_snowdepth \n",
"0 NaN \n",
"1 NaN \n",
"2 NaN \n",
"3 NaN \n",
"5 NaN \n",
"... ... \n",
"5332906 NaN \n",
"5332907 NaN \n",
"5332908 NaN \n",
"5332909 NaN \n",
"5332910 NaN \n",
"\n",
"[254681 rows x 123 columns]"
],
"text/html": [
"\n",
" <div id=\"df-bb2835a1-c854-4596-9b50-8bff55502804\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>YEAR</th>\n",
" <th>MONTH</th>\n",
" <th>DAY</th>\n",
" <th>DAY_OF_WEEK</th>\n",
" <th>AIRLINE</th>\n",
" <th>FLIGHT_NUMBER</th>\n",
" <th>TAIL_NUMBER</th>\n",
" <th>ORIGIN_AIRPORT</th>\n",
" <th>DESTINATION_AIRPORT</th>\n",
" <th>SCHEDULED_DEPARTURE</th>\n",
" <th>...</th>\n",
" <th>destination_wxcodes</th>\n",
" <th>destination_ice_accretion_1hr</th>\n",
" <th>destination_ice_accretion_3hr</th>\n",
" <th>destination_ice_accretion_6hr</th>\n",
" <th>destination_peak_wind_gust</th>\n",
" <th>destination_peak_wind_drct</th>\n",
" <th>destination_peak_wind_time</th>\n",
" <th>destination_feel</th>\n",
" <th>destination_metar</th>\n",
" <th>destination_snowdepth</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>UA</td>\n",
" <td>201</td>\n",
" <td>N772UA</td>\n",
" <td>HNL</td>\n",
" <td>GUM</td>\n",
" <td>2015-01-01 14:50:00</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>B6</td>\n",
" <td>304</td>\n",
" <td>N607JB</td>\n",
" <td>SJU</td>\n",
" <td>JFK</td>\n",
" <td>2015-01-01 01:55:00</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>18.11</td>\n",
" <td>KJFK 010951Z 27012G20KT 10SM CLR M02/M14 A3016...</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>B6</td>\n",
" <td>668</td>\n",
" <td>N653JB</td>\n",
" <td>PSE</td>\n",
" <td>MCO</td>\n",
" <td>2015-01-01 02:55:00</td>\n",
" <td>...</td>\n",
" <td>-DZ BR</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>60.08</td>\n",
" <td>KMCO 010953Z 35009KT 5SM -DZ BR OVC011 16/14 A...</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>B6</td>\n",
" <td>1030</td>\n",
" <td>N239JB</td>\n",
" <td>BQN</td>\n",
" <td>MCO</td>\n",
" <td>2015-01-01 03:07:00</td>\n",
" <td>...</td>\n",
" <td>-DZ BR</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>60.08</td>\n",
" <td>KMCO 010953Z 35009KT 5SM -DZ BR OVC011 16/14 A...</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>UA</td>\n",
" <td>1528</td>\n",
" <td>N76519</td>\n",
" <td>SJU</td>\n",
" <td>EWR</td>\n",
" <td>2015-01-01 01:54:00</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>24.08</td>\n",
" <td>KEWR 010951Z 00000KT 10SM CLR M04/M13 A3015 RM...</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5332906</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>EV</td>\n",
" <td>5089</td>\n",
" <td>N135EV</td>\n",
" <td>ATL</td>\n",
" <td>BHM</td>\n",
" <td>2015-12-31 18:12:00</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5332907</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>OO</td>\n",
" <td>7388</td>\n",
" <td>N453SW</td>\n",
" <td>SLC</td>\n",
" <td>EKO</td>\n",
" <td>2015-12-31 17:00:00</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5332908</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>EV</td>\n",
" <td>5308</td>\n",
" <td>N872AS</td>\n",
" <td>ATL</td>\n",
" <td>MGM</td>\n",
" <td>2015-12-31 21:09:00</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5332909</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>DL</td>\n",
" <td>1672</td>\n",
" <td>N717JL</td>\n",
" <td>ATL</td>\n",
" <td>HSV</td>\n",
" <td>2015-12-31 21:15:00</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5332910</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>DL</td>\n",
" <td>1224</td>\n",
" <td>N338NB</td>\n",
" <td>ATL</td>\n",
" <td>BHM</td>\n",
" <td>2015-12-31 22:12:00</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>254681 rows × 123 columns</p>\n",
"</div>\n",
" <div class=\"colab-df-buttons\">\n",
"\n",
" <div class=\"colab-df-container\">\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-bb2835a1-c854-4596-9b50-8bff55502804')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
" <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" .colab-df-buttons div {\n",
" margin-bottom: 4px;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-bb2835a1-c854-4596-9b50-8bff55502804 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-bb2835a1-c854-4596-9b50-8bff55502804');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
"\n",
"\n",
"<div id=\"df-33c5470f-79c6-4514-b073-4c9b9c8b5d5d\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-33c5470f-79c6-4514-b073-4c9b9c8b5d5d')\"\n",
" title=\"Suggest charts\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" --bg-color: #E8F0FE;\n",
" --fill-color: #1967D2;\n",
" --hover-bg-color: #E2EBFA;\n",
" --hover-fill-color: #174EA6;\n",
" --disabled-fill-color: #AAA;\n",
" --disabled-bg-color: #DDD;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" --bg-color: #3B4455;\n",
" --fill-color: #D2E3FC;\n",
" --hover-bg-color: #434B5C;\n",
" --hover-fill-color: #FFFFFF;\n",
" --disabled-bg-color: #3B4455;\n",
" --disabled-fill-color: #666;\n",
" }\n",
"\n",
" .colab-df-quickchart {\n",
" background-color: var(--bg-color);\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: var(--fill-color);\n",
" height: 32px;\n",
" padding: 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: var(--hover-bg-color);\n",
" box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: var(--button-hover-fill-color);\n",
" }\n",
"\n",
" .colab-df-quickchart-complete:disabled,\n",
" .colab-df-quickchart-complete:disabled:hover {\n",
" background-color: var(--disabled-bg-color);\n",
" fill: var(--disabled-fill-color);\n",
" box-shadow: none;\n",
" }\n",
"\n",
" .colab-df-spinner {\n",
" border: 2px solid var(--fill-color);\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" animation:\n",
" spin 1s steps(1) infinite;\n",
" }\n",
"\n",
" @keyframes spin {\n",
" 0% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" border-left-color: var(--fill-color);\n",
" }\n",
" 20% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 30% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 40% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 60% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 80% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" 90% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const quickchartButtonEl =\n",
" document.querySelector('#' + key + ' button');\n",
" quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n",
" quickchartButtonEl.classList.add('colab-df-spinner');\n",
" try {\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" } catch (error) {\n",
" console.error('Error during call to suggestCharts:', error);\n",
" }\n",
" quickchartButtonEl.classList.remove('colab-df-spinner');\n",
" quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
" }\n",
" (() => {\n",
" let quickchartButtonEl =\n",
" document.querySelector('#df-33c5470f-79c6-4514-b073-4c9b9c8b5d5d button');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
" })();\n",
" </script>\n",
"</div>\n",
"\n",
" </div>\n",
" </div>\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe"
}
},
"metadata": {},
"execution_count": 30
}
]
},
{
"cell_type": "markdown",
"source": [
"Since the weather conditions are expected to be a key part of our analysis, we'll drop the columns that we're unable to be matched with a weather observation (more technically called a METAR report)."
],
"metadata": {
"id": "eBdGetXT5I3U"
}
},
{
"cell_type": "code",
"source": [
"# Drop columns where we don't have weather conditions for departure or arrival\n",
"flights = flights.dropna(subset=['origin_metar', 'destination_metar'])"
],
"metadata": {
"id": "nhPpgZg9DUTF"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now, we've combined all of our datasets we plan to use in this project, and we can move forward with more exploration, cleaning, and modeling. Here's a final look at our flights dataset:"
],
"metadata": {
"id": "-9OjRGT76ao1"
}
},
{
"cell_type": "code",
"source": [
"flights"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 964
},
"id": "yPMe3XUHTCmO",
"outputId": "b5c345de-58de-414a-f140-3a29fc563864"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" YEAR MONTH DAY DAY_OF_WEEK AIRLINE FLIGHT_NUMBER TAIL_NUMBER \\\n",
"4 2015 1 1 4 NK 647 N630NK \n",
"7 2015 1 1 4 NK 451 N633NK \n",
"8 2015 1 1 4 DL 2336 N958DN \n",
"9 2015 1 1 4 NK 597 N528NK \n",
"11 2015 1 1 4 AA 2459 N3BDAA \n",
"... ... ... ... ... ... ... ... \n",
"5329457 2015 12 31 4 UA 1973 N487UA \n",
"5329469 2015 12 31 4 WN 1159 N224WN \n",
"5329472 2015 12 31 4 WN 5330 N8643A \n",
"5329483 2015 12 31 4 WN 5197 N643SW \n",
"5329489 2015 12 31 4 F9 661 N906FR \n",
"\n",
" ORIGIN_AIRPORT DESTINATION_AIRPORT SCHEDULED_DEPARTURE ... \\\n",
"4 IAG FLL 2015-01-01 02:00:00 ... \n",
"7 PBG FLL 2015-01-01 01:55:00 ... \n",
"8 DEN ATL 2015-01-01 00:30:00 ... \n",
"9 MSP FLL 2015-01-01 01:15:00 ... \n",
"11 PHX DFW 2015-01-01 02:00:00 ... \n",
"... ... ... ... ... \n",
"5329457 IAH SFO 2015-12-31 14:32:00 ... \n",
"5329469 BDL LAS 2015-12-31 14:10:00 ... \n",
"5329472 BWI LAS 2015-12-31 14:25:00 ... \n",
"5329483 PHX LAS 2015-12-31 16:50:00 ... \n",
"5329489 DEN SFO 2015-12-31 15:15:00 ... \n",
"\n",
" destination_wxcodes destination_ice_accretion_1hr \\\n",
"4 BR NaN \n",
"7 BR NaN \n",
"8 NaN NaN \n",
"9 NaN NaN \n",
"11 NaN NaN \n",
"... ... ... \n",
"5329457 NaN NaN \n",
"5329469 NaN NaN \n",
"5329472 NaN NaN \n",
"5329483 NaN NaN \n",
"5329489 NaN NaN \n",
"\n",
" destination_ice_accretion_3hr destination_ice_accretion_6hr \\\n",
"4 NaN NaN \n",
"7 NaN NaN \n",
"8 NaN NaN \n",
"9 NaN NaN \n",
"11 NaN NaN \n",
"... ... ... \n",
"5329457 NaN NaN \n",
"5329469 NaN NaN \n",
"5329472 NaN NaN \n",
"5329483 NaN NaN \n",
"5329489 NaN NaN \n",
"\n",
" destination_peak_wind_gust destination_peak_wind_drct \\\n",
"4 NaN NaN \n",
"7 NaN NaN \n",
"8 NaN NaN \n",
"9 NaN NaN \n",
"11 NaN NaN \n",
"... ... ... \n",
"5329457 NaN NaN \n",
"5329469 NaN NaN \n",
"5329472 NaN NaN \n",
"5329483 NaN NaN \n",
"5329489 NaN NaN \n",
"\n",
" destination_peak_wind_time destination_feel \\\n",
"4 NaN 72.40 \n",
"7 NaN 72.40 \n",
"8 NaN 29.92 \n",
"9 NaN 72.57 \n",
"11 NaN 26.10 \n",
"... ... ... \n",
"5329457 NaN 51.08 \n",
"5329469 NaN 41.35 \n",
"5329472 NaN 41.35 \n",
"5329483 NaN 41.35 \n",
"5329489 NaN 51.08 \n",
"\n",
" destination_metar \\\n",
"4 KFLL 010953Z 36003KT 5SM BR SCT019 BKN055 BKN0... \n",
"7 KFLL 010953Z 36003KT 5SM BR SCT019 BKN055 BKN0... \n",
"8 KATL 011052Z 31003KT 10SM FEW250 01/M03 A3030 ... \n",
"9 KFLL 011053Z 01003KT 7SM SCT017 BKN050 OVC065 ... \n",
"11 KDFW 011053Z 35007KT 10SM OVC043 01/M04 A3033 ... \n",
"... ... \n",
"5329457 KSFO 312356Z 02010KT 10SM FEW050 11/02 A3016 R... \n",
"5329469 KLAS 312356Z 36011KT 10SM CLR 08/M13 A3017 RMK... \n",
"5329472 KLAS 312356Z 36011KT 10SM CLR 08/M13 A3017 RMK... \n",
"5329483 KLAS 312356Z 36011KT 10SM CLR 08/M13 A3017 RMK... \n",
"5329489 KSFO 312356Z 02010KT 10SM FEW050 11/02 A3016 R... \n",
"\n",
" destination_snowdepth \n",
"4 NaN \n",
"7 NaN \n",
"8 NaN \n",
"9 NaN \n",
"11 NaN \n",
"... ... \n",
"5329457 NaN \n",
"5329469 NaN \n",
"5329472 NaN \n",
"5329483 NaN \n",
"5329489 NaN \n",
"\n",
"[5078230 rows x 123 columns]"
],
"text/html": [
"\n",
" <div id=\"df-8283f162-c888-42f3-abf3-4d517748c4eb\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>YEAR</th>\n",
" <th>MONTH</th>\n",
" <th>DAY</th>\n",
" <th>DAY_OF_WEEK</th>\n",
" <th>AIRLINE</th>\n",
" <th>FLIGHT_NUMBER</th>\n",
" <th>TAIL_NUMBER</th>\n",
" <th>ORIGIN_AIRPORT</th>\n",
" <th>DESTINATION_AIRPORT</th>\n",
" <th>SCHEDULED_DEPARTURE</th>\n",
" <th>...</th>\n",
" <th>destination_wxcodes</th>\n",
" <th>destination_ice_accretion_1hr</th>\n",
" <th>destination_ice_accretion_3hr</th>\n",
" <th>destination_ice_accretion_6hr</th>\n",
" <th>destination_peak_wind_gust</th>\n",
" <th>destination_peak_wind_drct</th>\n",
" <th>destination_peak_wind_time</th>\n",
" <th>destination_feel</th>\n",
" <th>destination_metar</th>\n",
" <th>destination_snowdepth</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>NK</td>\n",
" <td>647</td>\n",
" <td>N630NK</td>\n",
" <td>IAG</td>\n",
" <td>FLL</td>\n",
" <td>2015-01-01 02:00:00</td>\n",
" <td>...</td>\n",
" <td>BR</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>72.40</td>\n",
" <td>KFLL 010953Z 36003KT 5SM BR SCT019 BKN055 BKN0...</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>NK</td>\n",
" <td>451</td>\n",
" <td>N633NK</td>\n",
" <td>PBG</td>\n",
" <td>FLL</td>\n",
" <td>2015-01-01 01:55:00</td>\n",
" <td>...</td>\n",
" <td>BR</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>72.40</td>\n",
" <td>KFLL 010953Z 36003KT 5SM BR SCT019 BKN055 BKN0...</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>DL</td>\n",
" <td>2336</td>\n",
" <td>N958DN</td>\n",
" <td>DEN</td>\n",
" <td>ATL</td>\n",
" <td>2015-01-01 00:30:00</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>29.92</td>\n",
" <td>KATL 011052Z 31003KT 10SM FEW250 01/M03 A3030 ...</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>NK</td>\n",
" <td>597</td>\n",
" <td>N528NK</td>\n",
" <td>MSP</td>\n",
" <td>FLL</td>\n",
" <td>2015-01-01 01:15:00</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>72.57</td>\n",
" <td>KFLL 011053Z 01003KT 7SM SCT017 BKN050 OVC065 ...</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>2015</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>AA</td>\n",
" <td>2459</td>\n",
" <td>N3BDAA</td>\n",
" <td>PHX</td>\n",
" <td>DFW</td>\n",
" <td>2015-01-01 02:00:00</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>26.10</td>\n",
" <td>KDFW 011053Z 35007KT 10SM OVC043 01/M04 A3033 ...</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5329457</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>UA</td>\n",
" <td>1973</td>\n",
" <td>N487UA</td>\n",
" <td>IAH</td>\n",
" <td>SFO</td>\n",
" <td>2015-12-31 14:32:00</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>51.08</td>\n",
" <td>KSFO 312356Z 02010KT 10SM FEW050 11/02 A3016 R...</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5329469</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>WN</td>\n",
" <td>1159</td>\n",
" <td>N224WN</td>\n",
" <td>BDL</td>\n",
" <td>LAS</td>\n",
" <td>2015-12-31 14:10:00</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>41.35</td>\n",
" <td>KLAS 312356Z 36011KT 10SM CLR 08/M13 A3017 RMK...</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5329472</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>WN</td>\n",
" <td>5330</td>\n",
" <td>N8643A</td>\n",
" <td>BWI</td>\n",
" <td>LAS</td>\n",
" <td>2015-12-31 14:25:00</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>41.35</td>\n",
" <td>KLAS 312356Z 36011KT 10SM CLR 08/M13 A3017 RMK...</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5329483</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>WN</td>\n",
" <td>5197</td>\n",
" <td>N643SW</td>\n",
" <td>PHX</td>\n",
" <td>LAS</td>\n",
" <td>2015-12-31 16:50:00</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>41.35</td>\n",
" <td>KLAS 312356Z 36011KT 10SM CLR 08/M13 A3017 RMK...</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5329489</th>\n",
" <td>2015</td>\n",
" <td>12</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>F9</td>\n",
" <td>661</td>\n",
" <td>N906FR</td>\n",
" <td>DEN</td>\n",
" <td>SFO</td>\n",
" <td>2015-12-31 15:15:00</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>51.08</td>\n",
" <td>KSFO 312356Z 02010KT 10SM FEW050 11/02 A3016 R...</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5078230 rows × 123 columns</p>\n",
"</div>\n",
" <div class=\"colab-df-buttons\">\n",
"\n",
" <div class=\"colab-df-container\">\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-8283f162-c888-42f3-abf3-4d517748c4eb')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
" <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" .colab-df-buttons div {\n",
" margin-bottom: 4px;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-8283f162-c888-42f3-abf3-4d517748c4eb button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-8283f162-c888-42f3-abf3-4d517748c4eb');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
"\n",
"\n",
"<div id=\"df-23d7b28c-92a5-4de3-a883-5edd699598f6\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-23d7b28c-92a5-4de3-a883-5edd699598f6')\"\n",
" title=\"Suggest charts\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" --bg-color: #E8F0FE;\n",
" --fill-color: #1967D2;\n",
" --hover-bg-color: #E2EBFA;\n",
" --hover-fill-color: #174EA6;\n",
" --disabled-fill-color: #AAA;\n",
" --disabled-bg-color: #DDD;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" --bg-color: #3B4455;\n",
" --fill-color: #D2E3FC;\n",
" --hover-bg-color: #434B5C;\n",
" --hover-fill-color: #FFFFFF;\n",
" --disabled-bg-color: #3B4455;\n",
" --disabled-fill-color: #666;\n",
" }\n",
"\n",
" .colab-df-quickchart {\n",
" background-color: var(--bg-color);\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: var(--fill-color);\n",
" height: 32px;\n",
" padding: 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: var(--hover-bg-color);\n",
" box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: var(--button-hover-fill-color);\n",
" }\n",
"\n",
" .colab-df-quickchart-complete:disabled,\n",
" .colab-df-quickchart-complete:disabled:hover {\n",
" background-color: var(--disabled-bg-color);\n",
" fill: var(--disabled-fill-color);\n",
" box-shadow: none;\n",
" }\n",
"\n",
" .colab-df-spinner {\n",
" border: 2px solid var(--fill-color);\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" animation:\n",
" spin 1s steps(1) infinite;\n",
" }\n",
"\n",
" @keyframes spin {\n",
" 0% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" border-left-color: var(--fill-color);\n",
" }\n",
" 20% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 30% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 40% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 60% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 80% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" 90% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const quickchartButtonEl =\n",
" document.querySelector('#' + key + ' button');\n",
" quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n",
" quickchartButtonEl.classList.add('colab-df-spinner');\n",
" try {\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" } catch (error) {\n",
" console.error('Error during call to suggestCharts:', error);\n",
" }\n",
" quickchartButtonEl.classList.remove('colab-df-spinner');\n",
" quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
" }\n",
" (() => {\n",
" let quickchartButtonEl =\n",
" document.querySelector('#df-23d7b28c-92a5-4de3-a883-5edd699598f6 button');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
" })();\n",
" </script>\n",
"</div>\n",
"\n",
" <div id=\"id_b1107e34-658b-4dc3-abae-9e1e884174c1\">\n",
" <style>\n",
" .colab-df-generate {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-generate:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-generate {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-generate:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
" <button class=\"colab-df-generate\" onclick=\"generateWithVariable('flights')\"\n",
" title=\"Generate code using this dataframe.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z\"/>\n",
" </svg>\n",
" </button>\n",
" <script>\n",
" (() => {\n",
" const buttonEl =\n",
" document.querySelector('#id_b1107e34-658b-4dc3-abae-9e1e884174c1 button.colab-df-generate');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" buttonEl.onclick = () => {\n",
" google.colab.notebook.generateWithVariable('flights');\n",
" }\n",
" })();\n",
" </script>\n",
" </div>\n",
"\n",
" </div>\n",
" </div>\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "flights"
}
},
"metadata": {},
"execution_count": 32
}
]
},
{
"cell_type": "markdown",
"source": [
"We save this dataset as a compressed CSV file."
],
"metadata": {
"id": "uag2RZcY6oM9"
}
},
{
"cell_type": "code",
"source": [
"flights.to_csv('/content/drive/MyDrive/CU Boulder Data Science/DTSA 5506 - Data Mining Project/data/flights_and_weather.csv.gz',\n",
" compression='gzip',\n",
" index=False)"
],
"metadata": {
"id": "Hi4yI8QSTElu"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We'll pick up with this CSV file at the beginning of the second notebook, where we'll cotninue exploring, cleaning, and modeling the data!"
],
"metadata": {
"id": "FRBHbldg6v-F"
}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment