Created
September 22, 2020 06:30
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Week 1\n",
    "\n",
    "This is week one of the [Applied Data Science Capstone](https://www.coursera.org/learn/applied-data-science-capstone/home/welcome)\n",
    "\n",
    "From the Introduction:\n",
    "\n",
    "> Let's use our shared data for\n",
    "Seattle city as an example\n",
    "of how to deal with the accidents data.\n",
    "The same can be applied to\n",
    "any data set that you might use for this capstone.\n",
    "To choose the right data set for this capstone project,\n",
    "please go through the following reading section\n",
    "called guidance in finding a data set.\n",
    "Let's open the CSV file\n",
    "and check what type of data we have.\n",
    "The first column colored in yellow is the labeled data.\n",
    "The remaining columns have different types of attributes.\n",
    "Some or all can be used to train the model.\n",
    "You can also find that most of\n",
    "the observations are good to\n",
    "train and test the machine learning model.\n",
    "The label for the data set is severity,\n",
    "which describes the fatality of an accident.\n",
    "You will notice that the shared data\n",
    "has unbalanced labels.\n",
    "You should balance the data,\n",
    "otherwise, you will create a biased ML model.\n",
    "The following is a list of\n",
    "attributes or features that you can use.\n",
    "For a good description of each attribute,\n",
    "you can refer to the web link on the CSV file.\n",
    "You might need to do some feature engineering\n",
    "to improve the predictability of your model.\n",
    "You can get the data set from any open source,\n",
    "such as Open Government Data portal,\n",
    "or any research groups that allow you to use their data.\n",
    "Here are some good resources that can help you find\n",
    "your data set and start\n",
    "building your machine learning model.\n",
    "I recommend you go through them carefully. Good luck."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello Capstone Project Course!\n"
     ]
    }
   ],
   "source": [
    "print(\"Hello Capstone Project Course!\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "IBMDataSCience",
   "language": "python",
   "name": "ibmdatascience"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
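
The introduction quoted in the notebook warns that the shared data has unbalanced labels and that training on it as-is would bias the model. A minimal sketch of one way to check and balance the labels with pandas follows, using random downsampling of the larger classes. The toy DataFrame, the column names `severity` and `speeding`, and the helper `downsample_to_balance` are assumptions made for illustration; they are not taken from the course CSV, and downsampling is only one of several possible balancing strategies.

```python
import pandas as pd


def downsample_to_balance(df, label_col, random_state=0):
    """Randomly downsample every class to the size of the rarest class.

    Hypothetical helper for illustration; not part of the course material.
    """
    n_min = df[label_col].value_counts().min()
    parts = [
        group.sample(n=n_min, random_state=random_state)
        for _, group in df.groupby(label_col)
    ]
    # Shuffle so the classes are not stored in contiguous blocks.
    balanced = pd.concat(parts).sample(frac=1, random_state=random_state)
    return balanced.reset_index(drop=True)


# Toy stand-in for the accidents data (assumed column names and values).
toy = pd.DataFrame({
    "severity": [1] * 8 + [2] * 3,
    "speeding": [0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0],
})

# Inspect the label distribution before and after balancing.
print(toy["severity"].value_counts())
balanced = downsample_to_balance(toy, "severity")
print(balanced["severity"].value_counts())
```

Downsampling discards rows, so on a real data set it may be worth comparing it against oversampling the minority class or using class weights in the model.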