Skip to content

Instantly share code, notes, and snippets.

@TC-Alex
Created June 23, 2020 18:12
Show Gist options
  • Select an option

  • Save TC-Alex/e64aa27857eba24266f1832a399fd12a to your computer and use it in GitHub Desktop.

Select an option

Save TC-Alex/e64aa27857eba24266f1832a399fd12a to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{"cells":[{"metadata":{"trusted":true},"cell_type":"code","source":"# -*- coding: utf-8 -*-\n\"\"\"\nCreated on Mon May 18 17:33:58 2020\n\n@author: ryan\n\"\"\"\n\n\n\n### General imports\nimport os\nimport mysql.connector\nimport datetime\n\n### Table generation imports \nfrom entertainment_relations.table_generation.locale_counts import *\nfrom entertainment_relations.table_generation.locale_flags import *\nfrom entertainment_relations.table_generation.table_gen import *\n\n\n### Change this into a connection file\ncreds = os.environ['MYSQL_REPORTING_CREDS'].split('@#$')\n","execution_count":1,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"I changed the creds to standard read-only for the sake of checking out the working results of the notebook."},{"metadata":{"trusted":true},"cell_type":"code","source":"cnx = mysql.connector.connect(host=creds[0],user=creds[1],password=creds[2], database=creds[3])\ncursor = cnx.cursor()\n \n### max_id\nmax_id = 24596035\nbatch_size = 5000\n\ndw_locale_counts = run_login_counts(cursor,max_id)\n\n######################### dw_locale_flags generation \n \nno_tie, tie = resolve_data(dw_locale_counts)\nlatest_tie = fetch_latest_tie(tie, cursor)\n\n### Put generate_fixed_ties inside the function fetch_latest_tie\nfixed_ties = generate_fixed_ties(latest_tie)\ndw_locale_flags = gen_dw_locale_flags(no_tie, fixed_ties)\n\n###################### Table generation part\n\n#remove_tables(cursor)\n#generate_tables(cursor)\n\n## I should change something \n#dw_locale_counts = dw_locale_counts.iloc[:, :-1]\n\n#insert_dw_locale_counts(batch_size,dw_locale_counts,cursor,cnx)\n#insert_dw_locale_flags(batch_size, dw_locale_flags,cursor,cnx)\n#insert_dw_table_updates(max_id, cursor, cnx)\n\ncursor.close()\ncnx.close()\n","execution_count":2,"outputs":[{"output_type":"stream","text":"Altanta\nGeorgia\nAustin\nTexas\nLos Angeles\nCalifornia\nNashville\nTennessee\nNew Orleans\nLouisiana\nNew York City\nConnecticut\nNew Jersey\nNew York\nPennsylvania\n\n Main Query \n\n","name":"stdout"},{"output_type":"stream","text":"D:\\Users\\alex\\Tunecore\\Data Team Share - JIRA Tickets\\Data Warehousing\\DA-506 6_16 v2\\entertainment_relations\\table_generation\\locale_flags.py:46: SettingWithCopyWarning: \nA value is trying to be set on a copy of a slice from a DataFrame.\nTry using .loc[row_indexer,col_indexer] = value instead\n\nSee the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n multi['Max'] = multi.iloc[:,1:].max(axis=1)\nD:\\Users\\alex\\Tunecore\\Data Team Share - JIRA Tickets\\Data Warehousing\\DA-506 6_16 v2\\entertainment_relations\\table_generation\\locale_flags.py:47: SettingWithCopyWarning: \nA value is trying to be set on a copy of a slice from a DataFrame.\nTry using .loc[row_indexer,col_indexer] = value instead\n\nSee the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n multi['Tie'] = 0\nD:\\Users\\alex\\AppData\\Local\\Continuum\\anaconda3\\envs\\py365\\lib\\site-packages\\pandas\\core\\frame.py:4117: SettingWithCopyWarning: \nA value is trying to be set on a copy of a slice from a DataFrame\n\nSee the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n errors=errors,\nD:\\Users\\alex\\AppData\\Local\\Continuum\\anaconda3\\envs\\py365\\lib\\site-packages\\pandas\\core\\indexing.py:494: SettingWithCopyWarning: \nA value is trying to be set on a copy of a slice from a DataFrame.\nTry using .loc[row_indexer,col_indexer] = value instead\n\nSee the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n self.obj[item] = s\n","name":"stderr"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"dw_locale_counts","execution_count":3,"outputs":[{"output_type":"execute_result","execution_count":3,"data":{"text/plain":" person_id atlanta austin los_angeles nashville new_orleans \\\n0 34 0.0 0.0 0.0 1.0 0.0 \n1 39 0.0 0.0 1.0 187.0 0.0 \n2 41 0.0 0.0 33.0 0.0 0.0 \n3 57 0.0 0.0 0.0 0.0 0.0 \n4 68 0.0 0.0 0.0 0.0 0.0 \n... ... ... ... ... ... ... \n186647 2998159 0.0 0.0 0.0 2.0 0.0 \n186648 2998160 0.0 0.0 1.0 0.0 0.0 \n186649 2998163 0.0 0.0 0.0 0.0 0.0 \n186650 2998190 0.0 0.0 0.0 0.0 0.0 \n186651 2998193 0.0 0.0 0.0 0.0 0.0 \n\n new_york Count \n0 0.0 1 \n1 12.0 3 \n2 2.0 2 \n3 60.0 1 \n4 2.0 1 \n... ... ... \n186647 0.0 1 \n186648 0.0 1 \n186649 3.0 1 \n186650 1.0 1 \n186651 1.0 1 \n\n[186652 rows x 8 columns]","text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>person_id</th>\n <th>atlanta</th>\n <th>austin</th>\n <th>los_angeles</th>\n <th>nashville</th>\n <th>new_orleans</th>\n <th>new_york</th>\n <th>Count</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>34</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>1.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>1</th>\n <td>39</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>1.0</td>\n <td>187.0</td>\n <td>0.0</td>\n <td>12.0</td>\n <td>3</td>\n </tr>\n <tr>\n <th>2</th>\n <td>41</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>33.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>2.0</td>\n <td>2</td>\n </tr>\n <tr>\n <th>3</th>\n <td>57</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>60.0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>4</th>\n <td>68</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>2.0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>...</th>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n </tr>\n <tr>\n <th>186647</th>\n <td>2998159</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>2.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>186648</th>\n <td>2998160</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>1.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>186649</th>\n <td>2998163</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>3.0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>186650</th>\n <td>2998190</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>1.0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>186651</th>\n <td>2998193</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>1.0</td>\n <td>1</td>\n </tr>\n </tbody>\n</table>\n<p>186652 rows × 8 columns</p>\n</div>"},"metadata":{}}]},{"metadata":{},"cell_type":"markdown","source":"Looks like the Count column is responsible for denoting the total number of regions from which that user has ever logged in."},{"metadata":{"trusted":true},"cell_type":"code","source":"print(dw_locale_flags.head(5))\nprint('\\nShape of dw_locale_flags:\\n{0} (rows, columns)'.format(dw_locale_flags.shape))\ndw_locale_flags.groupby('Region').count()","execution_count":14,"outputs":[{"output_type":"stream","text":" Person ID Region\n0 34 nashville\n3 57 new_york\n4 68 new_york\n5 109 los_angeles\n6 129 new_york\n\nShape of dw_locale_flags:\n(144577, 2) (rows, columns)\n","name":"stdout"},{"output_type":"execute_result","execution_count":14,"data":{"text/plain":" Person ID\nRegion \nAtlanta 29\nAustin 3\nLos Angeles 72\nNashville 29\nNew Orleans 5\nNew York City 76\naustin 7009\nlos_angeles 53146\nnashville 12830\nnew_orleans 4059\nnew_york 67319","text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Person ID</th>\n </tr>\n <tr>\n <th>Region</th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>Atlanta</th>\n <td>29</td>\n </tr>\n <tr>\n <th>Austin</th>\n <td>3</td>\n </tr>\n <tr>\n <th>Los Angeles</th>\n <td>72</td>\n </tr>\n <tr>\n <th>Nashville</th>\n <td>29</td>\n </tr>\n <tr>\n <th>New Orleans</th>\n <td>5</td>\n </tr>\n <tr>\n <th>New York City</th>\n <td>76</td>\n </tr>\n <tr>\n <th>austin</th>\n <td>7009</td>\n </tr>\n <tr>\n <th>los_angeles</th>\n <td>53146</td>\n </tr>\n <tr>\n <th>nashville</th>\n <td>12830</td>\n </tr>\n <tr>\n <th>new_orleans</th>\n <td>4059</td>\n </tr>\n <tr>\n <th>new_york</th>\n <td>67319</td>\n </tr>\n </tbody>\n</table>\n</div>"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"markdown","source":"At this point, looks like the region field isn't fully cleaned. Also, why are there only 29 accounts in Atlanta?"},{"metadata":{"trusted":true},"cell_type":"code","source":"dw_locale_flags","execution_count":6,"outputs":[{"output_type":"execute_result","execution_count":6,"data":{"text/plain":" Person ID Region\n0 34 nashville\n3 57 new_york\n4 68 new_york\n5 109 los_angeles\n6 129 new_york\n.. ... ...\n209 2703753 New York City\n210 2731452 Los Angeles\n211 2812475 Nashville\n212 2820116 New York City\n213 2909885 Nashville\n\n[144577 rows x 2 columns]","text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Person ID</th>\n <th>Region</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>34</td>\n <td>nashville</td>\n </tr>\n <tr>\n <th>3</th>\n <td>57</td>\n <td>new_york</td>\n </tr>\n <tr>\n <th>4</th>\n <td>68</td>\n <td>new_york</td>\n </tr>\n <tr>\n <th>5</th>\n <td>109</td>\n <td>los_angeles</td>\n </tr>\n <tr>\n <th>6</th>\n <td>129</td>\n <td>new_york</td>\n </tr>\n <tr>\n <th>...</th>\n <td>...</td>\n <td>...</td>\n </tr>\n <tr>\n <th>209</th>\n <td>2703753</td>\n <td>New York City</td>\n </tr>\n <tr>\n <th>210</th>\n <td>2731452</td>\n <td>Los Angeles</td>\n </tr>\n <tr>\n <th>211</th>\n <td>2812475</td>\n <td>Nashville</td>\n </tr>\n <tr>\n <th>212</th>\n <td>2820116</td>\n <td>New York City</td>\n </tr>\n <tr>\n <th>213</th>\n <td>2909885</td>\n <td>Nashville</td>\n </tr>\n </tbody>\n</table>\n<p>144577 rows × 2 columns</p>\n</div>"},"metadata":{}}]}],"metadata":{"kernelspec":{"name":"python3","display_name":"Python 3","language":"python"},"language_info":{"name":"python","version":"3.6.5","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"},"toc":{"nav_menu":{},"number_sections":false,"sideBar":false,"skip_h1_title":false,"base_numbering":1,"title_cell":"Table of Contents","title_sidebar":"Contents","toc_cell":false,"toc_position":{},"toc_section_display":false,"toc_window_display":false}},"nbformat":4,"nbformat_minor":4}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment