@r41k0u
Created March 10, 2024 16:42
Data Mining

CSN-515 Data Mining Project

A recommendation system, built with data mining techniques, that helps a new resident of a given city choose an apartment complex to stay in.

We use this dataset for our data analysis

Requirements

  • Python 3.8
  • Required pip packages (install with python3 -m pip install numpy pandas scikit-learn tensorflow torch)

Data Cleaning and Transformation

KNN clustering and recommendation engine

KMeans clustering and recommendation engine

Linear Regressor Pricer

The pricing script takes the cleaned housing_dataset.csv (the output of the cleaning script) as its input.

  • Keep housing_dataset.csv in the same folder as the script, or upload it to the Google Colab Python notebook.
  • Run the script to get the R squared score between the actual and predicted prices on the test dataset.
  • To price an arbitrary vector, supply the following attributes: ['availability', 'size', 'total_sqft', 'bath', 'balcony']
  • E.g., create a test_housing_data.csv whose rows have the above attributes, then add the following code at the end of the script:
# Columns must match the attributes (and order) the model was trained on.
test_dataset = pd.read_csv('test_housing_data.csv')
print(model.predict(test_dataset))
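
For reference, the R squared score mentioned above compares actual and predicted prices as 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the actual prices from their mean. The following is a minimal pure-Python sketch of that formula. The helper name `r_squared` and the sample prices are illustrative assumptions and are not taken from the project's script.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    # Sum of squared errors between actual and predicted values.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Sum of squared deviations of the actual values from their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R squared is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up prices for illustration only (not from housing_dataset.csv).
actual_prices = [50.0, 80.0, 120.0, 150.0]
predicted_prices = [55.0, 75.0, 125.0, 145.0]
score = r_squared(actual_prices, predicted_prices)
print(f"R squared: {score:.3f}")
```

A score of 1 means the predictions match the actual prices exactly, and a score of 0 means the model does no better than always predicting the mean price.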