solution
{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
| "Given an m x n 2D binary grid which represents a map of ‘1’s (land) and ‘0’s (water),\n", | |
| "return the number of islands. An island is surrounded by water and is formed by connecting\n", | |
| "adjacent lands horizontally or vertically. You may assume all four edges of the grid are all \n", | |
| "surrounded by water." | |
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
| "from collections import deque\n", | |
| "def countIslands(array):\n", | |
| " def bfs(i, j): # use bfs to check islands.\n", | |
| " dirs = [[-1, 0], [1, 0], [0, -1], [0, 1]]\n", | |
| " q = deque([(i, j)])\n", | |
| " while q:\n", | |
| " x, y = q.popleft()\n", | |
| " for dx, dy in dirs:\n", | |
| " xx = x + dx\n", | |
| " yy = y + dy\n", | |
| " if 0 <= xx < len(array) and 0 <= yy < len(array[0]):\n", | |
| " if array[xx][yy] == \"1\":\n", | |
| " array[xx][yy] = \"0\"\n", | |
| " q.append((xx, yy))\n", | |
| " return\n", | |
| "\n", | |
| " n = len(array)\n", | |
| " m = len(array[0])\n", | |
| " cnt = 0\n", | |
| " for i in range(n):\n", | |
| " for j in range(m):\n", | |
| " if array[i][j] == \"1\": # when it is \"1\", it is a island\n", | |
| " array[i][j]=\"0\" # set it to \"0\" to mark as visited\n", | |
| " bfs(i, j)\n", | |
| " cnt += 1\n", | |
| " return cnt\n" | |
   ]
  },
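  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick sanity check on a small hand-made grid (the grid below is illustrative, not taken from the problem). Note that `countIslands` sinks land cells in place, so we pass it a copy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "grid = [\n",
    "    [\"1\", \"1\", \"0\", \"0\", \"0\"],\n",
    "    [\"1\", \"1\", \"0\", \"0\", \"0\"],\n",
    "    [\"0\", \"0\", \"1\", \"0\", \"0\"],\n",
    "    [\"0\", \"0\", \"0\", \"1\", \"1\"],\n",
    "]\n",
    "# Pass a row-wise copy since countIslands mutates its argument.\n",
    "print(countIslands([row[:] for row in grid]))  # expected: 3"
   ]
  },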
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
| "def editDistance(string1, string2): # dynamic programming\n", | |
| " n=len(string1)\n", | |
| " m=len(string2)\n", | |
| " dp=[[float('inf')]*(m+1) for _ in range(n+1)] # dp[i][j] means the min operations that can make string1[:i] to string2[:j]\n", | |
| " for i in range(m+1):\n", | |
| " dp[0][i]=i\n", | |
| " for i in range(n+1):\n", | |
| " dp[i][0]=i\n", | |
| " for i in range(1,n+1):\n", | |
| " for j in range(1,m+1):\n", | |
| " if string1[i-1]==string2[j-1]: # if chars at i-1 of string1 and j-1 of string2 are equal\n", | |
| " dp[i][j]=min(dp[i-1][j-1], dp[i-1][j]+1, dp[i][j-1]+1)\n", | |
| " else: # not equal\n", | |
| " dp[i][j]=min(dp[i-1][j-1]+1, dp[i-1][j]+1, dp[i][j-1]+1)\n", | |
| " return dp[-1][-1]" | |
   ]
  },
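  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Two small illustrative checks (the word pairs are standard textbook examples, not from the original notebook): \"horse\" -> \"ros\" and \"kitten\" -> \"sitting\" both have edit distance 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(editDistance(\"horse\", \"ros\"))       # expected: 3\n",
    "print(editDistance(\"kitten\", \"sitting\"))  # expected: 3"
   ]
  },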
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Explain the softmax function using an example."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
| "For example, a logit vector is [4,5,6,7,8]. Then, after applying softmax, it will become\n", | |
| "$$[\\frac{e^4}{e^4+e^5+e^6+e^7+e^8}, \\frac{e^5}{e^4+e^5+e^6+e^7+e^8}, \\frac{e^6}{e^4+e^5+e^6+e^7+e^8}, \\frac{e^7}{e^4+e^5+e^6+e^7+e^8}, \\frac{e^8}{e^4+e^5+e^6+e^7+e^8}]$$" | |
   ]
  },
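  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal NumPy sketch of the example above (assuming NumPy is available). Subtracting the maximum logit before exponentiating is the standard trick to avoid overflow; it does not change the result because softmax is invariant to shifting all logits by a constant."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def softmax(logits):\n",
    "    z = np.asarray(logits, dtype=float)\n",
    "    z = z - z.max()  # shift for numerical stability; the result is unchanged\n",
    "    e = np.exp(z)\n",
    "    return e / e.sum()\n",
    "\n",
    "probs = softmax([4, 5, 6, 7, 8])\n",
    "print(probs)        # roughly [0.0117, 0.0317, 0.0861, 0.2341, 0.6364]\n",
    "print(probs.sum())  # 1.0"
   ]
  },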
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What is the relationship between the XOR bitwise operator and Hamming distance? Explain one example of this distance in the context of machine learning."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
| "We can calculate hamming distance between two strings that have the same length by XOR operation and count the number of 1 that is the hamming distance. We can use hamming distance to measure the similarity between feature vectors, especially when the feature is binary. For example, feature1 = [0, 1, 1, 0], feature2 = [1, 0, 1, 0]. Then, the hamming distance is 2." | |
   ]
  },
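  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A small standard-library sketch of the XOR/popcount connection, reusing the feature vectors above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def hamming(a, b):\n",
    "    assert len(a) == len(b)\n",
    "    # Pack each bit list into an integer; XOR leaves a 1 exactly where the inputs differ.\n",
    "    x = int(\"\".join(map(str, a)), 2) ^ int(\"\".join(map(str, b)), 2)\n",
    "    return bin(x).count(\"1\")  # popcount of the XOR = Hamming distance\n",
    "\n",
    "print(hamming([0, 1, 1, 0], [1, 0, 1, 0]))  # expected: 2"
   ]
  },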
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How could you use the Naive Bayes classifier for sentiment analysis? Explain with an example."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
| "For example, we have the below data:\n", | |
| "{\n", | |
| " \"positive\": [[\"it\", \"is\", \"a\", \"wonderful\", \"movie\"], [\"awesome\", \"scene\", \"good\", \"acting\"], [\"liked\", \"this\", \"movie\", \"so\", \"much\"]],\n", | |
| " \"negative\": [[\"bad\", \"movie\", \"I\", \"have\", \"ever\", \"seen\", \"before\"], [\"will\", \"not\", \"watch\", \"it\", \"again\"], [\"awful\", \"and\", \"boring\", \"story\"]],\n", | |
| "}\n", | |
| "There are two sentiments, i.e., positive and negative. We can train a naive bayes classifier on this data." | |
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'positive': -10.39720770839918, 'negative': -11.56148703158466}\n"
     ]
    }
   ],
   "source": [
| "data = {\n", | |
| " \"positive\": [[\"it\", \"is\", \"a\", \"wonderful\", \"movie\"], [\"awesome\", \"scene\", \"good\", \"acting\"], [\"liked\", \"this\", \"movie\", \"so\", \"much\"]],\n", | |
| " \"negative\": [[\"bad\", \"movie\", \"I\", \"have\", \"ever\", \"seen\", \"before\"], [\"will\", \"not\", \"watch\", \"it\", \"again\"], [\"awful\", \"and\", \"boring\", \"story\"]],\n", | |
| "}\n", | |
| "from collections import Counter, defaultdict\n", | |
| "import math\n", | |
| "def train():\n", | |
| " priors={}\n", | |
| " likelihood={}\n", | |
| "\n", | |
| " words=[]\n", | |
| " total_data = 0\n", | |
| " for k in data.keys():\n", | |
| " priors[k]=len(data[k])\n", | |
| " total_data+=len(data[k])\n", | |
| " for sentence in data[k]:\n", | |
| " words+=sentence\n", | |
| " for k in priors.keys():\n", | |
| " priors[k] = priors[k]/total_data\n", | |
| "\n", | |
| " words=set(words)\n", | |
| " for k in data.keys():\n", | |
| " likelihood[k]=defaultdict(lambda : 0.5) # when words are not in training set, then set the probability to 0.5\n", | |
| " total_sentences=[]\n", | |
| " for sentence in data[k]:\n", | |
| " total_sentences+=sentence\n", | |
| " histogram = Counter(total_sentences)\n", | |
| " total_words = sum(histogram.values())\n", | |
| " for word in words:\n", | |
| " likelihood[k][word]=(histogram[word]+1)/(total_words+2) # use laplacian smoothing\n", | |
| " return priors, likelihood\n", | |
| "\n", | |
| "priors, likelihood = train()\n", | |
| "res={'positive':0, 'negative':0}\n", | |
| "test_data = [\"I\", \"like\", \"it\", \"so\", \"much\"]\n", | |
| "for k in res.keys():\n", | |
| " for word in test_data:\n", | |
| " res[k]+=math.log(likelihood[k][word])\n", | |
| " res[k]+=math.log(priors[k])\n", | |
| "print(res)" | |
   ]
  },
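  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The positive log-score (about -10.40) is higher than the negative one (about -11.56), so the classifier labels the test sentence as positive."
   ]
  },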
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What is the difference between causal inference and causal discovery?"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
| "Causal inference is to answer questions related to cause and effect. Causal discovery is to obtain causal structure from data, that is, deriving a causal model that describes the data." | |
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "zyhz",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.13"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}