solution
{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
| "Given an m x n 2D binary grid which represents a map of ‘1’s (land) and ‘0’s (water),\n", | |
| "return the number of islands. An island is surrounded by water and is formed by connecting\n", | |
| "adjacent lands horizontally or vertically. You may assume all four edges of the grid are all \n", | |
| "surrounded by water." | |
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
| "from collections import deque\n", | |
| "def countIslands(array):\n", | |
| " def bfs(i, j): # use bfs to check islands.\n", | |
| " dirs = [[-1, 0], [1, 0], [0, -1], [0, 1]]\n", | |
| " q = deque([(i, j)])\n", | |
| " while q:\n", | |
| " x, y = q.popleft()\n", | |
| " for dx, dy in dirs:\n", | |
| " xx = x + dx\n", | |
| " yy = y + dy\n", | |
| " if 0 <= xx < len(array) and 0 <= yy < len(array[0]):\n", | |
| " if array[xx][yy] == \"1\":\n", | |
| " array[xx][yy] = \"0\"\n", | |
| " q.append((xx, yy))\n", | |
| " return\n", | |
| "\n", | |
| " n = len(array)\n", | |
| " m = len(array[0])\n", | |
| " cnt = 0\n", | |
| " for i in range(n):\n", | |
| " for j in range(m):\n", | |
| " if array[i][j] == \"1\": # when it is \"1\", it is a island\n", | |
| " array[i][j]=\"0\" # set it to \"0\" to mark as visited\n", | |
| " bfs(i, j)\n", | |
| " cnt += 1\n", | |
| " return cnt\n" | |
   ]
  },
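  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick sanity check on a small hand-made grid (the grid below is illustrative, not taken from the problem). Note that `countIslands` sinks land cells in place, so we pass it a copy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "grid = [\n",
    "    [\"1\", \"1\", \"0\", \"0\", \"0\"],\n",
    "    [\"1\", \"1\", \"0\", \"0\", \"0\"],\n",
    "    [\"0\", \"0\", \"1\", \"0\", \"0\"],\n",
    "    [\"0\", \"0\", \"0\", \"1\", \"1\"],\n",
    "]\n",
    "# Pass a row-wise copy since countIslands mutates its argument.\n",
    "print(countIslands([row[:] for row in grid]))  # expected: 3"
   ]
  },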
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
| "def editDistance(string1, string2): # dynamic programming\n", | |
| " n=len(string1)\n", | |
| " m=len(string2)\n", | |
| " dp=[[float('inf')]*(m+1) for _ in range(n+1)] # dp[i][j] means the min operations that can make string1[:i] to string2[:j]\n", | |
| " for i in range(m+1):\n", | |
| " dp[0][i]=i\n", | |
| " for i in range(n+1):\n", | |
| " dp[i][0]=i\n", | |
| " for i in range(1,n+1):\n", | |
| " for j in range(1,m+1):\n", | |
| " if string1[i-1]==string2[j-1]: # if chars at i-1 of string1 and j-1 of string2 are equal\n", | |
| " dp[i][j]=min(dp[i-1][j-1], dp[i-1][j]+1, dp[i][j-1]+1)\n", | |
| " else: # not equal\n", | |
| " dp[i][j]=min(dp[i-1][j-1]+1, dp[i-1][j]+1, dp[i][j-1]+1)\n", | |
| " return dp[-1][-1]" | |
   ]
  },
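  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Two small illustrative checks (the word pairs are standard textbook examples, not from the original notebook): \"horse\" -> \"ros\" and \"kitten\" -> \"sitting\" both have edit distance 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(editDistance(\"horse\", \"ros\"))       # expected: 3\n",
    "print(editDistance(\"kitten\", \"sitting\"))  # expected: 3"
   ]
  },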
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Explain the softmax function using an example."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
| "For example, a logit vector is [4,5,6,7,8]. Then, after applying softmax, it will become\n", | |
| "$$[\\frac{e^4}{e^4+e^5+e^6+e^7+e^8}, \\frac{e^5}{e^4+e^5+e^6+e^7+e^8}, \\frac{e^6}{e^4+e^5+e^6+e^7+e^8}, \\frac{e^7}{e^4+e^5+e^6+e^7+e^8}, \\frac{e^8}{e^4+e^5+e^6+e^7+e^8}]$$" | |
   ]
  },
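  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal NumPy sketch of the example above (assuming NumPy is available). Subtracting the maximum logit before exponentiating is the standard trick to avoid overflow; it does not change the result because softmax is invariant to shifting all logits by a constant."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def softmax(logits):\n",
    "    z = np.asarray(logits, dtype=float)\n",
    "    z = z - z.max()  # shift for numerical stability; the result is unchanged\n",
    "    e = np.exp(z)\n",
    "    return e / e.sum()\n",
    "\n",
    "probs = softmax([4, 5, 6, 7, 8])\n",
    "print(probs)        # roughly [0.0117, 0.0317, 0.0861, 0.2341, 0.6364]\n",
    "print(probs.sum())  # 1.0"
   ]
  },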
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What is the relationship between the XOR bitwise operator and Hamming distance? Explain one example of this distance in the context of machine learning."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
| "We can calculate hamming distance between two strings that have the same length by XOR operation and count the number of 1 that is the hamming distance. We can use hamming distance to measure the similarity between feature vectors, especially when the feature is binary. For example, feature1 = [0, 1, 1, 0], feature2 = [1, 0, 1, 0]. Then, the hamming distance is 2." | |
   ]
  },
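  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A small standard-library sketch of the XOR/popcount connection, reusing the feature vectors above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def hamming(a, b):\n",
    "    assert len(a) == len(b)\n",
    "    # Pack each bit list into an integer; XOR leaves a 1 exactly where the inputs differ.\n",
    "    x = int(\"\".join(map(str, a)), 2) ^ int(\"\".join(map(str, b)), 2)\n",
    "    return bin(x).count(\"1\")  # popcount of the XOR = Hamming distance\n",
    "\n",
    "print(hamming([0, 1, 1, 0], [1, 0, 1, 0]))  # expected: 2"
   ]
  },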
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How could you use the Naive Bayes classifier for sentiment analysis? Explain with an example."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
| "For example, we have the below data:\n", | |
| "{\n", | |
| " \"positive\": [[\"it\", \"is\", \"a\", \"wonderful\", \"movie\"], [\"awesome\", \"scene\", \"good\", \"acting\"], [\"liked\", \"this\", \"movie\", \"so\", \"much\"]],\n", | |
| " \"negative\": [[\"bad\", \"movie\", \"I\", \"have\", \"ever\", \"seen\", \"before\"], [\"will\", \"not\", \"watch\", \"it\", \"again\"], [\"awful\", \"and\", \"boring\", \"story\"]],\n", | |
| "}\n", | |
| "There are two sentiments, i.e., positive and negative. We can train a naive bayes classifier on this data." | |
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'positive': -10.39720770839918, 'negative': -11.56148703158466}\n"
     ]
    }
   ],
   "source": [
| "data = {\n", | |
| " \"positive\": [[\"it\", \"is\", \"a\", \"wonderful\", \"movie\"], [\"awesome\", \"scene\", \"good\", \"acting\"], [\"liked\", \"this\", \"movie\", \"so\", \"much\"]],\n", | |
| " \"negative\": [[\"bad\", \"movie\", \"I\", \"have\", \"ever\", \"seen\", \"before\"], [\"will\", \"not\", \"watch\", \"it\", \"again\"], [\"awful\", \"and\", \"boring\", \"story\"]],\n", | |
| "}\n", | |
| "from collections import Counter, defaultdict\n", | |
| "import math\n", | |
| "def train():\n", | |
| " priors={}\n", | |
| " likelihood={}\n", | |
| "\n", | |
| " words=[]\n", | |
| " total_data = 0\n", | |
| " for k in data.keys():\n", | |
| " priors[k]=len(data[k])\n", | |
| " total_data+=len(data[k])\n", | |
| " for sentence in data[k]:\n", | |
| " words+=sentence\n", | |
| " for k in priors.keys():\n", | |
| " priors[k] = priors[k]/total_data\n", | |
| "\n", | |
| " words=set(words)\n", | |
| " for k in data.keys():\n", | |
| " likelihood[k]=defaultdict(lambda : 0.5) # when words are not in training set, then set the probability to 0.5\n", | |
| " total_sentences=[]\n", | |
| " for sentence in data[k]:\n", | |
| " total_sentences+=sentence\n", | |
| " histogram = Counter(total_sentences)\n", | |
| " total_words = sum(histogram.values())\n", | |
| " for word in words:\n", | |
| " likelihood[k][word]=(histogram[word]+1)/(total_words+2) # use laplacian smoothing\n", | |
| " return priors, likelihood\n", | |
| "\n", | |
| "priors, likelihood = train()\n", | |
| "res={'positive':0, 'negative':0}\n", | |
| "test_data = [\"I\", \"like\", \"it\", \"so\", \"much\"]\n", | |
| "for k in res.keys():\n", | |
| " for word in test_data:\n", | |
| " res[k]+=math.log(likelihood[k][word])\n", | |
| " res[k]+=math.log(priors[k])\n", | |
| "print(res)" | |
   ]
  },
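  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The positive log-score (about -10.40) is higher than the negative one (about -11.56), so the classifier labels the test sentence as positive."
   ]
  },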
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What is the difference between causal inference and causal discovery?"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
| "Causal inference is to answer questions related to cause and effect. Causal discovery is to obtain causal structure from data, that is, deriving a causal model that describes the data." | |
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "zyhz",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.13"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}