Skip to content

Instantly share code, notes, and snippets.

@anantmalhotra
Created September 5, 2014 08:47
Show Gist options
  • Select an option

  • Save anantmalhotra/70796323b2d1c472db8a to your computer and use it in GitHub Desktop.

Select an option

Save anantmalhotra/70796323b2d1c472db8a to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"metadata": {
"name": "",
"signature": "sha256:caf7101f7ced6042ee96e0d544760cfdadd3927755cc228a1657f7521ddeb51c"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Shapley Value"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Shapley value is a solution concept in cooperative game theory. To each cooperative game it assigns a unique distribution (among the players) of a total surplus generated by the coalition of all players.\n",
"\n",
"The setup is as follows: a coalition of players cooperates, and obtains a certain overall gain from that cooperation. Since some players may contribute more to the coalition than others or may possess different bargaining power (for example threatening to destroy the whole surplus), what final distribution of generated surplus among the players should arise in any particular game? Or phrased differently: how important is each player to the overall cooperation, and what payoff can he or she reasonably expect? The Shapley value provides one possible answer to this question."
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Cost Allocation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a sample computation for Shapley values used in cost attribution to customers on every link in the network. For a given link, the fair cost allocation from peak usage profiles are computed from the Shapley Value of usage (Bytes) during the entire period (typically 1 month) sampled at a fixed interval span (typically 5 mins).\n",
"\n",
"In this case, we use one hour bins owing to granularity constraints in data collection and processing. However, given that traffic volumes are large enough and so are the number of users, using different bin sizes would have minor effects on the 95th-percentile [1]. Also, the entire observation period used is 1 day instead of 1 month for demonstration purposes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are \"N\" users which contribute to the total cost of a link \"L\". The 95th percentile of the aggregate user demand is the peak-hour traffic on \"l\". Computing the Shapley value for this coalition game involves getting the 95th percentile usage of \"l\" in various coalition scenarios. This means getting the total usage of \"L\" sampled over 24 hours for all possible coalitions and then computing the marginal contribution of user i to the 95th percentile mechanism.\n",
"\n",
"For example, assuming there are 3 users {1,2,3} using \"L\", the possible coalitions are listed as {1},{2},{3},{1,2},{2,3},{1,3},{1,2,3}. Essentially, these coalitions constitute the power set of the list of users. Therefore for a coalition \"C\" in each hour \"h\" the usage is:\n",
"\n",
"$$U_C(h) = \\sum_{i\\in C}Z_i(h)$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From the usage we get the 95th percentile cost in the coalition \"C\" as:\n",
"\n",
"$$v(C) = P95_{h\\in (1,24)}U_C(h)$$"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"import numpy as np\n",
"import imp"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Generate some random synthetic data:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"values = {}\n",
"player_list = range(1000)\n",
"for player in player_list:\n",
" values[player] = np.random.lognormal(mean=5,sigma=1,size=24)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"According to the Shapley value, the amount that player i gets given a coalitional game (v, N) is\n",
"\n",
"$$\\phi_i(v)=\\sum_{S \\subseteq N \\setminus\n",
"\\{i\\}} \\frac{|S|!\\; (n-|S|-1)!}{n!}(v(S\\cup\\{i\\})-v(S))$$\n",
"\n",
"where n is the total number of players and the sum extends over all subsets S of N not containing player i.\n",
"\n",
"Finally, all possible coalitions are input into the Shapley Value computation, here is a sample computation for 8 players:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"sub_players = player_list[:9]\n",
"sub_values = {k: values[k] for k in sub_players}"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"shapley = imp.load_source('Shapley','/home/anant/Gamepy/Shappy/Shap.py')\n",
"g = shapley.Coop_Game(sub_players, sub_values)\n",
"%time exact = g.shapley()\n",
"exact"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"CPU times: user 6min 1s, sys: 28 ms, total: 6min 1s\n",
"Wall time: 6min 1s\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 4,
"text": [
"{0: 324.74018188132948,\n",
" 1: 1028.5355239432683,\n",
" 2: 351.93637001693162,\n",
" 3: 321.23596310538306,\n",
" 4: 688.90685255843573,\n",
" 5: 587.94814453861557,\n",
" 6: 226.09780892466841,\n",
" 7: 280.21505356763896,\n",
" 8: 330.49372585007762}"
]
}
],
"prompt_number": 4
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Approximation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While getting the exact Shapley values given the formula above is straightforward for small N, it becomes computationally infeasible for N > 50, as the number of computations increase factorially as N!. However the computational complexity can be significantly reduced by using Monte-Carlo sampling methods.\n",
"\n",
"Instead of calculating the Shapley value as the average marginal cost contribution across all orders of arrival, the Shapley Value can be estimated as the avergae marginal cost contribution over a set $\\Pi_{k}$ of \"k\" randomly sampled arrival orders. In other words, instead of considering N! permutations, randomly sample \"k\" permutations in the estimation procedure [2]. This is an implementation using k=1000 randomly sampled permutations:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"shap_approx = imp.load_source('Shapley','/home/anant/Gamepy/Shappy/Shap_approx.py')\n",
"g = shap_approx.Coop_Game(sub_players, sub_values)\n",
"%time approx = g.shapley()\n",
"approx"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"CPU times: user 1.16 s, sys: 0 ns, total: 1.16 s\n",
"Wall time: 1.16 s\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 5,
"text": [
"{0: 313.72415779796262,\n",
" 1: 1020.2348548079415,\n",
" 2: 351.08010156504002,\n",
" 3: 334.82253642761708,\n",
" 4: 691.29097635602136,\n",
" 5: 602.38924223100821,\n",
" 6: 221.74494701102154,\n",
" 7: 289.96179784872203,\n",
" 8: 314.86101034059345}"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As seen above, the time complexity has reduced by orders of magnitude. The parameter \"k\" determines the estimation error in approximation, higher \"k\" implies lower error. Hence, one can control the accuracy of estimation by increasing the number of sample permutation orders. Here are some error metrics computed to check the quality of estimation:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from sklearn.metrics import mean_squared_error\n",
"from scipy.stats import pearsonr\n",
"error = mean_squared_error(approx.values(), exact.values())\n",
"correlation = pearsonr(approx.values(), exact.values())\n",
"print \"mean squared error: %f ;\" %error, \"coefficient of determination: %f\" %pow(correlation[0],2)\n",
"plt.scatter(exact.values(),approx.values())\n",
"plt.xlabel(\"Exact\")\n",
"plt.ylabel(\"Estimate\")"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"mean squared error: 105.348849 ; coefficient of determination: 0.998261\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 6,
"text": [
"<matplotlib.text.Text at 0x7f8226830610>"
]
},
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAEKCAYAAAD5MJl4AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFgtJREFUeJzt3Xt0lPWdx/F3DNeAXNVwdUEpxVtdLCCUthmpsujx0tui\nHmvx1traHu2urYJ7jmTbPR7x1LZ2q622q2ILrNZalVYpIAziDW3FS6UUsEULLlhBGiCAJpn94/fE\nhDwJJGRmnknm/TpnzjzzzDPJNz8ln/x+v+f5PSBJkiRJkiRJkiRJkiRJkiQloiTpArKloqIis2LF\niqTLkKSOZgWQarrzsPzXkRsrVqwgk8kk+pg9e3biNRTKw7awLWyLjtEWQEVzv1M7TThIkrLHcJAk\nxRgOWZRKpZIuoWDYFg1siwa2RYNCb4tOMyENZKLxM0lSK5WUlEAzWWDPQZIUYzhIkmIMB0lSjOEg\nSYoxHCRJMYaDJCnGcJAkxRgOkqSYLkkXIEmK27x5My+++CLl5eWMHz++/mK1vDEcJKnAPPHEE5x3\n3gWUlo6ntnYtn/nM6dx33515DQiXz5CkAnPEEUezbds9wKeAanr1msCDD36XadOmZf17uXyGJHUA\ntbW1bN++mYb775SRyUzkjTfeyGsduQyHu4GtwKuN9g0AlgDrgMVAv0bvzQLWA2uBqY32fzT6GuuB\n23JYryQlrrS0lFGjPkJJyU+iPRuBRYwdOzavdeQyHO4BmvaBZhLCYTTwRPQa4Hjg/Oh5GnAHDd2c\nHwOXAx+KHtnvV0lSAVm48H8ZOvSH9Ow5iG7dTuKmm2YxYcKEvNaQ6zmHEcBC4KTo9VrCLem2AoOA\nNDCG0GuoA+ZExy0CKoE3gGXAcdH+Cwh9ra80872cc5DUadTW1rJlyxb69+9PWVlZzr5PS3MO+T5b\nqZwQDETP5dH2EOC5RsdtAoYC70fb9TZH+yWpUystLWXo0OR+3SV5KmsmemRNZWXlB9upVKrg77Qk\nSfmWTqdJp9MHPS6JYaUUsAUYDCwnDCvVzz3cHD0vAmYThpWW0zCsdCFhWMphJUnKgkI5lfVRYEa0\nPQN4uNH+C4BuwEjCxPPzhBCpAk4lFH9xo89IknIkl8NKCwh/5R8B/A24kdAzeIBw9tFGYHp07Jpo\n/xqgBriKhiGnq4B7gZ7AY4RehSQph7xCWpKKWKEMK0mSOgDDQZIUYzhIkmIMB0lSjOEgSYoxHCRJ\nMYaDJCnGcJAkxRgOkqQYw0GSFGM4SJJiDAdJUozhIEmKMRwkSTGGgyQpxnCQJMUYDpKkGMNBkhRj\nOEiSYgwHSVKM4SBJijEcJEkxhoMkKcZwkCTFGA6SpBjDQZIUYzhIkmIMB0lSjOEgSYoxHCRJMYaD\nJCnGcJAkxRgOkqQYw0GSFGM4SJJikgqHWcBrwKvAfKA7MABYAqwDFgP9mhy/HlgLTM1rpZJUhEoS\n+J4jgGXAccA+4H7gMeAE4B3gFuB6oD8wEzieECDjgaHAUmA0UNfk62YymUzuq5ekTqSkpASayYIk\neg5VwPtAGdAlen4LOBeYGx0zF/h0tH0esCD6zEZgAzAhf+VKUvFJIhy2A7cCbxJCYQdhOKkc2Bod\nszV6DTAE2NTo85sIPQhJUo4kEQ7HAt8gDC8NAXoDX2hyTCZ6tMTxI0nKoS4JfM9xwDPAtuj1Q8Ak\nYAswKHoeDLwdvb8ZGN7o88OifTGVlZUfbKdSKVKpVPaqlqROIJ1Ok06nD3pcEhPSJwPzCBPMe4F7\ngeeBfyIExhzCRHQ/9p+QnkDDhPQo4r0HJ6QlqY1ampBOoufwMnAf8HvCGUcvAncBhwMPAJcTJp6n\nR8evifavAWqAq3BYSZJyKomeQ67Yc5CkNiqkU1klSQXOcJAkxRgOkqQYw0GSFGM4SJJiDAdJUozh\nIEmKMRwkSTGGgyQpxnCQJMUYDpKkGMNBkhRjOEiSYgwHSVKM4SBJijEcJEkxhoMkKcZwkDqRTCbD\nnDm3MnjwaIYM+TDf/e4P8A6JOhRJ3ENaUo7ceefP+M537mX37vuBDLNnf4EBA/px2WWXJF2aOhh7\nDlInMm/eI+ze/W1gLHAK1dX/ybx5jyRdljqg1obDJ4BLo+0jgZG5KUdSe/Tt2xv42wevS0rejPZJ\nbVPSimMqgY8CHwZGA0OBB4DJuSvrkGQcW1Wxe/nll5k8+XT27LkEqKOs7Oc8++wyTjzxxKRLU4Eq\nKSmBZrKgNeHwMqGP+ofoGeAV4CPZKi5LDAcJWLduHb/4xXxKSkq4+OKLGDVqVNIlqYC1JxyeByYA\nqwnh0At4FsNBkjq8lsKhNXMOvwTuBPoBXwaeAH6WzeIkSYWlNT0HgKnRA+B3wJLclNMu9hwkqY3a\nM6w0B7i+FfuSZjhIUhu1Z1hpajP7zmpvQZKkwnWgK6S/ClwFHAu82mj/4cDTuSxKkpSsAw0r9QX6\nAzcThpDqj90JbMtxXYfCYSVJaqP2zDnUOwro0ej1m+2sKdsMB0lqo/bMOZwLrAf+CqwANgKPZ7E2\nSVKBaU04/BcwCVhHWFPpU8CqXBYlSUpWa8LhfeCd6NhSYDkwLpdFSZKS1Zr7ObxLOENpJTAPeBvY\nlcuiJEnJas2EdG9gD6HncBHQhxAShXbGkhPSktRG7ZmQ3gXUAj2BhYRgaO9v4X7Ag8CfgDXAqcAA\nwrIc64DF0TH1ZhEmxdfS/EV5UkGora1lz549SZchtVtrwuFKYAvhQrjfN3q0x23AY8BxhNVd1wIz\nCeEwmrC438zo2OOB86PnacAdraxbyqtbbvkePXsezuGH92Py5Kls37496ZKkQ9aaYaUNwETCpHQ2\n9CUs/31Mk/1rgQpgKzAISANjCL2GOsJ6TgCLCDcgeq7J5x1WUmIWLVrE5z73NaqrlwND6Nbtas44\nYxu/+c39SZcmHVB7hpX+QphzyJaRwN+Be4AXgZ8S7hFRTggGoufyaHsIsKnR5zcR7kYnFYwnn3yK\n6uovAkcDXXjvvZk8/fRTSZclHbLWnK00k3Bzn2eB96J9GeDqdnzPU4CvAy8AP6BhCKlehgPPazT7\nXmVl5QfbqVSKVCp1iCVKbTNs2BB69nyMPXvqCH9zPc9RRw1OuiwpJp1Ok06nD3pca4aVfg88SZhz\nqIs+kwHmHmJtgwhBMzJ6/XHC0NExwGmE+Y3BhOspxtAQHDdHz4uA2cQvxHNYSYnZu3cvkydPZd26\nGkLvYRmLFz/CpEmTki5NOqD2rK1Uf3vQbHoSuIJwZlIlUBbt30aYW5hJOFtpJmEiej7hVqVDgaXA\nKOK9B8NBiXrvvfd4/PHHqaqqoqKigqOPPjrpkqSDak843AS8ATwK7Gu0vz2nYpxMuNVoN+B14FLC\n1dcPEP7s2ghMB3ZEx98AXAbUANcQ7kbXlOEgSW3UnnDYSPNj/COb2Zckw0GS2igbS3YXOsNBktqo\npXA40NlKnyJcjPY5mu85PJSVyiRJBedA4fBJQjicg+EgSUWlNcNKxxAuhDvYvqQ5rCRJbdSeK6Qf\nbGbfL9tbkCSpcB1oWOk4wjUG/YDP0nDxWx/2v5e0JKmTOVA4jCbMN/SNnuvtBL6Uy6IkSclqzZzD\nJMJyF4XOOQdJaqP2zDl8ljCU1JVw9tI7wMXZLE6SVFhaEw5TgSrgbMLV0scC38phTZKkhLUmHOrn\nJc4mnLn0D9p/m1BJUgFrzf0cFhLu0rYX+CpwVLQtSeqkWru20kDCCqm1hLu2HU6470IhcUJaktro\nUCakr2u0PYUQDAC7OfS7wEmSOoADhcOFjbZvaPLemTmoRZJUIFozIS1JKjKGgyQp5kAT0rVAdbTd\nE9jT6L2etO5Mp3xyQlqS2uhQbvZTmrNqJEkFzWElSVKM4SBJijEcJEkxhoMkKcZwkCTFGA6SpBjD\nQZIUYzhIkmIMB0lSjOEgSYoxHCRJMYaDJCnGcJAkxRgOkqQYw0GSFGM4SJJikgyHUmA1sDB6PQBY\nAqwDFgP9Gh07C1gPrAWm5rFGSSpKSYbDNcAaoP7enjMJ4TAaeCJ6DXA8cH70PA24A3s8kpRTSf2S\nHQacBfyMhnuXngvMjbbnAp+Ots8DFgDvAxuBDcCEfBUqScUoqXD4PvAtoK7RvnJga7S9NXoNMATY\n1Oi4TcDQXBcoScUsiXA4G3ibMN9Q0sIxGRqGm1p6X5KUI10S+J4fIwwhnQX0APoAPyf0FgYBW4DB\nhAAB2AwMb/T5YdG+mMrKyg+2U6kUqVQqq4VLUkeXTqdJp9MHPa6lv9zzpQL4JnAOcAuwDZhDmIzu\nFz0fD8wnzDMMBZYCo4j3HjKZjB0KSWqLkpISaCYLkug5NFX/G/1m4AHgcsLE8/Ro/5po/xqgBrgK\nh5UkKaeS7jlkkz0HSWqjlnoOXi8gSYoxHCRJMYaDJCnGcJAkxRgO2s/dd9/LwIHDKSvrz/Tpl1Bd\nXZ10SZIS4NlK+sCyZcs455wZVFc/AgynR4+vMH36kcyd+5OkS5OUI4V8nYMKxKJFS6iu/jJwCgB7\n987hscdOT7YoSYkwHIrEu+++y1NPPUWPHj2oqKigW7dusWOOPHIA3bu/zL599XvW0q/fgLzWKakw\nOKxUBDZs2MDEiVN4//0xZDLvMnJkV555Zgm9evXa77gdO3Ywduxktm79MDU1w+nadT4PPzyfM844\nI6HKJeVaS8NKhkMRmDLlXFasqKCu7logQ48eFzJr1knceON/xI6tqqpiwYIF7Ny5k2nTpnHiiSfm\nv2BJeeOcQxHbuPFN6upS0asS9u79JOvWvdTssX369OHKK6/MW22SCpOnshaBSZPG07377YR1C/9B\nWdl9fOIT45MuS1IBc1ipCFRVVXHmmZ/nhRdWUVOzlyOOKAe60aNHD6699itcffXX6ruWkoqMcw5F\n7pVXXmHSpNOorv4YsBaYB2QoK5vB7bffwCWXfDHhCiUlwVVZi9ytt97Bnj3fjF7dTLh30qlUV3+b\n++57KMHKJBUiw6FI7Nmzj0ymP9ALeKvRO2/Rt2/vhKqSVKg8W6lIXHHFhfz2t5dSXX0dMJtwG+46\nevW6h8rKpQlXJ6nQOOfQwdTW1rJy5Up2797NxIkTGThwYKs/++tf/5rZs7/H7t07OeaYwYwbdwoz\nZlzMmDFjclixpELmhHQnsG/fPqZMOYdXXtnCYYcNorT0NVauXMwJJ5yQdGmSOignpDuBu+66i9Wr\nS9m1azVVVYvZseNGZsz4etJlSeqEDIcOZMOGN9izJwWUApDJTOHNN99ItCZJnZPh0IFMmjSOXr0W\nANuBOrp2vYPx48clXZakTshw6EDOP/98Lr98Kl27DqdHj6M47rhV3Hvv7UmXJakTckK6A6qqqqK6\nupry8nKXvZDULp6tJEmK8WwlSVKrGQ6SpBjDQZIUYzhIkmIMB0lSjOEgSYoxHCRJMYZDHu3atYuq\nqqqky5CkgzIc8qCmpoaLLrqC/v2PYuDAwZx11ufZu3dv0mVJUosMhzy49dbbePjhv1BT8zY1NdtZ\nvryOWbMqky5LklpkOORBOr2K6uovAb2B7uzdexUrVqxKuixJalES4TAcWA68BvwRuDraPwBYAqwD\nFgP9Gn1mFrAeWAtMzVulWXLsscPp1m0lENZ+Ki1dyYgRw5ItSpIOIImF9wZFj5cIf0r/Afg0cCnw\nDnALcD3QH5gJHA/MB8YDQ4GlwGigrsnXLdiF97Zt28b48Sneeac/0J2ystd54YUVDB8+POnSJBW5\nQl6V9WHgR9GjAthKCI80MIbQa6gD5kTHLwIqgeeafJ2CDQeA6upqli1bRm1tLaeddhp9+vRJuiRJ\najEcuuS/lP2MAMYCq4ByQjAQPZdH20PYPwg2EXoQHUpZWRlnn3120mVIUqskGQ69gV8B1wA7m7yX\noX6AvnnNvldZWfnBdiqVIpVKtatASeps0uk06XT6oMclNazUFfgN8Djwg2jfWiAFbAEGEyatxxDm\nHQBujp4XAbMJvY3GCnpYSZIKUSHd7KcE+B9gDQ3BAPAoMCPankGYi6jffwHQDRgJfAh4Pi+VSlKR\nSqLn8HHgSeAVGoaHZhF+4T8AHA1sBKYDO6L3bwAuA2oIw1C/a+br2nOQpDYq5LOVssVwkKQ2KqRh\nJUlSgTMcJEkxhoMkKcZwkCTFGA6SpBjDQZIUYzhIkmIMB0lSjOEgSYoxHCRJMYaDJCnGcJAkxRgO\nkqQYw0GSFGM4ZFFrbr1XLGyLBrZFA9uiQaG3heGQRYX+HzufbIsGtkUD26JBobeF4SBJijEcJEkx\nnek2oWmgIukiJKmDWQGkki5CkiRJkiRJKjLDgeXAa8Afgauj/QOAJcA6YDHQr9FnZgHrgbXA1LxV\nmj+lwGpgYfS6WNuiH/Ag8CdgDXAqxdsWswj/Rl4F5gPdKZ62uBvYSvjZ6x3Kz/7R6GusB27LYb3K\nkkHAP0fbvYE/A8cBtwDXRfuvB26Oto8HXgK6AiOADXS+s8P+HZgHPBq9Lta2mAtcFm13AfpSnG0x\nAvgLIRAA7gdmUDxt8QlgLPuHQ1t+9voThJ4HJkTbjwHTclaxcuJh4HRC6pdH+wZFryH8VXB9o+MX\nARPzVl3uDQOWAqfR0HMoxrboS/iF2FQxtsUAwh9N/QkhuRA4g+JqixHsHw5t/dkHE3qg9S4AfpKL\nQg+mI6d0kkYQ/kJYRfgPvzXav5WG/xGGAJsafWYTMDRP9eXD94FvAXWN9hVjW4wE/g7cA7wI/BTo\nRXG2xXbgVuBN4C1gB2FIpRjbol5bf/am+zeTUJsYDm3XG/gVcA2ws8l7mejRkgO915GcDbxNmG9o\n6VqZYmmLLsApwB3R825gZpNjiqUtjgW+QfjjaQjh38oXmhxTLG3RnIP97AXFcGibroRg+DlhWAnC\nXwODou3BhF+aEBJ/eKPPDov2dQYfA84F/gosAKYQ2qQY22JT9Hghev0gISS2UHxtMQ54BtgG1AAP\nAZMozrao15Z/E5ui/cOa7O9sbdLplAD3EYZTGruFhrHDmcQnnLoRhh5ep3NdkV6vgoY5h2JtiyeB\n0dF2JaEdirEtTiacydeT8DPNBb5GcbXFCOIT0m392VcRzngrwQnpDuHjhPH1lwjDKasJ/9EGECZm\nmztV7QbCWQhrgX/JZ7F5VEHD2UrF2hYnE3oOLxP+Wu5L8bbFdTScyjqX0NsulrZYQJhreQ/4G3Ap\nh/az15/KugH4Yc6rliRJkiRJkiRJkiRJkiRJkiR1ZLU0XOeymoYVN7PhZODMLH49SVKeNF1PK5su\nAf47h19fkpQjzYVDX8JVrvVLaCwALo+2f0y4evqPhKU16o0HniZccf8c0Iewmmn9gob/muW6JUk5\nVMP+w0r1v8RPJyw6dwFhPZx6/aPnUsKdBE8irKfzOmGZBAirl5YSbozjcgkqCF2SLkDqYPYQ7uXR\n1FJgOvAj4CON9p8PfInwb20wYcE1gP8D/hBt74qeS+j4C8+pk3DJbik7DiPcNnY3YbE1CKttXktY\n0vxk4LdAD1pe07/DrPWvzs9wkLLj3wirkV5EuCtcF8I8wm6ginAHsDMJAfBnQi9iXPTZwwnDSjuj\nbUlSB9N0zuEmwkT0GsLtQSHcKnN2tH0PIQyWEm4E9MVo/zjgWcKE9DNAGWF+4nmckJYkSZIkSZIk\nSZIkSZIkSZIkSZIkSWrZ/wNeYkADaG+4VwAAAABJRU5ErkJggg==\n",
"text": [
"<matplotlib.figure.Figure at 0x7f8227102990>"
]
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This estimator is unbiased and satisfies the efficency property as discussed below [2]. Lets run this estimator for the entire set (N=1000) of users and time it:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"g = shap_approx.Coop_Game(player_list, values)\n",
"%time approx = g.shapley()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"CPU times: user 39min 15s, sys: 512 ms, total: 39min 15s\n",
"Wall time: 39min 14s\n"
]
}
],
"prompt_number": 7
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Properties"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Shapley value has the following desirable properties:\n",
"\n",
"* Efficiency: The total gain is distributed:\n",
"\n",
"$$\\sum_{i\\in N}\\phi_i(v) = v(N)$$\n",
"\n",
"\n",
"* Symmetry: If i and j are two actors who are equivalent in the sense that for every subset S of N which contains neither i nor j or $v(S\\cup\\{i\\}) = v(S\\cup\\{j\\})$, Then:\n",
"\n",
"$$\u03c6i(v) = \u03c6j(v)$$\n",
"\n",
"* Linearity: if two coalition games described by gain functions v and w are combined, then the distributed gains should correspond to the gains derived from v and the gains derived from w, thus for every i in N:\n",
"\n",
"$$\\phi_i(v+w) = \\phi_i(v) + \\phi_i(w)$$\n",
"\n",
"* Zero Player (Null player): The Shapley value $\\phi_i(v)$ of a null player i in a game v is zero. A player i is null in v if for all coalitions S: \n",
"\n",
"$$v(S\\cup \\{i\\}) = v(S)$$\n",
"\n",
"In fact, given a player set N, the Shapley value is the only map from the set of all games to payoff vectors that satisfies all four properties 1, 2, 3, and 4 from above."
]
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Concluding Remarks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The problem of allocating the cost of constructing or maintaining a facility among its various users is of great practical importance. As networks are over-provisioned to accomodate peak loads and remain relatively under-utilized while off-peak, the CAPEX and OPEX costs increase significantly during a congested period. Given this scenario, the individual contribution to the cost is not a linear function of its byte utilization. Shapley value is a game theoretic concept used to tackle this problem of cost attribution which is efficient and fair.\n",
"\n",
"Network links are primary drivers of cost, given the cost of operating links within a network, one can estimate the cost of the network. This link is a shared resource and its cost, based on peak utilization, is allocated amongst its users using Shapley Values. Cost attribution to each user for a given link is based on its marginal contribution to the 95th percentile usage of the link with a given bin size permuted across all possible orders of arrival. This becomes computationally infeasible for no. of users, N > 100. An approximation using Monte-Carlo methods where a random sample of \"k\" orders of arrival is considered, instead of all possible permutations in the coalition.\n",
"\n",
"This method is employed with usage over 1-hour bins and k=1000. Considering N=1000 users for a given link \"l\", the calculation is shown to be feasible with sufficient approximation accuracy. Empirical validation of this accuracy is provided in [1]. It can also be linearly approximated and hourly per byte costs can be estimated with an ordinary least squares fitting procedure (described later). Thus the total cost of user \"i\" can be considered as the aggregation of its marginal cost contribution on every link in the set of links \"L\"; $c_{l}$ being cost of the link under consideration; $v(i)$ being shapley value of byte usage by user \"i\" on the link :\n",
"\n",
"$$ C(i) = \\sum_{l\\in L}\\frac{v(i)}{\\Sigma{v(i)}}*c_{l}$$"
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"References"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Dimitropoulos, Xenofontas, et al. \"On the 95-percentile billing method.\" Passive and Active Network Measurement. Springer Berlin Heidelberg, 2009. 207-216.\n",
"\n",
"2. Stanojevic, Rade, Nikolaos Laoutaris, and Pablo Rodriguez. \"On economic heavy hitters: shapley value analysis of 95th-percentile pricing.\" Proceedings of the 10th ACM SIGCOMM conference on Internet measurement. ACM, 2010.\n",
"\n",
"3. http://en.wikipedia.org/wiki/Shapley_value"
]
}
],
"metadata": {}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment