| { | |
| "metadata": { | |
| "language": "ruby", | |
| "name": "", | |
| "signature": "sha256:2fbb64dd42fafb2068217707704845a0bf7dd3341038ac4e28b0e4e7ab82fb48" | |
| }, | |
| "nbformat": 3, | |
| "nbformat_minor": 0, | |
| "worksheets": [ | |
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# Finding shape consensus among multiple geo polygons\n", | |
| "\n", | |
| "One of the tasks in the [Building Inspector](http://buildinginspector.nypl.org/) is [fixing building footprints](http://buildinginspector.nypl.org/fix). The user is shown a map with an overlaid shape (red dots) and is asked to draw the correct shape (or shapes, since the red overlay may cover multiple building footprints).\n", | |
| "\n", | |
| "Multiple people receive the same map and overlay. This notebook describes a process to find the resulting consensus (or mean) shape.\n", | |
| "\n", | |
| "Below is an example showing the map, the original polygon shown to each user (red dots) and the resulting polygons drawn (yellow)." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "IRuby.html '<iframe src=\"http://jsfiddle.net/mgiraldo/pdkCb/3/embedded/result/\" width=500 height=400></iframe>'" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "html": [ | |
| "<iframe src=\"http://jsfiddle.net/mgiraldo/pdkCb/2/embedded/result/\" width=500 height=400></iframe>" | |
| ], | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 1, | |
| "text": [ | |
| "\"<iframe src=\\\"http://jsfiddle.net/mgiraldo/pdkCb/2/embedded/result/\\\" width=500 height=400></iframe>\"" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 1 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "It is hard to see, but there are 11 yellow polygons: one rectangle in the lower-left part, one in the upper-right part (both wrong), and 9 for the complete L-shaped building.\n", | |
| "\n", | |
| "# Requirements\n", | |
| "\n", | |
| "The process to find the geometry that best summarizes what is drawn by users has to take into account:\n", | |
| "\n", | |
| "1. an overlay may span _multiple_ polygons (red dots covering more than one building)\n", | |
| "1. polygons may have any number of vertices greater than or equal to three\n", | |
| "1. users will not always draw the polygons the same way (e.g., use more or fewer points)\n", | |
| "\n", | |
| "The process described in this notebook makes use of the [DBSCAN clustering algorithm](https://en.wikipedia.org/wiki/DBSCAN) to find an unknown number of dense regions of points and determine the resulting geometries from there. The _input_ to this process is a GeoJSON FeatureCollection containing all the polygons drawn by contributors that are associated with a given red overlay. The expected _output_ is a list of geo point arrays with the summary shapes determined by the algorithm.\n", | |
| "\n", | |
| "**All the necessary code is included** and should be executable by any machine that has the required Ruby gems installed. _This code was tested on Ruby 2.1.0._\n", | |
| "\n", | |
| "# Process\n", | |
| "\n", | |
| "First, we need the [RGeo](https://github.com/rgeo/rgeo) package along with its [GeoJSON component](https://github.com/rgeo/rgeo-geojson):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": true, | |
| "input": [ | |
| "require 'rgeo'\n", | |
| "require 'rgeo-geojson'" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 2, | |
| "text": [ | |
| "true" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 2 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "We will use a [Ruby implementation](https://github.com/matiasinsaurralde/dbscan) of the [DBSCAN clustering algorithm](https://en.wikipedia.org/wiki/DBSCAN)." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "require 'dbscan'" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 3, | |
| "text": [ | |
| "true" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 3 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "For visualization convenience in this notebook we will also use the awesome [Nyaplot](https://github.com/domitry/nyaplot), a D3-powered visualization library. I had to manually build it according to [the instructions](https://github.com/domitry/nyaplot#installation) since it is not yet in RubyGems.org." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "require 'nyaplot'" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 4, | |
| "text": [ | |
| "true" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 4 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Initialize Nyaplot to work in this IRuby Notebook:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "Nyaplot.init_iruby" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "html": [ | |
| "<script>\n", | |
| "if(window['d3'] === undefined ||\n", | |
| " window['Nyaplot'] === undefined){\n", | |
| " var path = {\"d3\":\"http://d3js.org/d3.v3.min\"};\n", | |
| "\n", | |
| "\n", | |
| "\n", | |
| " var shim = {\"d3\":{\"exports\":\"d3\"}};\n", | |
| "\n", | |
| " require.config({paths: path, shim:shim});\n", | |
| "\n", | |
| "\n", | |
| "require(['d3'], function(d3){window['d3']=d3;console.log('finished loading d3');\n", | |
| "\n", | |
| "\tvar script = d3.select(\"head\")\n", | |
| "\t .append(\"script\")\n", | |
| "\t .attr(\"src\", \"https://rawgit.com/domitry/Nyaplotjs/master/release/nyaplot.js\")\n", | |
| "\t .attr(\"async\", true);\n", | |
| "\n", | |
| "\tscript[0][0].onload = script[0][0].onreadystatechange = function(){\n", | |
| "\t var event = document.createEvent(\"HTMLEvents\");\n", | |
| "\t event.initEvent(\"load_nyaplot\",false,false);\n", | |
| "\t window.dispatchEvent(event);\n", | |
| "\t console.log('Finished loading Nyaplotjs');\n", | |
| "\t};\n", | |
| "\n", | |
| "\n", | |
| "});\n", | |
| "}\n", | |
| "</script>\n" | |
| ], | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 5, | |
| "text": [ | |
| "\"<script>\\nif(window['d3'] === undefined ||\\n window['Nyaplot'] === undefined){\\n var path = {\\\"d3\\\":\\\"http://d3js.org/d3.v3.min\\\"};\\n\\n\\n\\n var shim = {\\\"d3\\\":{\\\"exports\\\":\\\"d3\\\"}};\\n\\n require.config({paths: path, shim:shim});\\n\\n\\nrequire(['d3'], function(d3){window['d3']=d3;console.log('finished loading d3');\\n\\n\\tvar script = d3.select(\\\"head\\\")\\n\\t .append(\\\"script\\\")\\n\\t .attr(\\\"src\\\", \\\"https://rawgit.com/domitry/Nyaplotjs/master/release/nyaplot.js\\\")\\n\\t .attr(\\\"async\\\", true);\\n\\n\\tscript[0][0].onload = script[0][0].onreadystatechange = function(){\\n\\t var event = document.createEvent(\\\"HTMLEvents\\\");\\n\\t event.initEvent(\\\"load_nyaplot\\\",false,false);\\n\\t window.dispatchEvent(event);\\n\\t console.log('Finished loading Nyaplotjs');\\n\\t};\\n\\n\\n});\\n}\\n</script>\\n\"" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 5 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "This is the GeoJSON that describes the shapes that have been drawn by the different contributors:\n", | |
| "\n", | |
| "_Note: this GeoJSON will not validate in [GeoJSONLint](http://geojsonlint.com/) because the first and last points of each polygon do not match._" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "geomstr = '{\"type\":\"FeatureCollection\",\"features\":[{\"type\":\"Feature\",\"properties\":{\"user_id\":638},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98620970547199,40.7356342514617],[-73.98627072572708,40.735547874977094],[-73.98632504045963,40.73557226364293],[-73.98622445762157,40.73570995781772],[-73.9861835539341,40.73569268254945],[-73.98621775209902,40.735640856717666]]]}},{\"type\":\"Feature\",\"properties\":{\"user_id\":666},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98620769381522,40.73563526765495],[-73.9862660318613,40.735547874977094],[-73.98632504045963,40.735570739351566],[-73.98622579872608,40.73570944972167],[-73.98618154227734,40.73569217445325],[-73.98621775209902,40.73563933242788]]]}},{\"type\":\"Feature\",\"properties\":{\"session_id\":\"79e7ee062a9e0333926e3e1fdc3e92db\"},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98632369935513,40.735570739351566],[-73.98622512817383,40.73570944972167],[-73.98618154227734,40.73569014206842],[-73.98621909320354,40.735640856717666],[-73.98620970547199,40.73563526765495],[-73.98627005517483,40.73554889117169]]]}},{\"type\":\"Feature\",\"properties\":{\"session_id\":\"3d3003b26bb6b2f3b9577924b9ed5e0e\"},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98621842265129,40.7356423810074],[-73.98620903491974,40.73563577575159],[-73.98627139627934,40.735547874977094],[-73.98632436990738,40.735571755545806],[-73.98622579872608,40.73570995781772],[-73.98618087172508,40.735689633972214]]]}},{\"type\":\"Feature\",\"properties\":{\"user_id\":596},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98626938462257,40.73554889117167],[-73.98632369935513,40.735572771740024],[-73.98622445762157,40.73570894162559],[-73.98618154227734,40.73569065016463],[-73.98621775209902,40.735640856717666],[-73.98620836436749,40.735634251461676]]]}},{\"type\":\"Feature\",\"properties\":{\"session_id\":\"0afaf74383ce51aceba02fc49ce5a9e3\"},\"geometry\":{\"type\":\"Polygon\",\"co
ordinates\":[[[-73.98621775209902,40.73563984052446],[-73.98620836436749,40.73563272717173],[-73.98626938462257,40.735550415463514],[-73.98632235825062,40.73557124744871],[-73.98622360456956,40.73570641325812],[-73.98618768252459,40.73568957578454]]]}},{\"type\":\"Feature\",\"properties\":{\"user_id\":538},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98632571101189,40.735571755545806],[-73.98622378706932,40.73570995781772],[-73.98618288338184,40.73569268254945],[-73.98621775209902,40.73564034862108],[-73.9862110465765,40.7356362838482],[-73.98627005517483,40.735550923560815]]]}},{\"type\":\"Feature\",\"properties\":{\"user_id\":580},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98632436990738,40.73557124744871],[-73.98626066744328,40.7356581319994],[-73.98625999689102,40.7356581319994],[-73.98620903491974,40.735634759558316],[-73.98626804351805,40.735547874977094]]]}},{\"type\":\"Feature\",\"properties\":{\"user_id\":580},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98626133799553,40.7356581319994],[-73.98622579872608,40.73570944972167],[-73.98618154227734,40.73569166635704],[-73.98621842265129,40.73563984052446]]]}},{\"type\":\"Feature\",\"properties\":{\"user_id\":548},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98620970547199,40.73563475955834],[-73.98627005517483,40.73554990736624],[-73.98632369935513,40.735571755545806],[-73.98622360456956,40.73570641325812],[-73.9861848950386,40.735689633972214],[-73.98621842265129,40.735640856717666]]]}},{\"type\":\"Feature\",\"properties\":{\"session_id\":\"53056025663f6d6564a39975971cb87c\"},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98621909320354,40.735638316234656],[-73.98620836436749,40.7356362838482],[-73.98620769381522,40.73563577575159],[-73.98627005517483,40.73554939926897],[-73.98632302880287,40.73557023125444],[-73.98622360456956,40.73570641325812],[-73.98617953062057,40.735689633972214]]]}}]}'" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 6, | |
| "text": [ | |
| "\"{\\\"type\\\":\\\"FeatureCollection\\\",\\\"features\\\":[{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"user_id\\\":638},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98620970547199,40.7356342514617],[-73.98627072572708,40.735547874977094],[-73.98632504045963,40.73557226364293],[-73.98622445762157,40.73570995781772],[-73.9861835539341,40.73569268254945],[-73.98621775209902,40.735640856717666]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"user_id\\\":666},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98620769381522,40.73563526765495],[-73.9862660318613,40.735547874977094],[-73.98632504045963,40.735570739351566],[-73.98622579872608,40.73570944972167],[-73.98618154227734,40.73569217445325],[-73.98621775209902,40.73563933242788]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"session_id\\\":\\\"79e7ee062a9e0333926e3e1fdc3e92db\\\"},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98632369935513,40.735570739351566],[-73.98622512817383,40.73570944972167],[-73.98618154227734,40.73569014206842],[-73.98621909320354,40.735640856717666],[-73.98620970547199,40.73563526765495],[-73.98627005517483,40.73554889117169]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"session_id\\\":\\\"3d3003b26bb6b2f3b9577924b9ed5e0e\\\"},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98621842265129,40.7356423810074],[-73.98620903491974,40.73563577575159],[-73.98627139627934,40.735547874977094],[-73.98632436990738,40.735571755545806],[-73.98622579872608,40.73570995781772],[-73.98618087172508,40.735689633972214]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"user_id\\\":596},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98626938462257,40.73554889117167],[-73.98632369935513,40.735572771740024],[-73.98622445762157,40.73570894162559],[-73.98618154227734,40.73569065016463],[-73.98621775209902,40.735640856717666],[-7
3.98620836436749,40.735634251461676]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"session_id\\\":\\\"0afaf74383ce51aceba02fc49ce5a9e3\\\"},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98621775209902,40.73563984052446],[-73.98620836436749,40.73563272717173],[-73.98626938462257,40.735550415463514],[-73.98632235825062,40.73557124744871],[-73.98622360456956,40.73570641325812],[-73.98618768252459,40.73568957578454]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"user_id\\\":538},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98632571101189,40.735571755545806],[-73.98622378706932,40.73570995781772],[-73.98618288338184,40.73569268254945],[-73.98621775209902,40.73564034862108],[-73.9862110465765,40.7356362838482],[-73.98627005517483,40.735550923560815]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"user_id\\\":580},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98632436990738,40.73557124744871],[-73.98626066744328,40.7356581319994],[-73.98625999689102,40.7356581319994],[-73.98620903491974,40.735634759558316],[-73.98626804351805,40.735547874977094]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"user_id\\\":580},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98626133799553,40.7356581319994],[-73.98622579872608,40.73570944972167],[-73.98618154227734,40.73569166635704],[-73.98621842265129,40.73563984052446]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"user_id\\\":548},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98620970547199,40.73563475955834],[-73.98627005517483,40.73554990736624],[-73.98632369935513,40.735571755545806],[-73.98622360456956,40.73570641325812],[-73.9861848950386,40.735689633972214],[-73.98621842265129,40.735640856717666]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"session_id\\\":\\\"53056025663f6d6564a39975971cb87c\\\"},\\\"geometry\\\":{\\\"type\\\":\\
\"Polygon\\\",\\\"coordinates\\\":[[[-73.98621909320354,40.735638316234656],[-73.98620836436749,40.7356362838482],[-73.98620769381522,40.73563577575159],[-73.98627005517483,40.73554939926897],[-73.98632302880287,40.73557023125444],[-73.98622360456956,40.73570641325812],[-73.98617953062057,40.735689633972214]]]}}]}\"" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 6 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "We decode the GeoJSON into an `RGeo::GeoJSON::FeatureCollection` (see the [RGeo::GeoJSON docs](http://rdoc.info/github/rgeo/rgeo-geojson/frames)):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "geocollection = RGeo::GeoJSON.decode(geomstr, :json_parser => :json)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 7, | |
| "text": [ | |
| "#<RGeo::GeoJSON::FeatureCollection:0x8218e4f8>" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 7 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "We wrap this in a function for convenience:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def parse(json)\n", | |
| " RGeo::GeoJSON.decode(json, :json_parser => :json)\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 8, | |
| "text": [ | |
| ":parse" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 8 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "This structure is now a group of [features](http://rdoc.info/github/rgeo/rgeo-geojson/RGeo/GeoJSON/Feature), each with an [RGeo::Geos::CAPIPolygonImpl](http://rdoc.info/github/rgeo/rgeo/RGeo/Geos/CAPIPolygonImpl) geometry describing each polygon, among other properties (see the [RGeo docs](http://rdoc.info/github/rgeo/rgeo/frames)):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "geocollection.first.geometry" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 9, | |
| "text": [ | |
| "#<RGeo::Geos::CAPIPolygonImpl:0x82193f48 \"POLYGON ((-73.98620970547199 40.7356342514617, -73.98627072572708 40.735547874977094, -73.98632504045963 40.73557226364293, -73.98622445762157 40.73570995781772, -73.9861835539341 40.73569268254945, -73.98621775209902 40.735640856717666, -73.98620970547199 40.7356342514617))\">" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 9 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Algorithm\n", | |
| "\n", | |
| "The main logic behind this process is as follows:\n", | |
| "\n", | |
| "1. cluster all the polygons by their centroids (similar-shaped polygons should have similar centroids<sup>[1]</sup>; clustering will let us identify outliers)\n", | |
| "1. only use clusters that have three or more centroids (three or more people drew similar-shaped polygons)\n", | |
| "1. for each cluster:\n", | |
| "    1. cluster the vertices of its polygons\n", | |
| "    1. find the mean vertex describing each cluster\n", | |
| "    1. connect those mean vertices in the most likely order\n", | |
| "    1. verify that the connected polygon makes sense (explained further below)\n", | |
| "\n", | |
| "[1] _different polygons might also have similar centroids but we're skipping this for now :)_\n", | |
| "\n", | |
| "Since DBSCAN works with arrays of numbers, we need to convert the complex RGeo structures. Below is a simple centroid-extraction function:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def get_centroid(poly_feature)\n", | |
| " return if (poly_feature.geometry.geometry_type.type_name != \"Polygon\")\n", | |
| " c = poly_feature.geometry.centroid\n", | |
| " return [c.x, c.y]\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 10, | |
| "text": [ | |
| ":get_centroid" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 10 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Let's test it with the first polygon in the collection:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "centroid = get_centroid(geocollection.first)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 11, | |
| "text": [ | |
| "[-73.98625268168838, 40.73562601945317]" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 11 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now we need a convenience function to get all the centroids of the collection. It returns a hash keyed by each polygon's index in the collection, so that we can later go back from a centroid to the polygon it was computed from:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def get_all_centroids(geom)\n", | |
| " centroids = {}\n", | |
| " geom.each_with_index do |poly,index|\n", | |
| " next if (poly.geometry.geometry_type.type_name != \"Polygon\")\n", | |
| " centroids[index] = get_centroid(poly)\n", | |
| " end\n", | |
| " return centroids\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 12, | |
| "text": [ | |
| ":get_all_centroids" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 12 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Test again:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "centroids = get_all_centroids(geocollection)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 13, | |
| "text": [ | |
| "{0=>[-73.98625268168838, 40.73562601945317], 1=>[-73.98625173238652, 40.735625569382876], 2=>[-73.9862518966646, 40.73562642272427], 3=>[-73.986252242017, 40.735626656082445], 4=>[-73.98625152460835, 40.735626229414], 5=>[-73.98625207318744, 40.73562418649854], 6=>[-73.98625258509149, 40.7356272053874], 7=>[-73.98626592099406, 40.735602617283476], 8=>[-73.9862216645921, 40.73567482334759], 9=>[-73.98625254867669, 40.735624721075084], 10=>[-73.98625077341322, 40.73562552211442]}" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 13 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "A simple plot of all the centroids using Nyaplot:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "plot = Nyaplot::Plot.new\n", | |
| "plot.width(400)\n", | |
| "plot.height(400)\n", | |
| "plot.zoom(true)\n", | |
| "points_x = centroids.map { |p| p[1][0] }\n", | |
| "points_y = centroids.map { |p| p[1][1] }\n", | |
| "df = Nyaplot::DataFrame.new({x:points_x,y:points_y})\n", | |
| "# add some padding\n", | |
| "xmin = points_x.min - 1e-5\n", | |
| "xmax = points_x.max + 1e-5\n", | |
| "ymin = points_y.min - 1e-5\n", | |
| "ymax = points_y.max + 1e-5\n", | |
| "plot.xrange([xmin,xmax])\n", | |
| "plot.yrange([ymin,ymax])\n", | |
| "# end padding\n", | |
| "sc = plot.add_with_df(df, :scatter, :x, :y)\n", | |
| "plot.show" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "html": [ | |
| "<div id='vis-e0d8c999-6b40-4a41-a8ef-e3e777e213e7'></div>\n", | |
| "<script>\n", | |
| "(function(){\n", | |
| " var render = function(){\n", | |
| " var model = {\"panes\":[{\"diagrams\":[{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\"},\"data\":\"91387de3-8aed-4aa0-9fba-1faedd43e2a8\"}],\"options\":{\"width\":400,\"height\":400,\"zoom\":true,\"xrange\":[-73.98627592099406,-73.98621166459209],\"yrange\":[40.73559261728347,40.73568482334759]}}],\"data\":{\"91387de3-8aed-4aa0-9fba-1faedd43e2a8\":[{\"x\":-73.98625268168838,\"y\":40.73562601945317},{\"x\":-73.98625173238652,\"y\":40.735625569382876},{\"x\":-73.9862518966646,\"y\":40.73562642272427},{\"x\":-73.986252242017,\"y\":40.735626656082445},{\"x\":-73.98625152460835,\"y\":40.735626229414},{\"x\":-73.98625207318744,\"y\":40.73562418649854},{\"x\":-73.98625258509149,\"y\":40.7356272053874},{\"x\":-73.98626592099406,\"y\":40.735602617283476},{\"x\":-73.9862216645921,\"y\":40.73567482334759},{\"x\":-73.98625254867669,\"y\":40.735624721075084},{\"x\":-73.98625077341322,\"y\":40.73562552211442}]},\"extension\":[]}\n", | |
| " Nyaplot.core.parse(model, '#vis-e0d8c999-6b40-4a41-a8ef-e3e777e213e7');\n", | |
| " };\n", | |
| " if(window['Nyaplot']==undefined){\n", | |
| " window.addEventListener('load_nyaplot', render, false);\n", | |
| "\treturn;\n", | |
| " }\n", | |
| " render();\n", | |
| "})();\n", | |
| "</script>\n" | |
| ], | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 14, | |
| "text": [ | |
| "\"<div id='vis-e0d8c999-6b40-4a41-a8ef-e3e777e213e7'></div>\\n<script>\\n(function(){\\n var render = function(){\\n var model = {\\\"panes\\\":[{\\\"diagrams\\\":[{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\"},\\\"data\\\":\\\"91387de3-8aed-4aa0-9fba-1faedd43e2a8\\\"}],\\\"options\\\":{\\\"width\\\":400,\\\"height\\\":400,\\\"zoom\\\":true,\\\"xrange\\\":[-73.98627592099406,-73.98621166459209],\\\"yrange\\\":[40.73559261728347,40.73568482334759]}}],\\\"data\\\":{\\\"91387de3-8aed-4aa0-9fba-1faedd43e2a8\\\":[{\\\"x\\\":-73.98625268168838,\\\"y\\\":40.73562601945317},{\\\"x\\\":-73.98625173238652,\\\"y\\\":40.735625569382876},{\\\"x\\\":-73.9862518966646,\\\"y\\\":40.73562642272427},{\\\"x\\\":-73.986252242017,\\\"y\\\":40.735626656082445},{\\\"x\\\":-73.98625152460835,\\\"y\\\":40.735626229414},{\\\"x\\\":-73.98625207318744,\\\"y\\\":40.73562418649854},{\\\"x\\\":-73.98625258509149,\\\"y\\\":40.7356272053874},{\\\"x\\\":-73.98626592099406,\\\"y\\\":40.735602617283476},{\\\"x\\\":-73.9862216645921,\\\"y\\\":40.73567482334759},{\\\"x\\\":-73.98625254867669,\\\"y\\\":40.735624721075084},{\\\"x\\\":-73.98625077341322,\\\"y\\\":40.73562552211442}]},\\\"extension\\\":[]}\\n Nyaplot.core.parse(model, '#vis-e0d8c999-6b40-4a41-a8ef-e3e777e213e7');\\n };\\n if(window['Nyaplot']==undefined){\\n window.addEventListener('load_nyaplot', render, false);\\n\\treturn;\\n }\\n render();\\n})();\\n</script>\\n\"" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 14 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 1. Clustering centroids\n", | |
| "\n", | |
| "We can see here how the centroids reflect the three different basic shapes drawn by contributors above: the lone centroids for the upper-right and lower-left rectangles and the group of nine centroids for the L-shaped polygons in the \"center\".\n", | |
| "\n", | |
| "The problem now is finding a good distance threshold for grouping centroids:\n", | |
| "\n", | |
| "- **big** enough to cover nearby centroids but also\n", | |
| "- **small** enough to _not_ group polygons that don't belong with each other\n", | |
| "\n", | |
| "Let's create a table to see just how close/far these centroids are from each other (standard Euclidean distance: $\\sqrt{(\\Delta x)^2+(\\Delta y)^2}$). Notice that, since geographic coordinates in decimal degrees carry a _lot_ of decimal places, the distances between nearby centroids are on the order of $10^{-6}$ degrees or smaller:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": true, | |
| "input": [ | |
| "# Pairwise Euclidean distances between centroids; `done` ensures each unordered pair is recorded only once\n", | |
| "dists = []\n", | |
| "done = {}\n", | |
| "centroids.each_with_index do |cc1,i|\n", | |
| "  centroids.each_with_index do |cc2,j|\n", | |
| "    c1 = cc1[1]\n", | |
| "    c2 = cc2[1]\n", | |
| "    dists.push({:dist=>Math.hypot(c1[0]-c2[0],c1[1]-c2[1]),:from=>i,:to=>j,:from_centroid=>c1,:to_centroid=>c2}) if (c1 != c2 && !done[[c2,c1]])\n", | |
| "    done[[c1,c2]] = true\n", | |
| "  end\n", | |
| "end\n", | |
| "# sort in place, closest pairs first\n", | |
| "dists.sort_by!{|k| k[:dist]}\n", | |
| "dist_df = Nyaplot::DataFrame.new(dists)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "html": [ | |
| "<table><tr><th>dist</th><th>from</th><th>to</th><th>from_centroid</th><th>to_centroid</th></tr><tr><td>4.1680249477628687e-07</td><td>2</td><td>3</td><td>[-73.9862518966646, 40.73562642272427]</td><td>[-73.986252242017, 40.735626656082445]</td></tr><tr><td>4.1927880127312373e-07</td><td>2</td><td>4</td><td>[-73.9862518966646, 40.73562642272427]</td><td>[-73.98625152460835, 40.735626229414]</td></tr><tr><td>6.476388201422145e-07</td><td>3</td><td>6</td><td>[-73.986252242017, 40.735626656082445]</td><td>[-73.98625258509149, 40.7356272053874]</td></tr><tr><td>6.919630457708901e-07</td><td>1</td><td>4</td><td>[-73.98625173238652, 40.735625569382876]</td><td>[-73.98625152460835, 40.735626229414]</td></tr><tr><td>7.154453870992346e-07</td><td>5</td><td>9</td><td>[-73.98625207318744, 40.73562418649854]</td><td>[-73.98625254867669, 40.735624721075084]</td></tr><tr><td>7.736974578659688e-07</td><td>0</td><td>3</td><td>[-73.98625268168838, 40.73562601945317]</td><td>[-73.986252242017, 40.735626656082445]</td></tr><tr><td>8.346982305084655e-07</td><td>3</td><td>4</td><td>[-73.986252242017, 40.735626656082445]</td><td>[-73.98625152460835, 40.735626229414]</td></tr><tr><td>8.690102573992017e-07</td><td>1</td><td>2</td><td>[-73.98625173238652, 40.735625569382876]</td><td>[-73.9862518966646, 40.73562642272427]</td></tr><tr><td>8.825474076951457e-07</td><td>0</td><td>2</td><td>[-73.98625268168838, 40.73562601945317]</td><td>[-73.9862518966646, 40.73562642272427]</td></tr><tr><td>9.601375433120375e-07</td><td>1</td><td>10</td><td>[-73.98625173238652, 40.735625569382876]</td><td>[-73.98625077341322, 40.73562552211442]</td></tr><tr><td>1.0317784775226883e-06</td><td>4</td><td>10</td><td>[-73.98625152460835, 40.735626229414]</td><td>[-73.98625077341322, 40.73562552211442]</td></tr><tr><td>1.0423498270990819e-06</td><td>2</td><td>6</td><td>[-73.9862518966646, 40.73562642272427]</td><td>[-73.98625258509149, 
40.7356272053874]</td></tr><tr><td>1.0505890250970463e-06</td><td>0</td><td>1</td><td>[-73.98625268168838, 40.73562601945317]</td><td>[-73.98625173238652, 40.735625569382876]</td></tr><tr><td>1.1759752372883875e-06</td><td>0</td><td>4</td><td>[-73.98625268168838, 40.73562601945317]</td><td>[-73.98625152460835, 40.735626229414]</td></tr><tr><td>1.1772662180684655e-06</td><td>1</td><td>9</td><td>[-73.98625173238652, 40.735625569382876]</td><td>[-73.98625254867669, 40.735624721075084]</td></tr><tr><td>1.1898617396243328e-06</td><td>0</td><td>6</td><td>[-73.98625268168838, 40.73562601945317]</td><td>[-73.98625258509149, 40.7356272053874]</td></tr><tr><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr><tr><td>8.468969718424401e-05</td><td>7</td><td>8</td><td>[-73.98626592099406, 40.735602617283476]</td><td>[-73.9862216645921, 40.73567482334759]</td></tr></table>" | |
| ], | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 81, | |
| "text": [ | |
| "#<Nyaplot::DataFrame:0x0000010278d888 @name=\"53ef42c2-a4b6-4d3b-b363-51cbd8018f32\", @rows=[{\"dist\"=>4.1680249477628687e-07, \"from\"=>2, \"to\"=>3, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.986252242017, 40.735626656082445]}, {\"dist\"=>4.1927880127312373e-07, \"from\"=>2, \"to\"=>4, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.98625152460835, 40.735626229414]}, {\"dist\"=>6.476388201422145e-07, \"from\"=>3, \"to\"=>6, \"from_centroid\"=>[-73.986252242017, 40.735626656082445], \"to_centroid\"=>[-73.98625258509149, 40.7356272053874]}, {\"dist\"=>6.919630457708901e-07, \"from\"=>1, \"to\"=>4, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.98625152460835, 40.735626229414]}, {\"dist\"=>7.154453870992346e-07, \"from\"=>5, \"to\"=>9, \"from_centroid\"=>[-73.98625207318744, 40.73562418649854], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>7.736974578659688e-07, \"from\"=>0, \"to\"=>3, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.986252242017, 40.735626656082445]}, {\"dist\"=>8.346982305084655e-07, \"from\"=>3, \"to\"=>4, \"from_centroid\"=>[-73.986252242017, 40.735626656082445], \"to_centroid\"=>[-73.98625152460835, 40.735626229414]}, {\"dist\"=>8.690102573992017e-07, \"from\"=>1, \"to\"=>2, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.9862518966646, 40.73562642272427]}, {\"dist\"=>8.825474076951457e-07, \"from\"=>0, \"to\"=>2, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.9862518966646, 40.73562642272427]}, {\"dist\"=>9.601375433120375e-07, \"from\"=>1, \"to\"=>10, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>1.0317784775226883e-06, \"from\"=>4, \"to\"=>10, \"from_centroid\"=>[-73.98625152460835, 40.735626229414], 
\"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>1.0423498270990819e-06, \"from\"=>2, \"to\"=>6, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.98625258509149, 40.7356272053874]}, {\"dist\"=>1.0505890250970463e-06, \"from\"=>0, \"to\"=>1, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.98625173238652, 40.735625569382876]}, {\"dist\"=>1.1759752372883875e-06, \"from\"=>0, \"to\"=>4, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.98625152460835, 40.735626229414]}, {\"dist\"=>1.1772662180684655e-06, \"from\"=>1, \"to\"=>9, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>1.1898617396243328e-06, \"from\"=>0, \"to\"=>6, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.98625258509149, 40.7356272053874]}, {\"dist\"=>1.2002662963773613e-06, \"from\"=>1, \"to\"=>3, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.986252242017, 40.735626656082445]}, {\"dist\"=>1.3051734635243496e-06, \"from\"=>0, \"to\"=>9, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>1.424259229759705e-06, \"from\"=>1, \"to\"=>5, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.98625207318744, 40.73562418649854]}, {\"dist\"=>1.4397193384948688e-06, \"from\"=>2, \"to\"=>10, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>1.441231617061079e-06, \"from\"=>4, \"to\"=>6, \"from_centroid\"=>[-73.98625152460835, 40.735626229414], \"to_centroid\"=>[-73.98625258509149, 40.7356272053874]}, {\"dist\"=>1.8222869506551436e-06, \"from\"=>2, \"to\"=>9, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.98625254867669, 
40.735624721075084]}, {\"dist\"=>1.8231297971933801e-06, \"from\"=>4, \"to\"=>9, \"from_centroid\"=>[-73.98625152460835, 40.735626229414], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>1.844889313547645e-06, \"from\"=>1, \"to\"=>6, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.98625258509149, 40.7356272053874]}, {\"dist\"=>1.8554461894036756e-06, \"from\"=>3, \"to\"=>10, \"from_centroid\"=>[-73.986252242017, 40.735626656082445], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>1.8636745452851083e-06, \"from\"=>5, \"to\"=>10, \"from_centroid\"=>[-73.98625207318744, 40.73562418649854], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>1.9313197748456895e-06, \"from\"=>0, \"to\"=>5, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.98625207318744, 40.73562418649854]}, {\"dist\"=>1.947620190077808e-06, \"from\"=>9, \"to\"=>10, \"from_centroid\"=>[-73.98625254867669, 40.735624721075084], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>1.9591563623025234e-06, \"from\"=>3, \"to\"=>9, \"from_centroid\"=>[-73.986252242017, 40.735626656082445], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>1.972019255979438e-06, \"from\"=>0, \"to\"=>10, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>2.115287830795299e-06, \"from\"=>4, \"to\"=>5, \"from_centroid\"=>[-73.98625152460835, 40.735626229414], \"to_centroid\"=>[-73.98625207318744, 40.73562418649854]}, {\"dist\"=>2.2431820796116284e-06, \"from\"=>2, \"to\"=>5, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.98625207318744, 40.73562418649854]}, {\"dist\"=>2.4729711093657763e-06, \"from\"=>6, \"to\"=>10, \"from_centroid\"=>[-73.98625258509149, 40.7356272053874], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, 
{\"dist\"=>2.475348072708112e-06, \"from\"=>3, \"to\"=>5, \"from_centroid\"=>[-73.986252242017, 40.735626656082445], \"to_centroid\"=>[-73.98625207318744, 40.73562418649854]}, {\"dist\"=>2.4845791864857814e-06, \"from\"=>6, \"to\"=>9, \"from_centroid\"=>[-73.98625258509149, 40.7356272053874], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>3.0619823175285526e-06, \"from\"=>5, \"to\"=>6, \"from_centroid\"=>[-73.98625207318744, 40.73562418649854], \"to_centroid\"=>[-73.98625258509149, 40.7356272053874]}, {\"dist\"=>2.563187052301914e-05, \"from\"=>5, \"to\"=>7, \"from_centroid\"=>[-73.98625207318744, 40.73562418649854], \"to_centroid\"=>[-73.98626592099406, 40.735602617283476]}, {\"dist\"=>2.5834017791549575e-05, \"from\"=>7, \"to\"=>9, \"from_centroid\"=>[-73.98626592099406, 40.735602617283476], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>2.688755773883865e-05, \"from\"=>0, \"to\"=>7, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.98626592099406, 40.735602617283476]}, {\"dist\"=>2.698361448529006e-05, \"from\"=>1, \"to\"=>7, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.98626592099406, 40.735602617283476]}, {\"dist\"=>2.7460525955903987e-05, \"from\"=>7, \"to\"=>10, \"from_centroid\"=>[-73.98626592099406, 40.735602617283476], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>2.7629347229741968e-05, \"from\"=>2, \"to\"=>7, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.98626592099406, 40.735602617283476]}, {\"dist\"=>2.7654812048911204e-05, \"from\"=>4, \"to\"=>7, \"from_centroid\"=>[-73.98625152460835, 40.735626229414], \"to_centroid\"=>[-73.98626592099406, 40.735602617283476]}, {\"dist\"=>2.765824052887831e-05, \"from\"=>3, \"to\"=>7, \"from_centroid\"=>[-73.986252242017, 40.735626656082445], \"to_centroid\"=>[-73.98626592099406, 40.735602617283476]}, {\"dist\"=>2.797179207560266e-05, 
\"from\"=>6, \"to\"=>7, \"from_centroid\"=>[-73.98625258509149, 40.7356272053874], \"to_centroid\"=>[-73.98626592099406, 40.735602617283476]}, {\"dist\"=>5.677629272142381e-05, \"from\"=>6, \"to\"=>8, \"from_centroid\"=>[-73.98625258509149, 40.7356272053874], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}, {\"dist\"=>5.7034997606567785e-05, \"from\"=>4, \"to\"=>8, \"from_centroid\"=>[-73.98625152460835, 40.735626229414], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}, {\"dist\"=>5.7053171211386354e-05, \"from\"=>3, \"to\"=>8, \"from_centroid\"=>[-73.986252242017, 40.735626656082445], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}, {\"dist\"=>5.706661497797782e-05, \"from\"=>2, \"to\"=>8, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}, {\"dist\"=>5.725325369972539e-05, \"from\"=>8, \"to\"=>10, \"from_centroid\"=>[-73.9862216645921, 40.73567482334759], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>5.7706371411325726e-05, \"from\"=>1, \"to\"=>8, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}, {\"dist\"=>5.782629481832627e-05, \"from\"=>0, \"to\"=>8, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}, {\"dist\"=>5.88563029015269e-05, \"from\"=>8, \"to\"=>9, \"from_centroid\"=>[-73.9862216645921, 40.73567482334759], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>5.906583744015411e-05, \"from\"=>5, \"to\"=>8, \"from_centroid\"=>[-73.98625207318744, 40.73562418649854], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}, {\"dist\"=>8.468969718424401e-05, \"from\"=>7, \"to\"=>8, \"from_centroid\"=>[-73.98626592099406, 40.735602617283476], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}]>" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 81 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "From the table (which is sorted by closest points first) we can see that the top 10 results are less than $10^{-6}$ units (0.000001) apart.\n", | |
| "\n", | |
| "## The DBSCAN algorithm\n", | |
| "\n", | |
| "To understand how clusters are formed, it is useful to understand how the [DBSCAN clustering algorithm](https://en.wikipedia.org/wiki/DBSCAN#Algorithm) works:\n", | |
| "\n", | |
| "> DBSCAN requires two parameters: \u03b5 (eps) and the minimum number of points (min_points) required to form a dense region. It starts with an arbitrary starting point that has not been visited. This point's \u03b5-neighborhood is retrieved, and if it contains sufficiently many points, a cluster is started. Otherwise, the point is labeled as noise. Note that this point might later be found in a sufficiently sized \u03b5-environment of a different point and hence be made part of a cluster.\n", | |
| "\n", | |
| "> If a point is found to be a dense part of a cluster, its \u03b5-neighborhood is also part of that cluster. Hence, all points that are found within the \u03b5-neighborhood are added, as is their own \u03b5-neighborhood when they are also dense. This process continues until the density-connected cluster is completely found. Then, a new unvisited point is retrieved and processed, leading to the discovery of a further cluster or noise.\n", | |
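| "\n", | |
| "As a rough illustration of the steps quoted above, here is a minimal DBSCAN sketch in plain Ruby. It is not the implementation used in this notebook: the names `simple_dbscan`, `region_query` and `euclidean` are made up for this example, and it assumes a point counts as part of its own \u03b5-neighborhood (a library may count differently):\n", | |
| "\n", | |
| "```ruby\n", | |
| "# Illustrative sketch only; helper names are hypothetical.\n", | |
| "def euclidean(a, b)\n", | |
| "  Math.sqrt(a.zip(b).map { |x, y| (x - y)**2 }.reduce(:+))\n", | |
| "end\n", | |
| "\n", | |
| "# indices of all points within eps of points[i] (including i itself)\n", | |
| "def region_query(points, i, eps)\n", | |
| "  (0...points.size).select { |j| euclidean(points[i], points[j]) <= eps }\n", | |
| "end\n", | |
| "\n", | |
| "# returns a hash: -1 => noise points, 0..n => clusters\n", | |
| "def simple_dbscan(points, eps, min_points)\n", | |
| "  labels = Array.new(points.size) # nil = unvisited, -1 = noise\n", | |
| "  cluster_id = -1\n", | |
| "  points.each_index do |i|\n", | |
| "    next unless labels[i].nil?\n", | |
| "    neighbors = region_query(points, i, eps)\n", | |
| "    if neighbors.size < min_points\n", | |
| "      labels[i] = -1 # noise for now; may be claimed by a cluster later\n", | |
| "      next\n", | |
| "    end\n", | |
| "    cluster_id += 1\n", | |
| "    labels[i] = cluster_id\n", | |
| "    queue = neighbors.dup\n", | |
| "    until queue.empty?\n", | |
| "      j = queue.shift\n", | |
| "      labels[j] = cluster_id if labels[j] == -1 # former noise becomes a border point\n", | |
| "      next unless labels[j].nil?\n", | |
| "      labels[j] = cluster_id\n", | |
| "      j_neighbors = region_query(points, j, eps)\n", | |
| "      # only dense points spread the cluster further\n", | |
| "      queue.concat(j_neighbors) if j_neighbors.size >= min_points\n", | |
| "    end\n", | |
| "  end\n", | |
| "  result = Hash.new { |h, k| h[k] = [] }\n", | |
| "  points.each_with_index { |p, i| result[labels[i]] << p }\n", | |
| "  result\n", | |
| "end\n", | |
| "\n", | |
| "# toy data: three nearby points and one far away\n", | |
| "points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0]]\n", | |
| "clusters = simple_dbscan(points, 1.5, 2)\n", | |
| "```\n", | |
| "\n", | |
| "With these toy points, the three nearby points should end up in cluster `0` and the isolated point under the `-1` (noise) key.\n", | |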
| "\n", | |
| "By experimenting with different sets of polygons I settled on a general \u03b5 of $1.8 \\times 10^{-6}$ and a `min_points` of 2 for **centroid clusters** (polygon vertex clusters use different input values, as we will see below).\n", | |
| "\n", | |
| "This is the resulting centroid-clustering function:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def cluster_centroids(centroids)\n", | |
| " dbscan = DBSCAN( centroids.map{|c| c[1]}, :epsilon => 1.8e-06, :min_points => 2, :distance => :euclidean_distance )\n", | |
| " return dbscan.results\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 73, | |
| "text": [ | |
| ":cluster_centroids" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 73 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Let's test it:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "centroid_clusters = cluster_centroids(centroids)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 74, | |
| "text": [ | |
| "{-1=>[[-73.98626592099406, 40.735602617283476], [-73.9862216645921, 40.73567482334759]], 0=>[[-73.98625268168838, 40.73562601945317], [-73.98625173238652, 40.735625569382876], [-73.9862518966646, 40.73562642272427], [-73.986252242017, 40.735626656082445], [-73.98625152460835, 40.735626229414], [-73.98625258509149, 40.7356272053874], [-73.98625254867669, 40.735624721075084], [-73.98625207318744, 40.73562418649854], [-73.98625077341322, 40.73562552211442]]}" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 74 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "The function returns a hash whose `-1` key (if present) holds all the points that did not belong to any cluster, while keys `0..n` hold the different clusters. In this example there is only one cluster, `centroid_clusters[0]`, plus the rejected `[-1]` non-cluster.\n", | |
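| "\n", | |
| "As a small sketch of how such a hash can be taken apart (the `example_clusters` data below is made up in the same shape, not output from this notebook):\n", | |
| "\n", | |
| "```ruby\n", | |
| "# hypothetical data shaped like the clustering result\n", | |
| "example_clusters = { -1 => [[9.0, 9.0]], 0 => [[0.0, 0.0], [0.1, 0.0]] }\n", | |
| "\n", | |
| "# points rejected as noise (empty list when there is no -1 key)\n", | |
| "noise = example_clusters.fetch(-1, [])\n", | |
| "\n", | |
| "# everything except the noise bucket\n", | |
| "real_clusters = example_clusters.reject { |id, _| id == -1 }\n", | |
| "```\n", | |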
| "\n", | |
| "Let's define a cluster plotting function and plot this (notice the \"disappearance\" of the two outliers that are being ignored by the function):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def plot_clusters(clusters)\n", | |
| " plot = Nyaplot::Plot.new\n", | |
| " plot.width(300)\n", | |
| " plot.height(400)\n", | |
| " plot.zoom(true)\n", | |
| " pts = clusters.map{|c| c[1]}.flatten(1)\n", | |
| " # add some padding\n", | |
| " xmin = pts.map {|p| p[0]}.min - 1e-5\n", | |
| " xmax = pts.map {|p| p[0]}.max + 1e-5\n", | |
| " ymin = pts.map {|p| p[1]}.min - 1e-5\n", | |
| " ymax = pts.map {|p| p[1]}.max + 1e-5\n", | |
| " plot.xrange([xmin,xmax])\n", | |
| " plot.yrange([ymin,ymax])\n", | |
| " plot.rotate_x_label(-60)\n", | |
| " plot.x_label(\"\")\n", | |
| " plot.y_label(\"\")\n", | |
| " # now plot\n", | |
| " clusters.each do |cluster|\n", | |
| " if cluster[0] != -1 # ignore cluster -1 because not enough points\n", | |
| " cluster_x = cluster[1].map { |c| c[0] }\n", | |
| " cluster_y = cluster[1].map { |c| c[1] }\n", | |
| " names = cluster[1].map { |c| cluster[0] }\n", | |
| " df = Nyaplot::DataFrame.new({x:cluster_x,y:cluster_y,cluster:names})\n", | |
| " sc = plot.add_with_df(df, :scatter, :x, :y)\n", | |
| " sc.tooltip_contents([:cluster])\n", | |
| " color = \"#\"+ \"%06x\" % (rand * 0xffffff)\n", | |
| " sc.color(color)\n", | |
| " end\n", | |
| " end\n", | |
| " plot.show\n", | |
| " return plot\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 18, | |
| "text": [ | |
| ":plot_clusters" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 18 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "plot = plot_clusters(centroid_clusters)\n", | |
| "plot.show" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "html": [ | |
| "<div id='vis-3e93ca2b-76c3-4c1e-ac75-7e5166ea7790'></div>\n", | |
| "<script>\n", | |
| "(function(){\n", | |
| " var render = function(){\n", | |
| " var model = {\"panes\":[{\"diagrams\":[{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#cf19de\"},\"data\":\"e1eebed5-39fb-4ef3-9746-ce8403d7b6bb\"}],\"options\":{\"width\":300,\"height\":400,\"zoom\":true,\"xrange\":[-73.98627592099406,-73.98621166459209],\"yrange\":[40.73559261728347,40.73568482334759],\"rotate_x_label\":-60,\"x_label\":\"\",\"y_label\":\"\"}}],\"data\":{\"e1eebed5-39fb-4ef3-9746-ce8403d7b6bb\":[{\"x\":-73.98625268168838,\"y\":40.73562601945317,\"cluster\":0},{\"x\":-73.98625173238652,\"y\":40.735625569382876,\"cluster\":0},{\"x\":-73.9862518966646,\"y\":40.73562642272427,\"cluster\":0},{\"x\":-73.986252242017,\"y\":40.735626656082445,\"cluster\":0},{\"x\":-73.98625152460835,\"y\":40.735626229414,\"cluster\":0},{\"x\":-73.98625258509149,\"y\":40.7356272053874,\"cluster\":0},{\"x\":-73.98625254867669,\"y\":40.735624721075084,\"cluster\":0},{\"x\":-73.98625207318744,\"y\":40.73562418649854,\"cluster\":0},{\"x\":-73.98625077341322,\"y\":40.73562552211442,\"cluster\":0}]},\"extension\":[]}\n", | |
| " Nyaplot.core.parse(model, '#vis-3e93ca2b-76c3-4c1e-ac75-7e5166ea7790');\n", | |
| " };\n", | |
| " if(window['Nyaplot']==undefined){\n", | |
| " window.addEventListener('load_nyaplot', render, false);\n", | |
| "\treturn;\n", | |
| " }\n", | |
| " render();\n", | |
| "})();\n", | |
| "</script>\n" | |
| ], | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 75, | |
| "text": [ | |
| "\"<div id='vis-3e93ca2b-76c3-4c1e-ac75-7e5166ea7790'></div>\\n<script>\\n(function(){\\n var render = function(){\\n var model = {\\\"panes\\\":[{\\\"diagrams\\\":[{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#cf19de\\\"},\\\"data\\\":\\\"e1eebed5-39fb-4ef3-9746-ce8403d7b6bb\\\"}],\\\"options\\\":{\\\"rotate_x_label\\\":-60,\\\"width\\\":300,\\\"height\\\":400,\\\"zoom\\\":true,\\\"xrange\\\":[-73.98627592099406,-73.98621166459209],\\\"yrange\\\":[40.73559261728347,40.73568482334759]}}],\\\"data\\\":{\\\"e1eebed5-39fb-4ef3-9746-ce8403d7b6bb\\\":[{\\\"x\\\":-73.98625268168838,\\\"y\\\":40.73562601945317,\\\"cluster\\\":0},{\\\"x\\\":-73.98625173238652,\\\"y\\\":40.735625569382876,\\\"cluster\\\":0},{\\\"x\\\":-73.9862518966646,\\\"y\\\":40.73562642272427,\\\"cluster\\\":0},{\\\"x\\\":-73.986252242017,\\\"y\\\":40.735626656082445,\\\"cluster\\\":0},{\\\"x\\\":-73.98625152460835,\\\"y\\\":40.735626229414,\\\"cluster\\\":0},{\\\"x\\\":-73.98625258509149,\\\"y\\\":40.7356272053874,\\\"cluster\\\":0},{\\\"x\\\":-73.98625254867669,\\\"y\\\":40.735624721075084,\\\"cluster\\\":0},{\\\"x\\\":-73.98625207318744,\\\"y\\\":40.73562418649854,\\\"cluster\\\":0},{\\\"x\\\":-73.98625077341322,\\\"y\\\":40.73562552211442,\\\"cluster\\\":0}]},\\\"extension\\\":[]}\\n Nyaplot.core.parse(model, '#vis-3e93ca2b-76c3-4c1e-ac75-7e5166ea7790');\\n };\\n if(window['Nyaplot']==undefined){\\n window.addEventListener('load_nyaplot', render, false);\\n\\treturn;\\n }\\n render();\\n})();\\n</script>\\n\"" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 75 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 2. Clustering vertices\n", | |
| "\n", | |
| "Now we need to:\n", | |
| "\n", | |
| "1. work backwards from the centroid clusters that have three or more centroids (only one in this case)\n", | |
| "1. find the polygons they belong to and, finally,\n", | |
| "1. find their vertices and cluster them\n", | |
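| "\n", | |
| "The first step, keeping only clusters with three or more centroids, could be sketched as follows. The helper `usable_clusters` and the sample data are assumptions for illustration and are not part of the original notebook:\n", | |
| "\n", | |
| "```ruby\n", | |
| "# hypothetical helper: drop the noise bucket (-1) and clusters that are too small\n", | |
| "def usable_clusters(clusters, min_size = 3)\n", | |
| "  clusters.select { |id, pts| id != -1 && pts.size >= min_size }\n", | |
| "end\n", | |
| "\n", | |
| "# made-up data in the same shape as the clustering result\n", | |
| "example = {\n", | |
| "  -1 => [[9.0, 9.0]],\n", | |
| "  0 => [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]],\n", | |
| "  1 => [[5.0, 5.0], [5.1, 5.0]]\n", | |
| "}\n", | |
| "kept = usable_clusters(example)\n", | |
| "```\n", | |
| "\n", | |
| "Here only cluster `0` has three or more centroids, so it is the only one kept.\n", | |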
| "\n", | |
| "Below is a function that retrieves the polygons for a given centroid cluster, based on the structures we have built so far:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "# given a list of centroids (lon,lat), find their poly's index in the centroid list (index => lon,lat)\n", | |
| "def get_polys_for_centroid_cluster(cluster, centroids, original_polys)\n", | |
| " polys = []\n", | |
| " cluster.each do |cl|\n", | |
| " index = centroids.key(cl) # key of the first centroid equal to cl, or nil if none matches\n", | |
| " polys.push(original_polys[index]) unless index.nil?\n", | |
| " end\n", | |
| " return polys\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 20, | |
| "text": [ | |
| ":get_polys_for_centroid_cluster" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 20 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Applying this to the only cluster that has useful centroids:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "cluster_polygons = get_polys_for_centroid_cluster(centroid_clusters[0], centroids, geocollection)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 21, | |
| "text": [ | |
| "[#<RGeo::GeoJSON::Feature:0x82193ed0 id=nil geom=\"POLYGON ((-73.98620970547199 40.7356342514617, -73.98627072572708 40.735547874977094, -73.98632504045963 40.73557226364293, -73.98622445762157 40.73570995781772, -73.9861835539341 40.73569268254945, -73.98621775209902 40.735640856717666, -73.98620970547199 40.7356342514617))\">, #<RGeo::GeoJSON::Feature:0x82193674 id=nil geom=\"POLYGON ((-73.98620769381522 40.73563526765495, -73.9862660318613 40.735547874977094, -73.98632504045963 40.735570739351566, -73.98622579872608 40.73570944972167, -73.98618154227734 40.73569217445325, -73.98621775209902 40.73563933242788, -73.98620769381522 40.73563526765495))\">, #<RGeo::GeoJSON::Feature:0x82193034 id=nil geom=\"POLYGON ((-73.98632369935513 40.735570739351566, -73.98622512817383 40.73570944972167, -73.98618154227734 40.73569014206842, -73.98621909320354 40.735640856717666, -73.98620970547199 40.73563526765495, -73.98627005517483 40.73554889117169, -73.98632369935513 40.735570739351566))\">, #<RGeo::GeoJSON::Feature:0x82192b20 id=nil geom=\"POLYGON ((-73.98621842265129 40.7356423810074, -73.98620903491974 40.73563577575159, -73.98627139627934 40.735547874977094, -73.98632436990738 40.735571755545806, -73.98622579872608 40.73570995781772, -73.98618087172508 40.735689633972214, -73.98621842265129 40.7356423810074))\">, #<RGeo::GeoJSON::Feature:0x82192620 id=nil geom=\"POLYGON ((-73.98626938462257 40.73554889117167, -73.98632369935513 40.735572771740024, -73.98622445762157 40.73570894162559, -73.98618154227734 40.73569065016463, -73.98621775209902 40.735640856717666, -73.98620836436749 40.735634251461676, -73.98626938462257 40.73554889117167))\">, #<RGeo::GeoJSON::Feature:0x8218fe48 id=nil geom=\"POLYGON ((-73.98632571101189 40.735571755545806, -73.98622378706932 40.73570995781772, -73.98618288338184 40.73569268254945, -73.98621775209902 40.73564034862108, -73.9862110465765 40.7356362838482, -73.98627005517483 40.735550923560815, -73.98632571101189 40.735571755545806))\">, 
#<RGeo::GeoJSON::Feature:0x8218ef5c id=nil geom=\"POLYGON ((-73.98620970547199 40.73563475955834, -73.98627005517483 40.73554990736624, -73.98632369935513 40.735571755545806, -73.98622360456956 40.73570641325812, -73.9861848950386 40.735689633972214, -73.98621842265129 40.735640856717666, -73.98620970547199 40.73563475955834))\">, #<RGeo::GeoJSON::Feature:0x82192120 id=nil geom=\"POLYGON ((-73.98621775209902 40.73563984052446, -73.98620836436749 40.73563272717173, -73.98626938462257 40.735550415463514, -73.98632235825062 40.73557124744871, -73.98622360456956 40.73570641325812, -73.98618768252459 40.73568957578454, -73.98621775209902 40.73563984052446))\">, #<RGeo::GeoJSON::Feature:0x8218e55c id=nil geom=\"POLYGON ((-73.98621909320354 40.735638316234656, -73.98620836436749 40.7356362838482, -73.98620769381522 40.73563577575159, -73.98627005517483 40.73554939926897, -73.98632302880287 40.73557023125444, -73.98622360456956 40.73570641325812, -73.98617953062057 40.735689633972214, -73.98621909320354 40.735638316234656))\">]" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 21 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "We need a method to extract the vertices from each polygon (in a DBSCAN-compatible format):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def get_points(poly_feature)\n", | |
| " geom = poly_feature.geometry\n", | |
| " return false if (geom.geometry_type.type_name != \"Polygon\")\n", | |
| " pts = []\n", | |
| " points = geom.exterior_ring.points\n", | |
| " points.each do |point|\n", | |
| " pts.push([point.x,point.y])\n", | |
| " end\n", | |
| " return pts\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 22, | |
| "text": [ | |
| ":get_points" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 22 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now let's plot what we have so far (vertices from the same polygon are the same color):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def plot_polys(polys)\n", | |
| " plot = Nyaplot::Plot.new\n", | |
| " plot.width(500)\n", | |
| " plot.height(500)\n", | |
| " plot.zoom(true)\n", | |
| " polys.each do |poly|\n", | |
| " plot_poly(poly, plot)\n", | |
| " end\n", | |
| " plot.show\n", | |
| "end\n", | |
| "def plot_poly(poly, plot = nil)\n", | |
| " showplot = false\n", | |
| " if plot == nil\n", | |
| " showplot = true\n", | |
| " plot = Nyaplot::Plot.new\n", | |
| " plot.width(500)\n", | |
| " plot.height(500)\n", | |
| " plot.zoom(true)\n", | |
| " end\n", | |
| " points = get_points(poly)\n", | |
| " points_x = points.map { |p| p[0] }\n", | |
| " points_y = points.map { |p| p[1] }\n", | |
| " df = Nyaplot::DataFrame.new({x:points_x,y:points_y})\n", | |
| " sc = plot.add_with_df(df, :scatter, :x, :y)\n", | |
| " color = \"#\"+ \"%06x\" % (rand * 0xffffff)\n", | |
| " sc.color(color)\n", | |
| " plot.show if showplot\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 24, | |
| "text": [ | |
| ":plot_poly" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 24 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "plot_polys(cluster_polygons)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "html": [ | |
| "<div id='vis-2b3756b1-88e8-4390-826c-231a36eb4fd0'></div>\n", | |
| "<script>\n", | |
| "(function(){\n", | |
| " var render = function(){\n", | |
| " var model = {\"panes\":[{\"diagrams\":[{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#00bfec\"},\"data\":\"86e68646-fba8-4acd-8b17-082a080a591c\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#935774\"},\"data\":\"e6f3a8e9-6cf9-4723-bf04-39952070c3ce\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#d52cca\"},\"data\":\"0ea89e6f-33b2-438a-a274-fd80f6a09672\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#f9c8b0\"},\"data\":\"b74703f9-7159-40d3-9b63-edf7d496fae9\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#4b8057\"},\"data\":\"fd253884-5423-4742-a8c8-6c0409ed547c\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#5f4c55\"},\"data\":\"f4cac916-38de-487f-b47b-b3fe908782fb\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#652be2\"},\"data\":\"27e5d698-c8db-4964-90f3-dfa0b8f3186b\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#db0832\"},\"data\":\"c8f1302b-4de3-4499-a9d9-cf9444a45ab7\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#91a3cc\"},\"data\":\"25e44099-fd37-458d-a79e-013189a893d8\"}],\"options\":{\"width\":500,\"height\":500,\"zoom\":true,\"xrange\":[-73.98632571101189,-73.98617953062057],\"yrange\":[40.735547874977094,40.73570995781772]}}],\"data\":{\"86e68646-fba8-4acd-8b17-082a080a591c\":[{\"x\":-73.98620970547199,\"y\":40.7356342514617},{\"x\":-73.98627072572708,\"y\":40.735547874977094},{\"x\":-73.98632504045963,\"y\":40.73557226364293},{\"x\":-73.98622445762157,\"y\":40.73570995781772},{\"x\":-73.9861835539341,\"y\":40.73569268254945},{\"x\":-73.98621775209902,\"y\":40.735640856717666},{\"x\":-73.98620970547199,\"y\":40.7356342514617}],\"e6f3a8e9-6cf9-4723-bf04-39952070c3ce\":[{\"x\":-73.98620769381522,\"y\":40.73563526765495},{\"x\":-73.9862660318613,\"y\":40.735547874977094},{\"x\":-73.98632504045963,\"y\":40.73557073935
1566},{\"x\":-73.98622579872608,\"y\":40.73570944972167},{\"x\":-73.98618154227734,\"y\":40.73569217445325},{\"x\":-73.98621775209902,\"y\":40.73563933242788},{\"x\":-73.98620769381522,\"y\":40.73563526765495}],\"0ea89e6f-33b2-438a-a274-fd80f6a09672\":[{\"x\":-73.98632369935513,\"y\":40.735570739351566},{\"x\":-73.98622512817383,\"y\":40.73570944972167},{\"x\":-73.98618154227734,\"y\":40.73569014206842},{\"x\":-73.98621909320354,\"y\":40.735640856717666},{\"x\":-73.98620970547199,\"y\":40.73563526765495},{\"x\":-73.98627005517483,\"y\":40.73554889117169},{\"x\":-73.98632369935513,\"y\":40.735570739351566}],\"b74703f9-7159-40d3-9b63-edf7d496fae9\":[{\"x\":-73.98621842265129,\"y\":40.7356423810074},{\"x\":-73.98620903491974,\"y\":40.73563577575159},{\"x\":-73.98627139627934,\"y\":40.735547874977094},{\"x\":-73.98632436990738,\"y\":40.735571755545806},{\"x\":-73.98622579872608,\"y\":40.73570995781772},{\"x\":-73.98618087172508,\"y\":40.735689633972214},{\"x\":-73.98621842265129,\"y\":40.7356423810074}],\"fd253884-5423-4742-a8c8-6c0409ed547c\":[{\"x\":-73.98626938462257,\"y\":40.73554889117167},{\"x\":-73.98632369935513,\"y\":40.735572771740024},{\"x\":-73.98622445762157,\"y\":40.73570894162559},{\"x\":-73.98618154227734,\"y\":40.73569065016463},{\"x\":-73.98621775209902,\"y\":40.735640856717666},{\"x\":-73.98620836436749,\"y\":40.735634251461676},{\"x\":-73.98626938462257,\"y\":40.73554889117167}],\"f4cac916-38de-487f-b47b-b3fe908782fb\":[{\"x\":-73.98632571101189,\"y\":40.735571755545806},{\"x\":-73.98622378706932,\"y\":40.73570995781772},{\"x\":-73.98618288338184,\"y\":40.73569268254945},{\"x\":-73.98621775209902,\"y\":40.73564034862108},{\"x\":-73.9862110465765,\"y\":40.7356362838482},{\"x\":-73.98627005517483,\"y\":40.735550923560815},{\"x\":-73.98632571101189,\"y\":40.735571755545806}],\"27e5d698-c8db-4964-90f3-dfa0b8f3186b\":[{\"x\":-73.98620970547199,\"y\":40.73563475955834},{\"x\":-73.98627005517483,\"y\":40.73554990736624},{\"x\":-73.98632369935513,\"y\":40.73
5571755545806},{\"x\":-73.98622360456956,\"y\":40.73570641325812},{\"x\":-73.9861848950386,\"y\":40.735689633972214},{\"x\":-73.98621842265129,\"y\":40.735640856717666},{\"x\":-73.98620970547199,\"y\":40.73563475955834}],\"c8f1302b-4de3-4499-a9d9-cf9444a45ab7\":[{\"x\":-73.98621775209902,\"y\":40.73563984052446},{\"x\":-73.98620836436749,\"y\":40.73563272717173},{\"x\":-73.98626938462257,\"y\":40.735550415463514},{\"x\":-73.98632235825062,\"y\":40.73557124744871},{\"x\":-73.98622360456956,\"y\":40.73570641325812},{\"x\":-73.98618768252459,\"y\":40.73568957578454},{\"x\":-73.98621775209902,\"y\":40.73563984052446}],\"25e44099-fd37-458d-a79e-013189a893d8\":[{\"x\":-73.98621909320354,\"y\":40.735638316234656},{\"x\":-73.98620836436749,\"y\":40.7356362838482},{\"x\":-73.98620769381522,\"y\":40.73563577575159},{\"x\":-73.98627005517483,\"y\":40.73554939926897},{\"x\":-73.98632302880287,\"y\":40.73557023125444},{\"x\":-73.98622360456956,\"y\":40.73570641325812},{\"x\":-73.98617953062057,\"y\":40.735689633972214},{\"x\":-73.98621909320354,\"y\":40.735638316234656}]},\"extension\":[]}\n", | |
| " Nyaplot.core.parse(model, '#vis-2b3756b1-88e8-4390-826c-231a36eb4fd0');\n", | |
| " };\n", | |
| " if(window['Nyaplot']==undefined){\n", | |
| " window.addEventListener('load_nyaplot', render, false);\n", | |
| "\treturn;\n", | |
| " }\n", | |
| " render();\n", | |
| "})();\n", | |
| "</script>\n" | |
| ], | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 25, | |
| "text": [ | |
| "\"<div id='vis-2b3756b1-88e8-4390-826c-231a36eb4fd0'></div>\\n<script>\\n(function(){\\n var render = function(){\\n var model = {\\\"panes\\\":[{\\\"diagrams\\\":[{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#00bfec\\\"},\\\"data\\\":\\\"86e68646-fba8-4acd-8b17-082a080a591c\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#935774\\\"},\\\"data\\\":\\\"e6f3a8e9-6cf9-4723-bf04-39952070c3ce\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#d52cca\\\"},\\\"data\\\":\\\"0ea89e6f-33b2-438a-a274-fd80f6a09672\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#f9c8b0\\\"},\\\"data\\\":\\\"b74703f9-7159-40d3-9b63-edf7d496fae9\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#4b8057\\\"},\\\"data\\\":\\\"fd253884-5423-4742-a8c8-6c0409ed547c\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#5f4c55\\\"},\\\"data\\\":\\\"f4cac916-38de-487f-b47b-b3fe908782fb\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#652be2\\\"},\\\"data\\\":\\\"27e5d698-c8db-4964-90f3-dfa0b8f3186b\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#db0832\\\"},\\\"data\\\":\\\"c8f1302b-4de3-4499-a9d9-cf9444a45ab7\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#91a3cc\\\"},\\\"data\\\":\\\"25e44099-fd37-458d-a79e-013189a893d8\\\"}],\\\"options\\\":{\\\"width\\\":500,\\\"height\\\":500,\\\"zoom\\\":true,\\\"xrange\\\":[-73.98632571101189,-73.98617953062057],\\\"yrange\\\":[40.735547874977094,40.73570995781772]}}],\\\"data\\\":{\\\"86e68646-fba8-4acd-8b17-082a080a591c\
\\":[{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.7356342514617},{\\\"x\\\":-73.98627072572708,\\\"y\\\":40.735547874977094},{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.73557226364293},{\\\"x\\\":-73.98622445762157,\\\"y\\\":40.73570995781772},{\\\"x\\\":-73.9861835539341,\\\"y\\\":40.73569268254945},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.7356342514617}],\\\"e6f3a8e9-6cf9-4723-bf04-39952070c3ce\\\":[{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563526765495},{\\\"x\\\":-73.9862660318613,\\\"y\\\":40.735547874977094},{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.735570739351566},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570944972167},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569217445325},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563933242788},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563526765495}],\\\"0ea89e6f-33b2-438a-a274-fd80f6a09672\\\":[{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735570739351566},{\\\"x\\\":-73.98622512817383,\\\"y\\\":40.73570944972167},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569014206842},{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735640856717666},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563526765495},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554889117169},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735570739351566}],\\\"b74703f9-7159-40d3-9b63-edf7d496fae9\\\":[{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.7356423810074},{\\\"x\\\":-73.98620903491974,\\\"y\\\":40.73563577575159},{\\\"x\\\":-73.98627139627934,\\\"y\\\":40.735547874977094},{\\\"x\\\":-73.98632436990738,\\\"y\\\":40.735571755545806},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570995781772},{\\\"x\\\":-73.98618087172508,\\\"y\\\":40.735689633972214},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.7356423810074}],\\\"fd253884-5423-4742-a8c8-6c0409ed547c\\\":[{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.73554889117167},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735572771740024},{\\\"x\\\":-73.98622445762157,\\
\"y\\\":40.73570894162559},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569065016463},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.735634251461676},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.73554889117167}],\\\"f4cac916-38de-487f-b47b-b3fe908782fb\\\":[{\\\"x\\\":-73.98632571101189,\\\"y\\\":40.735571755545806},{\\\"x\\\":-73.98622378706932,\\\"y\\\":40.73570995781772},{\\\"x\\\":-73.98618288338184,\\\"y\\\":40.73569268254945},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73564034862108},{\\\"x\\\":-73.9862110465765,\\\"y\\\":40.7356362838482},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.735550923560815},{\\\"x\\\":-73.98632571101189,\\\"y\\\":40.735571755545806}],\\\"27e5d698-c8db-4964-90f3-dfa0b8f3186b\\\":[{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563475955834},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554990736624},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735571755545806},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812},{\\\"x\\\":-73.9861848950386,\\\"y\\\":40.735689633972214},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.735640856717666},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563475955834}],\\\"c8f1302b-4de3-4499-a9d9-cf9444a45ab7\\\":[{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563984052446},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.73563272717173},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.735550415463514},{\\\"x\\\":-73.98632235825062,\\\"y\\\":40.73557124744871},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812},{\\\"x\\\":-73.98618768252459,\\\"y\\\":40.73568957578454},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563984052446}],\\\"25e44099-fd37-458d-a79e-013189a893d8\\\":[{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735638316234656},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.7356362838482},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563577575159},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554939926897},{\\\"x\\\":-73.98632302880287,\\\"y\\\":40.73557023125444},{\\\"x\\\
":-73.98622360456956,\\\"y\\\":40.73570641325812},{\\\"x\\\":-73.98617953062057,\\\"y\\\":40.735689633972214},{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735638316234656}]},\\\"extension\\\":[]}\\n Nyaplot.core.parse(model, '#vis-2b3756b1-88e8-4390-826c-231a36eb4fd0');\\n };\\n if(window['Nyaplot']==undefined){\\n window.addEventListener('load_nyaplot', render, false);\\n\\treturn;\\n }\\n render();\\n})();\\n</script>\\n\"" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 25 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Let's cluster these points. Below is a function that extracts the points from a list of polygons:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def get_all_poly_points(polys)\n", | |
| "  # collect the point list of every polygon\n", | |
| "  polys.map { |poly| get_points(poly) }\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 23, | |
| "text": [ | |
| ":get_all_poly_points" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 23 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "cluster_poly_points = get_all_poly_points(cluster_polygons)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 26, | |
| "text": [ | |
| "[[[-73.98620970547199, 40.7356342514617], [-73.98627072572708, 40.735547874977094], [-73.98632504045963, 40.73557226364293], [-73.98622445762157, 40.73570995781772], [-73.9861835539341, 40.73569268254945], [-73.98621775209902, 40.735640856717666], [-73.98620970547199, 40.7356342514617]], [[-73.98620769381522, 40.73563526765495], [-73.9862660318613, 40.735547874977094], [-73.98632504045963, 40.735570739351566], [-73.98622579872608, 40.73570944972167], [-73.98618154227734, 40.73569217445325], [-73.98621775209902, 40.73563933242788], [-73.98620769381522, 40.73563526765495]], [[-73.98632369935513, 40.735570739351566], [-73.98622512817383, 40.73570944972167], [-73.98618154227734, 40.73569014206842], [-73.98621909320354, 40.735640856717666], [-73.98620970547199, 40.73563526765495], [-73.98627005517483, 40.73554889117169], [-73.98632369935513, 40.735570739351566]], [[-73.98621842265129, 40.7356423810074], [-73.98620903491974, 40.73563577575159], [-73.98627139627934, 40.735547874977094], [-73.98632436990738, 40.735571755545806], [-73.98622579872608, 40.73570995781772], [-73.98618087172508, 40.735689633972214], [-73.98621842265129, 40.7356423810074]], [[-73.98626938462257, 40.73554889117167], [-73.98632369935513, 40.735572771740024], [-73.98622445762157, 40.73570894162559], [-73.98618154227734, 40.73569065016463], [-73.98621775209902, 40.735640856717666], [-73.98620836436749, 40.735634251461676], [-73.98626938462257, 40.73554889117167]], [[-73.98632571101189, 40.735571755545806], [-73.98622378706932, 40.73570995781772], [-73.98618288338184, 40.73569268254945], [-73.98621775209902, 40.73564034862108], [-73.9862110465765, 40.7356362838482], [-73.98627005517483, 40.735550923560815], [-73.98632571101189, 40.735571755545806]], [[-73.98620970547199, 40.73563475955834], [-73.98627005517483, 40.73554990736624], [-73.98632369935513, 40.735571755545806], [-73.98622360456956, 40.73570641325812], [-73.9861848950386, 40.735689633972214], [-73.98621842265129, 40.735640856717666], 
[-73.98620970547199, 40.73563475955834]], [[-73.98621775209902, 40.73563984052446], [-73.98620836436749, 40.73563272717173], [-73.98626938462257, 40.735550415463514], [-73.98632235825062, 40.73557124744871], [-73.98622360456956, 40.73570641325812], [-73.98618768252459, 40.73568957578454], [-73.98621775209902, 40.73563984052446]], [[-73.98621909320354, 40.735638316234656], [-73.98620836436749, 40.7356362838482], [-73.98620769381522, 40.73563577575159], [-73.98627005517483, 40.73554939926897], [-73.98632302880287, 40.73557023125444], [-73.98622360456956, 40.73570641325812], [-73.98617953062057, 40.735689633972214], [-73.98621909320354, 40.735638316234656]]]" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 26 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Finding a good \u03b5 value for these points is a bit more complicated. If it is too big, the L-shape will be lost: the points in that corner will be clustered together. After fiddling around I found a decent value of $6(10^{-6})$.\n", | |
| "\n", | |
| "An important aspect to account for here is that the GeoJSON spec requires a polygon's coordinate array to begin _and end_ with the _same point_. Therefore this point would be **counted twice** if we left the array as-is. Below are the resulting clustering function, the corresponding test, and the plot:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def cluster_points(original_points)\n", | |
| " # exclude first item in each poly since it is same as last\n", | |
| " unique_points = original_points.map{|poly| poly[1..-1]}\n", | |
| " dbscan = DBSCAN( unique_points.flatten(1), :epsilon => 6e-06, :min_points => 2, :distance => :euclidean_distance )\n", | |
| " return dbscan.results\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 27, | |
| "text": [ | |
| ":cluster_points" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 27 | |
| }, | |
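As a side note on the duplicated vertex: the function above drops the first point of every polygon unconditionally. A slightly more defensive variant, sketched here with a hypothetical helper named `open_ring` that is not part of the notebook, removes the closing point only when the ring really is closed:

```ruby
# Hypothetical helper (not from the notebook): remove the closing vertex
# of a GeoJSON-style ring so the shared start/end point is counted once.
# Rings that are not closed are returned unchanged.
def open_ring(ring)
  ring.first == ring.last ? ring[0...-1] : ring
end

ring = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
p open_ring(ring) # prints [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
```

This keeps the first vertex and drops the last one, so the set of unique points is the same as with `poly[1..-1]`; only the starting position in the list differs.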
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "vertex_clusters = cluster_points(cluster_poly_points)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 28, | |
| "text": [ | |
| "{0=>[[-73.98627072572708, 40.735547874977094], [-73.9862660318613, 40.735547874977094], [-73.98627005517483, 40.73554889117169], [-73.98627139627934, 40.735547874977094], [-73.98626938462257, 40.73554889117167], [-73.98627005517483, 40.735550923560815], [-73.98627005517483, 40.73554990736624], [-73.98626938462257, 40.735550415463514], [-73.98627005517483, 40.73554939926897]], 1=>[[-73.98632504045963, 40.73557226364293], [-73.98632504045963, 40.735570739351566], [-73.98632369935513, 40.735570739351566], [-73.98632436990738, 40.735571755545806], [-73.98632369935513, 40.735572771740024], [-73.98632571101189, 40.735571755545806], [-73.98632369935513, 40.735571755545806], [-73.98632235825062, 40.73557124744871], [-73.98632302880287, 40.73557023125444]], 2=>[[-73.98622445762157, 40.73570995781772], [-73.98622579872608, 40.73570944972167], [-73.98622512817383, 40.73570944972167], [-73.98622579872608, 40.73570995781772], [-73.98622445762157, 40.73570894162559], [-73.98622378706932, 40.73570995781772], [-73.98622360456956, 40.73570641325812], [-73.98622360456956, 40.73570641325812], [-73.98622360456956, 40.73570641325812]], 3=>[[-73.9861835539341, 40.73569268254945], [-73.98618154227734, 40.73569217445325], [-73.98618154227734, 40.73569014206842], [-73.98618087172508, 40.735689633972214], [-73.98618154227734, 40.73569065016463], [-73.98618288338184, 40.73569268254945], [-73.9861848950386, 40.735689633972214], [-73.98618768252459, 40.73568957578454], [-73.98617953062057, 40.735689633972214]], 4=>[[-73.98621775209902, 40.735640856717666], [-73.98621775209902, 40.73563933242788], [-73.98621909320354, 40.735640856717666], [-73.98621842265129, 40.7356423810074], [-73.98621775209902, 40.73564034862108], [-73.98621842265129, 40.735640856717666], [-73.98621775209902, 40.73563984052446], [-73.98621909320354, 40.735638316234656], [-73.98621775209902, 40.735640856717666]], 5=>[[-73.98620970547199, 40.7356342514617], [-73.98620769381522, 40.73563526765495], [-73.98620970547199, 
40.73563526765495], [-73.98620903491974, 40.73563577575159], [-73.98620836436749, 40.735634251461676], [-73.9862110465765, 40.7356362838482], [-73.98620970547199, 40.73563475955834], [-73.98620836436749, 40.73563272717173], [-73.98620836436749, 40.7356362838482], [-73.98620769381522, 40.73563577575159]]}" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 28 | |
| }, | |
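To get a feel for why ε has to be so small, it can help to measure how close the two corners of the L-shaped notch are. The sketch below takes one vertex from cluster 4 and one from cluster 5 in the output above and compares their Euclidean distance with the ε used in `cluster_points`. The helper name `euclidean` is an assumption made for this illustration, and the check is only indicative: DBSCAN merges clusters through chains of neighbors, not through a single pair of points.

```ruby
# Two vertices copied from the clustering output above:
# one from cluster 4 and one from cluster 5 (the inner corner of the L).
a = [-73.98621775209902, 40.73563933242788]
b = [-73.9862110465765, 40.7356362838482]

epsilon = 6e-06 # the value passed to DBSCAN in cluster_points

# Plain Euclidean distance in degrees (hypothetical helper name).
def euclidean(pt1, pt2)
  Math.hypot(pt1[0] - pt2[0], pt1[1] - pt2[1])
end

d = euclidean(a, b)
puts format("distance = %.3e, epsilon = %.3e", d, epsilon)
puts d > epsilon ? "farther apart than epsilon" : "within epsilon of each other"
```

For this pair the distance is larger than ε but smaller than $10^{-5}$, which suggests that an ε only slightly bigger than the chosen one would start linking the two corners.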
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "plot = plot_clusters(vertex_clusters)\n", | |
| "plot.show" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "html": [ | |
| "<div id='vis-f14ed6de-6510-4068-9d5b-fd791dedef9c'></div>\n", | |
| "<script>\n", | |
| "(function(){\n", | |
| " var render = function(){\n", | |
| " var model = {\"panes\":[{\"diagrams\":[{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#bdbc87\"},\"data\":\"d10bbc3f-93c7-4fb2-95eb-898231f92cd6\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#540b62\"},\"data\":\"5484f796-d555-4618-8155-726ebace491a\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#d70091\"},\"data\":\"b86a30c7-2d30-4a13-8267-9504a97278c4\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#032eb6\"},\"data\":\"12f2b5e4-7154-4d64-a1df-f2decfe6d82f\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#1442a1\"},\"data\":\"3795f8cd-5c80-407c-ac10-374d3d5dadb4\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#6c1aa2\"},\"data\":\"b5757dff-56c7-409f-9497-7815910277e3\"}],\"options\":{\"width\":300,\"height\":400,\"zoom\":true,\"xrange\":[-73.98633571101189,-73.98616953062057],\"yrange\":[40.73553787497709,40.73571995781772]}}],\"data\":{\"d10bbc3f-93c7-4fb2-95eb-898231f92cd6\":[{\"x\":-73.98627072572708,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.9862660318613,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554889117169,\"cluster\":0},{\"x\":-73.98627139627934,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.98626938462257,\"y\":40.73554889117167,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.735550923560815,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554990736624,\"cluster\":0},{\"x\":-73.98626938462257,\"y\":40.735550415463514,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554939926897,\"cluster\":0}],\"5484f796-d555-4618-8155-726ebace491a\":[{\"x\":-73.98632504045963,\"y\":40.73557226364293,\"cluster\":1},{\"x\":-73.98632504045963,\"y\":40.73557
0739351566,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735570739351566,\"cluster\":1},{\"x\":-73.98632436990738,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735572771740024,\"cluster\":1},{\"x\":-73.98632571101189,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632235825062,\"y\":40.73557124744871,\"cluster\":1},{\"x\":-73.98632302880287,\"y\":40.73557023125444,\"cluster\":1}],\"b86a30c7-2d30-4a13-8267-9504a97278c4\":[{\"x\":-73.98622445762157,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622579872608,\"y\":40.73570944972167,\"cluster\":2},{\"x\":-73.98622512817383,\"y\":40.73570944972167,\"cluster\":2},{\"x\":-73.98622579872608,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622445762157,\"y\":40.73570894162559,\"cluster\":2},{\"x\":-73.98622378706932,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2}],\"12f2b5e4-7154-4d64-a1df-f2decfe6d82f\":[{\"x\":-73.9861835539341,\"y\":40.73569268254945,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569217445325,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569014206842,\"cluster\":3},{\"x\":-73.98618087172508,\"y\":40.735689633972214,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569065016463,\"cluster\":3},{\"x\":-73.98618288338184,\"y\":40.73569268254945,\"cluster\":3},{\"x\":-73.9861848950386,\"y\":40.735689633972214,\"cluster\":3},{\"x\":-73.98618768252459,\"y\":40.73568957578454,\"cluster\":3},{\"x\":-73.98617953062057,\"y\":40.735689633972214,\"cluster\":3}],\"3795f8cd-5c80-407c-ac10-374d3d5dadb4\":[{\"x\":-73.98621775209902,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73563933242788,\"cluster\":4},{\"x\":-73.98621909320354,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.986218
42265129,\"y\":40.7356423810074,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73564034862108,\"cluster\":4},{\"x\":-73.98621842265129,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73563984052446,\"cluster\":4},{\"x\":-73.98621909320354,\"y\":40.735638316234656,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.735640856717666,\"cluster\":4}],\"b5757dff-56c7-409f-9497-7815910277e3\":[{\"x\":-73.98620970547199,\"y\":40.7356342514617,\"cluster\":5},{\"x\":-73.98620769381522,\"y\":40.73563526765495,\"cluster\":5},{\"x\":-73.98620970547199,\"y\":40.73563526765495,\"cluster\":5},{\"x\":-73.98620903491974,\"y\":40.73563577575159,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.735634251461676,\"cluster\":5},{\"x\":-73.9862110465765,\"y\":40.7356362838482,\"cluster\":5},{\"x\":-73.98620970547199,\"y\":40.73563475955834,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.73563272717173,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.7356362838482,\"cluster\":5},{\"x\":-73.98620769381522,\"y\":40.73563577575159,\"cluster\":5}]},\"extension\":[]}\n", | |
| " Nyaplot.core.parse(model, '#vis-f14ed6de-6510-4068-9d5b-fd791dedef9c');\n", | |
| " };\n", | |
| " if(window['Nyaplot']==undefined){\n", | |
| " window.addEventListener('load_nyaplot', render, false);\n", | |
| "\treturn;\n", | |
| " }\n", | |
| " render();\n", | |
| "})();\n", | |
| "</script>\n" | |
| ], | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 29, | |
| "text": [ | |
| "\"<div id='vis-f14ed6de-6510-4068-9d5b-fd791dedef9c'></div>\\n<script>\\n(function(){\\n var render = function(){\\n var model = {\\\"panes\\\":[{\\\"diagrams\\\":[{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#bdbc87\\\"},\\\"data\\\":\\\"d10bbc3f-93c7-4fb2-95eb-898231f92cd6\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#540b62\\\"},\\\"data\\\":\\\"5484f796-d555-4618-8155-726ebace491a\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#d70091\\\"},\\\"data\\\":\\\"b86a30c7-2d30-4a13-8267-9504a97278c4\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#032eb6\\\"},\\\"data\\\":\\\"12f2b5e4-7154-4d64-a1df-f2decfe6d82f\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#1442a1\\\"},\\\"data\\\":\\\"3795f8cd-5c80-407c-ac10-374d3d5dadb4\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#6c1aa2\\\"},\\\"data\\\":\\\"b5757dff-56c7-409f-9497-7815910277e3\\\"}],\\\"options\\\":{\\\"width\\\":300,\\\"height\\\":400,\\\"zoom\\\":true,\\\"xrange\\\":[-73.98633571101189,-73.98616953062057],\\\"yrange\\\":[40.73553787497709,40.73571995781772]}}],\\\"data\\\":{\\\"d10bbc3f-93c7-4fb2-95eb-898231f92cd6\\\":[{\\\"x\\\":-73.98627072572708,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.9862660318613,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554889117169,\\\"cluster\\\":0},{\\\"x\\\":-73.98627139627934,\\\"y\\
\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.73554889117167,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.735550923560815,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554990736624,\\\"cluster\\\":0},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.735550415463514,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554939926897,\\\"cluster\\\":0}],\\\"5484f796-d555-4618-8155-726ebace491a\\\":[{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.73557226364293,\\\"cluster\\\":1},{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.735570739351566,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735570739351566,\\\"cluster\\\":1},{\\\"x\\\":-73.98632436990738,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735572771740024,\\\"cluster\\\":1},{\\\"x\\\":-73.98632571101189,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632235825062,\\\"y\\\":40.73557124744871,\\\"cluster\\\":1},{\\\"x\\\":-73.98632302880287,\\\"y\\\":40.73557023125444,\\\"cluster\\\":1}],\\\"b86a30c7-2d30-4a13-8267-9504a97278c4\\\":[{\\\"x\\\":-73.98622445762157,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570944972167,\\\"cluster\\\":2},{\\\"x\\\":-73.98622512817383,\\\"y\\\":40.73570944972167,\\\"cluster\\\":2},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622445762157,\\\"y\\\":40.73570894162559,\\\"cluster\\\":2},{\\\"x\\\":-73.98622378706932,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2}],\\\"12f2b5e4-7154-4d64-a1df-f2decfe6d82f\\\":[{\\\"x\\\":-73.9861835539341,\\\"y\\\":
40.73569268254945,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569217445325,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569014206842,\\\"cluster\\\":3},{\\\"x\\\":-73.98618087172508,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569065016463,\\\"cluster\\\":3},{\\\"x\\\":-73.98618288338184,\\\"y\\\":40.73569268254945,\\\"cluster\\\":3},{\\\"x\\\":-73.9861848950386,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3},{\\\"x\\\":-73.98618768252459,\\\"y\\\":40.73568957578454,\\\"cluster\\\":3},{\\\"x\\\":-73.98617953062057,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3}],\\\"3795f8cd-5c80-407c-ac10-374d3d5dadb4\\\":[{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563933242788,\\\"cluster\\\":4},{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.7356423810074,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73564034862108,\\\"cluster\\\":4},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563984052446,\\\"cluster\\\":4},{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735638316234656,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4}],\\\"b5757dff-56c7-409f-9497-7815910277e3\\\":[{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.7356342514617,\\\"cluster\\\":5},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563526765495,\\\"cluster\\\":5},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563526765495,\\\"cluster\\\":5},{\\\"x\\\":-73.98620903491974,\\\"y\\\":40.73563577575159,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.735634251461676,\\\"cluster\\\":5},{\\\"x\\\":-73.9862110465765,\\\"y\\\":40.7356362838482,\\\"cluster\\\":5},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563475955834,\\\"cluster\\\":5},{\\\"x\\\":-73.98
620836436749,\\\"y\\\":40.73563272717173,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.7356362838482,\\\"cluster\\\":5},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563577575159,\\\"cluster\\\":5}]},\\\"extension\\\":[]}\\n Nyaplot.core.parse(model, '#vis-f14ed6de-6510-4068-9d5b-fd791dedef9c');\\n };\\n if(window['Nyaplot']==undefined){\\n window.addEventListener('load_nyaplot', render, false);\\n\\treturn;\\n }\\n render();\\n})();\\n</script>\\n\"" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 29 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 3. Finding the mean polygon\n", | |
| "\n", | |
| "Now we iterate through each vertex cluster and:\n", | |
| "\n", | |
| "1. find the mean vertex\n", | |
| "1. connect the mean vertices into a mean polygon\n", | |
| "\n", | |
| "For this we add two helper methods to the `Array` class to compute the mean value:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "class Array\n", | |
| " def sum\n", | |
| " inject(0.0) { |result, el| result + el }\n", | |
| " end\n", | |
| "\n", | |
| " def mean \n", | |
| " sum / size\n", | |
| " end\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 30, | |
| "text": [ | |
| ":mean" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 30 | |
| }, | |
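Reopening `Array` is convenient in a notebook, but note that Ruby 2.4 and later already ship a built-in `Array#sum`, which the definition above replaces. If that is a concern, the same arithmetic can live in a standalone method. The name `mean_of` below is hypothetical and only serves this sketch:

```ruby
# Hypothetical standalone helper: same arithmetic as Array#mean above,
# without reopening the Array class.
def mean_of(values)
  raise ArgumentError, "cannot take the mean of an empty list" if values.empty?
  # Start from 0.0 so integer input still yields a float result.
  values.inject(0.0) { |acc, v| acc + v } / values.size
end

puts mean_of([1, 2, 3, 4]) # prints 2.5
```

The explicit check for an empty list is an addition of this sketch; the `Array#mean` version above would return `NaN` in that case.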
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now we need a function that receives the vertex clusters and returns the average vertex for each cluster:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def get_mean_poly(clusters)\n", | |
| " means = {}\n", | |
| " clusters.each do |cluster|\n", | |
| " if cluster[0] != -1 # skip cluster -1: its points had too few neighbors to form a cluster\n", | |
| " lon = cluster[1].map {|c| c[0]}.mean\n", | |
| " lat = cluster[1].map {|c| c[1]}.mean\n", | |
| " means[cluster[0]] = [lon,lat]\n", | |
| " end\n", | |
| " end\n", | |
| " return means\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 31, | |
| "text": [ | |
| ":get_mean_poly" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 31 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "We test this function with our vertex clusters and plot both (mean vertices as yellow diamonds):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "mean_poly = get_mean_poly(vertex_clusters)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 32, | |
| "text": [ | |
| "{0=>[-73.9862696826458, 40.73554911699269], 1=>[-73.98632407188416, 40.73557147326963], 2=>[-73.98622447129412, 40.73570855047738], 3=>[-73.98618267156186, 40.73569075660959], 4=>[-73.98621819913387, 40.73564040507624], 5=>[-73.98620896786451, 40.735635064416286]}" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 32 | |
| }, | |
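The mean vertices returned above still have to be connected into a polygon. One possible way, sketched here under the assumption that the cluster ids happen to follow the order in which the vertices were drawn (which is not guaranteed in general), is to sort the hash by cluster id and repeat the first vertex at the end so that the ring is closed as GeoJSON requires. The helper name `close_ring` is hypothetical:

```ruby
# Mean vertices copied from the result above (cluster id => [lon, lat]).
mean_poly = {
  0 => [-73.9862696826458, 40.73554911699269],
  1 => [-73.98632407188416, 40.73557147326963],
  2 => [-73.98622447129412, 40.73570855047738],
  3 => [-73.98618267156186, 40.73569075660959],
  4 => [-73.98621819913387, 40.73564040507624],
  5 => [-73.98620896786451, 40.735635064416286]
}

# Hypothetical helper: order vertices by cluster id and close the ring.
# Assumes cluster ids reflect the drawing order of the polygon.
def close_ring(means)
  ring = means.sort_by { |id, _point| id }.map { |_id, point| point }
  ring + [ring.first]
end

ring = close_ring(mean_poly)
puts ring.size               # prints 7
puts ring.first == ring.last # prints true
```

If the cluster ids do not match the drawing order, the vertices would need to be ordered some other way before closing the ring.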
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "# plot clusters with overlaid (yellow) mean points\n", | |
| "plot = plot_clusters(vertex_clusters)\n", | |
| "# add means\n", | |
| "m_x = mean_poly.map { |m| m[1][0] }\n", | |
| "m_y = mean_poly.map { |m| m[1][1] }\n", | |
| "sc = plot.add(:scatter, m_x, m_y)\n", | |
| "color = \"#ffff00\"\n", | |
| "sc.color(color)\n", | |
| "sc.shape('diamond')\n", | |
| "plot.show" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "html": [ | |
| "<div id='vis-58d32c47-9127-47bb-aade-6e05ae75616c'></div>\n", | |
| "<script>\n", | |
| "(function(){\n", | |
| " var render = function(){\n", | |
| " var model = {\"panes\":[{\"diagrams\":[{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#1febf2\"},\"data\":\"aa7e5490-3987-43bc-a36c-bdfef261e865\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#8ec176\"},\"data\":\"b6a991f8-b7f5-46b7-a9d6-d98ff2c277e4\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#da94f9\"},\"data\":\"cfc1d108-6931-4748-b902-29d43a3e44ef\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#10745a\"},\"data\":\"e7c6c5f6-4e7c-4474-b9a1-bf1e37f5ea7f\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#d73c59\"},\"data\":\"543380ec-8293-4c6a-96db-197b5e3edae5\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#af424a\"},\"data\":\"857d48d5-33d7-4199-bba1-4aae2b7f6adb\"},{\"type\":\"scatter\",\"options\":{\"x\":\"data0\",\"y\":\"data1\",\"color\":\"#ffff00\",\"shape\":\"diamond\"},\"data\":\"cb52daa3-be38-4143-9bee-83da9230dfa5\"}],\"options\":{\"width\":300,\"height\":400,\"zoom\":true,\"xrange\":[-73.98633571101189,-73.98616953062057],\"yrange\":[40.73553787497709,40.73571995781772]}}],\"data\":{\"aa7e5490-3987-43bc-a36c-bdfef261e865\":[{\"x\":-73.98627072572708,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.9862660318613,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554889117169,\"cluster\":0},{\"x\":-73.98627139627934,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.98626938462257,\"y\":40.73554889117167,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.735550923560815,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554990736624,\"cluster\":0},{\"x\":-73.98626938462257,\"y\":40.735550415463514,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554939926897,
\"cluster\":0}],\"b6a991f8-b7f5-46b7-a9d6-d98ff2c277e4\":[{\"x\":-73.98632504045963,\"y\":40.73557226364293,\"cluster\":1},{\"x\":-73.98632504045963,\"y\":40.735570739351566,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735570739351566,\"cluster\":1},{\"x\":-73.98632436990738,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735572771740024,\"cluster\":1},{\"x\":-73.98632571101189,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632235825062,\"y\":40.73557124744871,\"cluster\":1},{\"x\":-73.98632302880287,\"y\":40.73557023125444,\"cluster\":1}],\"cfc1d108-6931-4748-b902-29d43a3e44ef\":[{\"x\":-73.98622445762157,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622579872608,\"y\":40.73570944972167,\"cluster\":2},{\"x\":-73.98622512817383,\"y\":40.73570944972167,\"cluster\":2},{\"x\":-73.98622579872608,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622445762157,\"y\":40.73570894162559,\"cluster\":2},{\"x\":-73.98622378706932,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2}],\"e7c6c5f6-4e7c-4474-b9a1-bf1e37f5ea7f\":[{\"x\":-73.9861835539341,\"y\":40.73569268254945,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569217445325,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569014206842,\"cluster\":3},{\"x\":-73.98618087172508,\"y\":40.735689633972214,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569065016463,\"cluster\":3},{\"x\":-73.98618288338184,\"y\":40.73569268254945,\"cluster\":3},{\"x\":-73.9861848950386,\"y\":40.735689633972214,\"cluster\":3},{\"x\":-73.98618768252459,\"y\":40.73568957578454,\"cluster\":3},{\"x\":-73.98617953062057,\"y\":40.735689633972214,\"cluster\":3}],\"543380ec-8293-4c6a-96db-197b5e3edae5\":[{\"x\":-73.98621775209902,\"y\":40.735640856717666,
\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73563933242788,\"cluster\":4},{\"x\":-73.98621909320354,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621842265129,\"y\":40.7356423810074,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73564034862108,\"cluster\":4},{\"x\":-73.98621842265129,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73563984052446,\"cluster\":4},{\"x\":-73.98621909320354,\"y\":40.735638316234656,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.735640856717666,\"cluster\":4}],\"857d48d5-33d7-4199-bba1-4aae2b7f6adb\":[{\"x\":-73.98620970547199,\"y\":40.7356342514617,\"cluster\":5},{\"x\":-73.98620769381522,\"y\":40.73563526765495,\"cluster\":5},{\"x\":-73.98620970547199,\"y\":40.73563526765495,\"cluster\":5},{\"x\":-73.98620903491974,\"y\":40.73563577575159,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.735634251461676,\"cluster\":5},{\"x\":-73.9862110465765,\"y\":40.7356362838482,\"cluster\":5},{\"x\":-73.98620970547199,\"y\":40.73563475955834,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.73563272717173,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.7356362838482,\"cluster\":5},{\"x\":-73.98620769381522,\"y\":40.73563577575159,\"cluster\":5}],\"cb52daa3-be38-4143-9bee-83da9230dfa5\":[{\"data0\":-73.9862696826458,\"data1\":40.73554911699269},{\"data0\":-73.98632407188416,\"data1\":40.73557147326963},{\"data0\":-73.98622447129412,\"data1\":40.73570855047738},{\"data0\":-73.98618267156186,\"data1\":40.73569075660959},{\"data0\":-73.98621819913387,\"data1\":40.73564040507624},{\"data0\":-73.98620896786451,\"data1\":40.735635064416286}]},\"extension\":[]}\n", | |
| " Nyaplot.core.parse(model, '#vis-58d32c47-9127-47bb-aade-6e05ae75616c');\n", | |
| " };\n", | |
| " if(window['Nyaplot']==undefined){\n", | |
| " window.addEventListener('load_nyaplot', render, false);\n", | |
| "\treturn;\n", | |
| " }\n", | |
| " render();\n", | |
| "})();\n", | |
| "</script>\n" | |
| ], | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 33, | |
| "text": [ | |
| "\"<div id='vis-58d32c47-9127-47bb-aade-6e05ae75616c'></div>\\n<script>\\n(function(){\\n var render = function(){\\n var model = {\\\"panes\\\":[{\\\"diagrams\\\":[{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#1febf2\\\"},\\\"data\\\":\\\"aa7e5490-3987-43bc-a36c-bdfef261e865\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#8ec176\\\"},\\\"data\\\":\\\"b6a991f8-b7f5-46b7-a9d6-d98ff2c277e4\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#da94f9\\\"},\\\"data\\\":\\\"cfc1d108-6931-4748-b902-29d43a3e44ef\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#10745a\\\"},\\\"data\\\":\\\"e7c6c5f6-4e7c-4474-b9a1-bf1e37f5ea7f\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#d73c59\\\"},\\\"data\\\":\\\"543380ec-8293-4c6a-96db-197b5e3edae5\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#af424a\\\"},\\\"data\\\":\\\"857d48d5-33d7-4199-bba1-4aae2b7f6adb\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\",\\\"color\\\":\\\"#ffff00\\\",\\\"shape\\\":\\\"diamond\\\"},\\\"data\\\":\\\"cb52daa3-be38-4143-9bee-83da9230dfa5\\\"}],\\\"options\\\":{\\\"width\\\":300,\\\"height\\\":400,\\\"zoom\\\":true,\\\"xrange\\\":[-73.98633571101189,-73.98616953062057],\\\"yrange\\\":[40.73553787497709,40.73571995781772]}}],\\\"data\\\":{\\\"aa7e5490-3987-43bc-a36c-bdfef261e865\\\":[{\\\"x\\\":-73.98627072572708,\\\"y\\\":40.73554787497
7094,\\\"cluster\\\":0},{\\\"x\\\":-73.9862660318613,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554889117169,\\\"cluster\\\":0},{\\\"x\\\":-73.98627139627934,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.73554889117167,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.735550923560815,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554990736624,\\\"cluster\\\":0},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.735550415463514,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554939926897,\\\"cluster\\\":0}],\\\"b6a991f8-b7f5-46b7-a9d6-d98ff2c277e4\\\":[{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.73557226364293,\\\"cluster\\\":1},{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.735570739351566,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735570739351566,\\\"cluster\\\":1},{\\\"x\\\":-73.98632436990738,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735572771740024,\\\"cluster\\\":1},{\\\"x\\\":-73.98632571101189,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632235825062,\\\"y\\\":40.73557124744871,\\\"cluster\\\":1},{\\\"x\\\":-73.98632302880287,\\\"y\\\":40.73557023125444,\\\"cluster\\\":1}],\\\"cfc1d108-6931-4748-b902-29d43a3e44ef\\\":[{\\\"x\\\":-73.98622445762157,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570944972167,\\\"cluster\\\":2},{\\\"x\\\":-73.98622512817383,\\\"y\\\":40.73570944972167,\\\"cluster\\\":2},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622445762157,\\\"y\\\":40.73570894162559,\\\"cluster\\\":2},{\\\"x\\\":-73.98622378706932,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2},{\\\"x\\\":-73.9862236045
6956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2}],\\\"e7c6c5f6-4e7c-4474-b9a1-bf1e37f5ea7f\\\":[{\\\"x\\\":-73.9861835539341,\\\"y\\\":40.73569268254945,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569217445325,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569014206842,\\\"cluster\\\":3},{\\\"x\\\":-73.98618087172508,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569065016463,\\\"cluster\\\":3},{\\\"x\\\":-73.98618288338184,\\\"y\\\":40.73569268254945,\\\"cluster\\\":3},{\\\"x\\\":-73.9861848950386,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3},{\\\"x\\\":-73.98618768252459,\\\"y\\\":40.73568957578454,\\\"cluster\\\":3},{\\\"x\\\":-73.98617953062057,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3}],\\\"543380ec-8293-4c6a-96db-197b5e3edae5\\\":[{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563933242788,\\\"cluster\\\":4},{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.7356423810074,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73564034862108,\\\"cluster\\\":4},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563984052446,\\\"cluster\\\":4},{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735638316234656,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4}],\\\"857d48d5-33d7-4199-bba1-4aae2b7f6adb\\\":[{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.7356342514617,\\\"cluster\\\":5},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563526765495,\\\"cluster\\\":5},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563526765495,\\\"cluster\\\":5},{\\\"x\\\":-73.98620903491974,\\\"y\\\":40.73563577575159,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\
\"y\\\":40.735634251461676,\\\"cluster\\\":5},{\\\"x\\\":-73.9862110465765,\\\"y\\\":40.7356362838482,\\\"cluster\\\":5},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563475955834,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.73563272717173,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.7356362838482,\\\"cluster\\\":5},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563577575159,\\\"cluster\\\":5}],\\\"cb52daa3-be38-4143-9bee-83da9230dfa5\\\":[{\\\"data0\\\":-73.9862696826458,\\\"data1\\\":40.73554911699269},{\\\"data0\\\":-73.98632407188416,\\\"data1\\\":40.73557147326963},{\\\"data0\\\":-73.98622447129412,\\\"data1\\\":40.73570855047738},{\\\"data0\\\":-73.98618267156186,\\\"data1\\\":40.73569075660959},{\\\"data0\\\":-73.98621819913387,\\\"data1\\\":40.73564040507624},{\\\"data0\\\":-73.98620896786451,\\\"data1\\\":40.735635064416286}]},\\\"extension\\\":[]}\\n Nyaplot.core.parse(model, '#vis-58d32c47-9127-47bb-aade-6e05ae75616c');\\n };\\n if(window['Nyaplot']==undefined){\\n window.addEventListener('load_nyaplot', render, false);\\n\\treturn;\\n }\\n render();\\n})();\\n</script>\\n\"" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 33 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 4. Connecting it all\n", | |
| "\n", | |
| "So far we have a set of points that seem to be the most likely vertices of the mean polygon drawn by our contributors. However, there are **many ways in which these points could be connected to each other**.\n", | |
| "\n", | |
| "**DISCLAIMER**:\n", | |
| "\n", | |
| "What follows is a _very_ primitive process that I used to determine the most likely connection between those points. This process is the best I could come up with given my limited math knowledge and time. If you have a better idea of how to do this in Ruby please tweet me at [@mgiraldo](https://twitter.com/mgiraldo).\n", | |
| "\n", | |
| "**/DISCLAIMER**\n", | |
| "\n", | |
| "Before connecting anything we need to validate that we have a reasonable number of clusters to work with: some vertices may be drawn so far from the others that they do not cluster properly, so no cluster is produced for them. We do this by computing the mean number of vertices per polygon ($\\bar{m}$) and comparing it with the cluster count ($\\sum c$). The check is $\\bar{m}\\leq\\sum c$, so we should have at least _as many_ clusters as the average number of points per polygon.\n", | |
| "\n", | |
| "Not perfect but works most of the time:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def validate_clusters(clusters, original_points)\n", | |
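| " # drop the first point of each polygon (closed rings repeat it as the last point)\n", | |
| " unique_points = original_points.map{|poly| poly[1..-1]}\n", | |
| " # every point is an [x, y] pair, so the flattened count is twice the vertex count\n", | |
| " average = (unique_points.flatten.count.to_f / (unique_points.size * 2).to_f).round\n", | |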
| " return clusters.select{|k,v| k!=-1}.size >= average\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 34, | |
| "text": [ | |
| ":validate_clusters" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 34 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "validate_clusters(vertex_clusters, cluster_poly_points)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 35, | |
| "text": [ | |
| "true" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 35 | |
| }, | |
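| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "To make the check above concrete, here is a minimal sketch on made-up data. The helper name `mean_vertex_count` and the toy polygons are assumptions for illustration only; they are not part of the notebook, and the sketch assumes every polygon is a closed ring whose last point repeats the first.\n", | |
| "\n", | |
| "```ruby\n", | |
| "# Mean number of unique vertices per closed polygon, rounded like validate_clusters does.\n", | |
| "def mean_vertex_count(polygons)\n", | |
| "  counts = polygons.map { |poly| poly.size - 1 } # last point repeats the first\n", | |
| "  (counts.inject(:+).to_f / counts.size).round\n", | |
| "end\n", | |
| "\n", | |
| "# Hypothetical toy input: two squares and one pentagon, each a closed ring.\n", | |
| "square = [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]\n", | |
| "pentagon = [[0, 0], [2, 0], [3, 1], [1, 2], [-1, 1], [0, 0]]\n", | |
| "polygons = [square, square, pentagon]\n", | |
| "\n", | |
| "mean = mean_vertex_count(polygons) # (4 + 4 + 5) / 3.0 = 4.33, rounded to 4\n", | |
| "[3, 4, 6].each do |cluster_count|\n", | |
| "  puts cluster_count.to_s + ' clusters: ' + (cluster_count >= mean).to_s\n", | |
| "end\n", | |
| "```\n", | |
| "\n", | |
| "With a mean of 4 vertices per polygon, 4 or more clusters pass the check and 3 fail it." | |
| ] | |
| }, | |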
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now that this has been verified we proceed to connect.\n", | |
| "\n", | |
| "The general process to connect mean vertices to each other is:\n", | |
| "\n", | |
| "1. for each mean vertex:\n", | |
| " 1. find the cluster of vertices it represents (from_vertices)\n", | |
| " 1. for each vertex in from_vertices:\n", | |
| " 1. find the vertex it is connected to (to_vertex)\n", | |
| " 1. find the cluster to_vertex belongs to (to_cluster)\n", | |
| " 1. add a \"vote\" for to_cluster\n", | |
| " 1. tally the votes\n", | |
| " 1. the to_cluster with most votes is the connected cluster\n", | |
| "1. connect the clusters\n", | |
| "1. validate that the connection makes sense (e.g. that it is a [directed cycle graph](http://en.wikipedia.org/wiki/Cycle_graph))\n", | |
| "\n", | |
| "Below are all the corresponding functions:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def find_connected_point(point, original_points)\n", | |
| " original_points.each do |poly|\n", | |
| " poly.each_with_index do |p,index|\n", | |
| " # the point drawn right after this one in the same polygon is its connection\n", | |
| " return poly[index+1] if point === p\n", | |
| " end\n", | |
| " end\n", | |
| " return\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 36, | |
| "text": [ | |
| ":find_connected_point" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 36 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def find_cluster_for_point(point, clusters)\n", | |
| " clusters.each do |cluster|\n", | |
| " cluster[1].each do |p|\n", | |
| " return cluster[0] if point === p && cluster[0] != -1\n", | |
| " end\n", | |
| " end\n", | |
| " return\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 37, | |
| "text": [ | |
| ":find_cluster_for_point" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 37 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def connect_clusters(clusters, original_points)\n", | |
| " connections = {}\n", | |
| " # for each cluster\n", | |
| " clusters.each do |cluster|\n", | |
| " # for each point in cluster\n", | |
| " if cluster[0] != -1 # exclude invalid cluster\n", | |
| " cluster_votes = {} # to weigh connection popularity (diff pts might be connected to diff clusters)\n", | |
| " cluster[1].each do |point|\n", | |
| " # find original point connected to it\n", | |
| " connection = find_connected_point(point, original_points)\n", | |
| " connected_cluster = find_cluster_for_point(connection, clusters)\n", | |
| " # if original point belongs to another cluster\n", | |
| " if connected_cluster != nil && connected_cluster != cluster[0]\n", | |
| " # vote for the cluster\n", | |
| " cluster_votes[connected_cluster] = 0 if cluster_votes[connected_cluster] == nil\n", | |
| " cluster_votes[connected_cluster] += 1\n", | |
| " end\n", | |
| " end\n", | |
| " connections[cluster[0]] = cluster_votes.sort_by{|k, v| v}\n", | |
| " next if connections[cluster[0]].size == 0\n", | |
| " connections[cluster[0]] = connections[cluster[0]].reverse[0][0]\n", | |
| " end\n", | |
| " end\n", | |
| " return connections\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 38, | |
| "text": [ | |
| ":connect_clusters" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 38 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "connections = connect_clusters(vertex_clusters, cluster_poly_points)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 39, | |
| "text": [ | |
| "{0=>1, 1=>2, 2=>3, 3=>4, 4=>5, 5=>0}" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 39 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "As can be seen above this is a directed cycle graph and the end result is a clean path from the first vertex to the last one.\n", | |
| "\n", | |
| "The fact that the points are sorted (0 to 1, 1 to 2, 2 to 3, and so on) is somewhat coincidental. Below is a basic function that checks the graph and returns a sorted list of clusters (the order we need to follow to draw the mean polygon):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def sort_connections(connections)\n", | |
| " # walks the connections starting at the first cluster; returns nil if they do not form a single cycle\n", | |
| " sorted = []\n", | |
| " seen = {}\n", | |
| " as_list = connections.select{|k,v| k}\n", | |
| " done = false\n", | |
| " first = as_list.first[0]\n", | |
| " from = first\n", | |
| " while !done do\n", | |
| " to = connections[from]\n", | |
| " done = true if seen[to] || to == nil\n", | |
| " seen[to] = true\n", | |
| " from = to\n", | |
| " sorted.push(to)\n", | |
| " done = true if seen.size == connections.size\n", | |
| " end\n", | |
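| " # not a single cycle if some cluster was never reached or a link points outside the graph\n", | |
| " return nil if seen.size != connections.size || !sorted.all?{|c| connections.key?(c)}\n", | |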
| " return sorted\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 40, | |
| "text": [ | |
| ":sort_connections" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 40 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "# testing sort function\n", | |
| "sort_connections(connections)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 41, | |
| "text": [ | |
| "[1, 2, 3, 4, 5, 0]" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 41 | |
| }, | |
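| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "The check in `sort_connections` is deliberately simple. A slightly stricter variant could look like the sketch below. It assumes `connections` is a plain Hash that maps each cluster id to the id it connects to, and the name `cycle_order` is hypothetical (it is not used elsewhere in this notebook). It walks the graph from the first key and accepts it only if every cluster is visited exactly once and the walk ends back at the start.\n", | |
| "\n", | |
| "```ruby\n", | |
| "def cycle_order(connections)\n", | |
| "  return nil if connections.empty?\n", | |
| "  start = connections.keys.first\n", | |
| "  order = []\n", | |
| "  current = start\n", | |
| "  connections.size.times do\n", | |
| "    # a link to an unknown cluster or a repeated cluster means no single cycle\n", | |
| "    return nil unless connections.key?(current)\n", | |
| "    return nil if order.include?(current)\n", | |
| "    order.push(current)\n", | |
| "    current = connections[current]\n", | |
| "  end\n", | |
| "  # after visiting every cluster once, the walk must be back at the start\n", | |
| "  current == start ? order : nil\n", | |
| "end\n", | |
| "\n", | |
| "p cycle_order({0 => 1, 1 => 2, 2 => 0}) # [0, 1, 2]\n", | |
| "p cycle_order({0 => 1, 1 => 0, 2 => 1}) # nil\n", | |
| "```\n", | |
| "\n", | |
| "Unlike `sort_connections`, this sketch lists the starting cluster first, but the ring it describes is the same." | |
| ] | |
| }, | |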
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now we can proceed to build our final mean polygon:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def connect_mean_poly(mean_poly, connections)\n", | |
| " connected = []\n", | |
| " sorted = sort_connections(connections)\n", | |
| " return nil if sorted == nil\n", | |
| " sorted.each do |c|\n", | |
| " connected.push([mean_poly[c][0], mean_poly[c][1]])\n", | |
| " end\n", | |
| " # for GeoJSON, last == first\n", | |
| " first = sorted[0]\n", | |
| " connected.push([mean_poly[first][0], mean_poly[first][1]])\n", | |
| " return connected\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 42, | |
| "text": [ | |
| ":connect_mean_poly" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 42 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "final_polygon = connect_mean_poly(mean_poly, connections)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 43, | |
| "text": [ | |
| "[[-73.98632407188416, 40.73557147326963], [-73.98622447129412, 40.73570855047738], [-73.98618267156186, 40.73569075660959], [-73.98621819913387, 40.73564040507624], [-73.98620896786451, 40.735635064416286], [-73.9862696826458, 40.73554911699269], [-73.98632407188416, 40.73557147326963]]" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 43 | |
| }, | |
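| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "The first point is repeated at the end because GeoJSON polygon rings must close on themselves. As a rough sketch, assuming the goal is to hand the result to a GeoJSON consumer, a ring like this can be wrapped in a feature as shown below. The helper `polygon_feature` and the sample ring are hypothetical; in this notebook the ring would be `final_polygon`.\n", | |
| "\n", | |
| "```ruby\n", | |
| "require 'json'\n", | |
| "\n", | |
| "# Wrap a closed ring (first point == last point) in a GeoJSON Feature.\n", | |
| "def polygon_feature(ring)\n", | |
| "  {\n", | |
| "    type: 'Feature',\n", | |
| "    properties: {},\n", | |
| "    geometry: { type: 'Polygon', coordinates: [ring] } # a Polygon is a list of rings\n", | |
| "  }\n", | |
| "end\n", | |
| "\n", | |
| "# Hypothetical ring standing in for final_polygon.\n", | |
| "ring = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]\n", | |
| "puts JSON.generate(polygon_feature(ring))\n", | |
| "```" | |
| ] | |
| }, | |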
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Let's see what all this looks like:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "plot = plot_clusters(vertex_clusters)\n", | |
| "m_x = final_polygon.map { |m| m[0] }\n", | |
| "m_y = final_polygon.map { |m| m[1] }\n", | |
| "sc = plot.add(:scatter, m_x, m_y)\n", | |
| "color = \"#ffff00\"\n", | |
| "sc.color(color)\n", | |
| "sc.shape('diamond')\n", | |
| "# add the MEAN POLYGON\n", | |
| "final_polygon.each_with_index do |c, i|\n", | |
| " next if i >= final_polygon.size-1\n", | |
| " # each edge is drawn as a two-point line: x coordinates first, then y coordinates\n", | |
| " xs = [ final_polygon[i][0], final_polygon[i+1][0] ]\n", | |
| " ys = [ final_polygon[i][1], final_polygon[i+1][1] ]\n", | |
| " plot.add(:line, xs, ys)\n", | |
| "end\n", | |
| "plot.show" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "html": [ | |
| "<div id='vis-c99629fa-eb97-40d8-857d-625b06b9dca7'></div>\n", | |
| "<script>\n", | |
| "(function(){\n", | |
| " var render = function(){\n", | |
| " var model = {\"panes\":[{\"diagrams\":[{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#654ba4\"},\"data\":\"4b394936-4ba7-4804-bb6f-2b331d237365\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#327493\"},\"data\":\"037d18cc-b414-4ca6-93d6-42667ffd7635\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#6bb404\"},\"data\":\"0f6efa17-762c-4d4b-9fe8-534bfacc13cf\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#b8cd89\"},\"data\":\"6b93cbc3-32b4-4748-afed-8ab856a7a9c7\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#8235a6\"},\"data\":\"f956eb40-5652-4b2e-91cc-fb94a951ae17\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#d9538d\"},\"data\":\"5f64c4c5-803f-4272-b578-04fbea9e9cc1\"},{\"type\":\"scatter\",\"options\":{\"x\":\"data0\",\"y\":\"data1\",\"color\":\"#ffff00\",\"shape\":\"diamond\"},\"data\":\"75b75ec5-1e92-48e1-b65a-d165b4adfd62\"},{\"type\":\"line\",\"options\":{\"x\":\"data0\",\"y\":\"data1\"},\"data\":\"5ea59c9e-8d54-406b-b36d-a0b96e70c751\"},{\"type\":\"line\",\"options\":{\"x\":\"data0\",\"y\":\"data1\"},\"data\":\"a62a3ea9-b4fa-461b-8ea5-91f9caeeb108\"},{\"type\":\"line\",\"options\":{\"x\":\"data0\",\"y\":\"data1\"},\"data\":\"2f582a38-da7b-4c26-81d2-6de4f8bb3a48\"},{\"type\":\"line\",\"options\":{\"x\":\"data0\",\"y\":\"data1\"},\"data\":\"7fa70935-e0e7-4b48-bf03-49f37cc243f2\"},{\"type\":\"line\",\"options\":{\"x\":\"data0\",\"y\":\"data1\"},\"data\":\"02f1a255-dad3-4ff5-a5be-832af5ffc7d9\"},{\"type\":\"line\",\"options\":{\"x\":\"data0\",\"y\":\"data1\"},\"data\":\"5f25db66-f393-49e6-8e06-6464178ecf05\"}],\"options\":{\"width\":300,\"height\":400,\"zoom\":true,\"xrange\":[-73.98633571101189,-7
3.98616953062057],\"yrange\":[40.73553787497709,40.73571995781772]}}],\"data\":{\"4b394936-4ba7-4804-bb6f-2b331d237365\":[{\"x\":-73.98627072572708,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.9862660318613,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554889117169,\"cluster\":0},{\"x\":-73.98627139627934,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.98626938462257,\"y\":40.73554889117167,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.735550923560815,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554990736624,\"cluster\":0},{\"x\":-73.98626938462257,\"y\":40.735550415463514,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554939926897,\"cluster\":0}],\"037d18cc-b414-4ca6-93d6-42667ffd7635\":[{\"x\":-73.98632504045963,\"y\":40.73557226364293,\"cluster\":1},{\"x\":-73.98632504045963,\"y\":40.735570739351566,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735570739351566,\"cluster\":1},{\"x\":-73.98632436990738,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735572771740024,\"cluster\":1},{\"x\":-73.98632571101189,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632235825062,\"y\":40.73557124744871,\"cluster\":1},{\"x\":-73.98632302880287,\"y\":40.73557023125444,\"cluster\":1}],\"0f6efa17-762c-4d4b-9fe8-534bfacc13cf\":[{\"x\":-73.98622445762157,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622579872608,\"y\":40.73570944972167,\"cluster\":2},{\"x\":-73.98622512817383,\"y\":40.73570944972167,\"cluster\":2},{\"x\":-73.98622579872608,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622445762157,\"y\":40.73570894162559,\"cluster\":2},{\"x\":-73.98622378706932,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2}],\"6b93cbc3-32b4-4748-afed-
8ab856a7a9c7\":[{\"x\":-73.9861835539341,\"y\":40.73569268254945,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569217445325,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569014206842,\"cluster\":3},{\"x\":-73.98618087172508,\"y\":40.735689633972214,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569065016463,\"cluster\":3},{\"x\":-73.98618288338184,\"y\":40.73569268254945,\"cluster\":3},{\"x\":-73.9861848950386,\"y\":40.735689633972214,\"cluster\":3},{\"x\":-73.98618768252459,\"y\":40.73568957578454,\"cluster\":3},{\"x\":-73.98617953062057,\"y\":40.735689633972214,\"cluster\":3}],\"f956eb40-5652-4b2e-91cc-fb94a951ae17\":[{\"x\":-73.98621775209902,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73563933242788,\"cluster\":4},{\"x\":-73.98621909320354,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621842265129,\"y\":40.7356423810074,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73564034862108,\"cluster\":4},{\"x\":-73.98621842265129,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73563984052446,\"cluster\":4},{\"x\":-73.98621909320354,\"y\":40.735638316234656,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.735640856717666,\"cluster\":4}],\"5f64c4c5-803f-4272-b578-04fbea9e9cc1\":[{\"x\":-73.98620970547199,\"y\":40.7356342514617,\"cluster\":5},{\"x\":-73.98620769381522,\"y\":40.73563526765495,\"cluster\":5},{\"x\":-73.98620970547199,\"y\":40.73563526765495,\"cluster\":5},{\"x\":-73.98620903491974,\"y\":40.73563577575159,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.735634251461676,\"cluster\":5},{\"x\":-73.9862110465765,\"y\":40.7356362838482,\"cluster\":5},{\"x\":-73.98620970547199,\"y\":40.73563475955834,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.73563272717173,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.7356362838482,\"cluster\":5},{\"x\":-73.98620769381522,\"y\":40.73563577575159,\"cluster\":5}],\"75b75ec5-1e92-48e1-b65a-d165b4adfd62\":[{\"data0\":-73.98632407188416,\"d
ata1\":40.73557147326963},{\"data0\":-73.98622447129412,\"data1\":40.73570855047738},{\"data0\":-73.98618267156186,\"data1\":40.73569075660959},{\"data0\":-73.98621819913387,\"data1\":40.73564040507624},{\"data0\":-73.98620896786451,\"data1\":40.735635064416286},{\"data0\":-73.9862696826458,\"data1\":40.73554911699269},{\"data0\":-73.98632407188416,\"data1\":40.73557147326963}],\"5ea59c9e-8d54-406b-b36d-a0b96e70c751\":[{\"data0\":-73.98632407188416,\"data1\":40.73557147326963},{\"data0\":-73.98622447129412,\"data1\":40.73570855047738}],\"a62a3ea9-b4fa-461b-8ea5-91f9caeeb108\":[{\"data0\":-73.98622447129412,\"data1\":40.73570855047738},{\"data0\":-73.98618267156186,\"data1\":40.73569075660959}],\"2f582a38-da7b-4c26-81d2-6de4f8bb3a48\":[{\"data0\":-73.98618267156186,\"data1\":40.73569075660959},{\"data0\":-73.98621819913387,\"data1\":40.73564040507624}],\"7fa70935-e0e7-4b48-bf03-49f37cc243f2\":[{\"data0\":-73.98621819913387,\"data1\":40.73564040507624},{\"data0\":-73.98620896786451,\"data1\":40.735635064416286}],\"02f1a255-dad3-4ff5-a5be-832af5ffc7d9\":[{\"data0\":-73.98620896786451,\"data1\":40.735635064416286},{\"data0\":-73.9862696826458,\"data1\":40.73554911699269}],\"5f25db66-f393-49e6-8e06-6464178ecf05\":[{\"data0\":-73.9862696826458,\"data1\":40.73554911699269},{\"data0\":-73.98632407188416,\"data1\":40.73557147326963}]},\"extension\":[]}\n", | |
| " Nyaplot.core.parse(model, '#vis-c99629fa-eb97-40d8-857d-625b06b9dca7');\n", | |
| " };\n", | |
| " if(window['Nyaplot']==undefined){\n", | |
| " window.addEventListener('load_nyaplot', render, false);\n", | |
| "\treturn;\n", | |
| " }\n", | |
| " render();\n", | |
| "})();\n", | |
| "</script>\n" | |
| ], | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 45, | |
| "text": [ | |
| "\"<div id='vis-c99629fa-eb97-40d8-857d-625b06b9dca7'></div>\\n<script>\\n(function(){\\n var render = function(){\\n var model = {\\\"panes\\\":[{\\\"diagrams\\\":[{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#654ba4\\\"},\\\"data\\\":\\\"4b394936-4ba7-4804-bb6f-2b331d237365\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#327493\\\"},\\\"data\\\":\\\"037d18cc-b414-4ca6-93d6-42667ffd7635\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#6bb404\\\"},\\\"data\\\":\\\"0f6efa17-762c-4d4b-9fe8-534bfacc13cf\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#b8cd89\\\"},\\\"data\\\":\\\"6b93cbc3-32b4-4748-afed-8ab856a7a9c7\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#8235a6\\\"},\\\"data\\\":\\\"f956eb40-5652-4b2e-91cc-fb94a951ae17\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#d9538d\\\"},\\\"data\\\":\\\"5f64c4c5-803f-4272-b578-04fbea9e9cc1\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\",\\\"color\\\":\\\"#ffff00\\\",\\\"shape\\\":\\\"diamond\\\"},\\\"data\\\":\\\"75b75ec5-1e92-48e1-b65a-d165b4adfd62\\\"},{\\\"type\\\":\\\"line\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\"},\\\"data\\\":\\\"5ea59c9e-8d54-406b-b36d-a0b96e70c751\\\"},{\\\"type\\\":\\\"line\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\"},\\\"data\\\":\\\"a62a3ea9-b4fa-461b-8ea5-91f9caeeb108
\\\"},{\\\"type\\\":\\\"line\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\"},\\\"data\\\":\\\"2f582a38-da7b-4c26-81d2-6de4f8bb3a48\\\"},{\\\"type\\\":\\\"line\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\"},\\\"data\\\":\\\"7fa70935-e0e7-4b48-bf03-49f37cc243f2\\\"},{\\\"type\\\":\\\"line\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\"},\\\"data\\\":\\\"02f1a255-dad3-4ff5-a5be-832af5ffc7d9\\\"},{\\\"type\\\":\\\"line\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\"},\\\"data\\\":\\\"5f25db66-f393-49e6-8e06-6464178ecf05\\\"}],\\\"options\\\":{\\\"width\\\":300,\\\"height\\\":400,\\\"zoom\\\":true,\\\"xrange\\\":[-73.98633571101189,-73.98616953062057],\\\"yrange\\\":[40.73553787497709,40.73571995781772]}}],\\\"data\\\":{\\\"4b394936-4ba7-4804-bb6f-2b331d237365\\\":[{\\\"x\\\":-73.98627072572708,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.9862660318613,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554889117169,\\\"cluster\\\":0},{\\\"x\\\":-73.98627139627934,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.73554889117167,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.735550923560815,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554990736624,\\\"cluster\\\":0},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.735550415463514,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554939926897,\\\"cluster\\\":0}],\\\"037d18cc-b414-4ca6-93d6-42667ffd7635\\\":[{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.73557226364293,\\\"cluster\\\":1},{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.735570739351566,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735570739351566,\\\"cluster\\\":1},{\\\"x\\\":-73.98632436990738,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735572771740024,\\\"cluster\\\":1},{\\\"x\\\
":-73.98632571101189,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632235825062,\\\"y\\\":40.73557124744871,\\\"cluster\\\":1},{\\\"x\\\":-73.98632302880287,\\\"y\\\":40.73557023125444,\\\"cluster\\\":1}],\\\"0f6efa17-762c-4d4b-9fe8-534bfacc13cf\\\":[{\\\"x\\\":-73.98622445762157,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570944972167,\\\"cluster\\\":2},{\\\"x\\\":-73.98622512817383,\\\"y\\\":40.73570944972167,\\\"cluster\\\":2},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622445762157,\\\"y\\\":40.73570894162559,\\\"cluster\\\":2},{\\\"x\\\":-73.98622378706932,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2}],\\\"6b93cbc3-32b4-4748-afed-8ab856a7a9c7\\\":[{\\\"x\\\":-73.9861835539341,\\\"y\\\":40.73569268254945,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569217445325,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569014206842,\\\"cluster\\\":3},{\\\"x\\\":-73.98618087172508,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569065016463,\\\"cluster\\\":3},{\\\"x\\\":-73.98618288338184,\\\"y\\\":40.73569268254945,\\\"cluster\\\":3},{\\\"x\\\":-73.9861848950386,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3},{\\\"x\\\":-73.98618768252459,\\\"y\\\":40.73568957578454,\\\"cluster\\\":3},{\\\"x\\\":-73.98617953062057,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3}],\\\"f956eb40-5652-4b2e-91cc-fb94a951ae17\\\":[{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563933242788,\\\"cluster\\\":4},{\\\"x\\\":-73.9
8621909320354,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.7356423810074,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73564034862108,\\\"cluster\\\":4},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563984052446,\\\"cluster\\\":4},{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735638316234656,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4}],\\\"5f64c4c5-803f-4272-b578-04fbea9e9cc1\\\":[{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.7356342514617,\\\"cluster\\\":5},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563526765495,\\\"cluster\\\":5},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563526765495,\\\"cluster\\\":5},{\\\"x\\\":-73.98620903491974,\\\"y\\\":40.73563577575159,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.735634251461676,\\\"cluster\\\":5},{\\\"x\\\":-73.9862110465765,\\\"y\\\":40.7356362838482,\\\"cluster\\\":5},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563475955834,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.73563272717173,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.7356362838482,\\\"cluster\\\":5},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563577575159,\\\"cluster\\\":5}],\\\"75b75ec5-1e92-48e1-b65a-d165b4adfd62\\\":[{\\\"data0\\\":-73.98632407188416,\\\"data1\\\":40.73557147326963},{\\\"data0\\\":-73.98622447129412,\\\"data1\\\":40.73570855047738},{\\\"data0\\\":-73.98618267156186,\\\"data1\\\":40.73569075660959},{\\\"data0\\\":-73.98621819913387,\\\"data1\\\":40.73564040507624},{\\\"data0\\\":-73.98620896786451,\\\"data1\\\":40.735635064416286},{\\\"data0\\\":-73.9862696826458,\\\"data1\\\":40.73554911699269},{\\\"data0\\\":-73.98632407188416,\\\"data1\\\":40.73557147326963}],\\\"5ea59c9e-8d54-406b-b36d-a0b96e70c751\\\":[{\\\"data0\\\":-73.98632407188416,\\\"data1\\\":40.73557147326963},{\\\"data0\\\":-73.98622447129
412,\\\"data1\\\":40.73570855047738}],\\\"a62a3ea9-b4fa-461b-8ea5-91f9caeeb108\\\":[{\\\"data0\\\":-73.98622447129412,\\\"data1\\\":40.73570855047738},{\\\"data0\\\":-73.98618267156186,\\\"data1\\\":40.73569075660959}],\\\"2f582a38-da7b-4c26-81d2-6de4f8bb3a48\\\":[{\\\"data0\\\":-73.98618267156186,\\\"data1\\\":40.73569075660959},{\\\"data0\\\":-73.98621819913387,\\\"data1\\\":40.73564040507624}],\\\"7fa70935-e0e7-4b48-bf03-49f37cc243f2\\\":[{\\\"data0\\\":-73.98621819913387,\\\"data1\\\":40.73564040507624},{\\\"data0\\\":-73.98620896786451,\\\"data1\\\":40.735635064416286}],\\\"02f1a255-dad3-4ff5-a5be-832af5ffc7d9\\\":[{\\\"data0\\\":-73.98620896786451,\\\"data1\\\":40.735635064416286},{\\\"data0\\\":-73.9862696826458,\\\"data1\\\":40.73554911699269}],\\\"5f25db66-f393-49e6-8e06-6464178ecf05\\\":[{\\\"data0\\\":-73.9862696826458,\\\"data1\\\":40.73554911699269},{\\\"data0\\\":-73.98632407188416,\\\"data1\\\":40.73557147326963}]},\\\"extension\\\":[]}\\n Nyaplot.core.parse(model, '#vis-c99629fa-eb97-40d8-857d-625b06b9dca7');\\n };\\n if(window['Nyaplot']==undefined){\\n window.addEventListener('load_nyaplot', render, false);\\n\\treturn;\\n }\\n render();\\n})();\\n</script>\\n\"" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 45 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "To wrap it all up, we create a single consensus function that receives a GeoJSON string and returns a list of mean polygons, skipping any centroid cluster for which an intermediate step yields no usable result:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def calculate_polygonfix_consensus(geojson)\n", | |
| " output = []\n", | |
| " geom = parse(geojson)\n", | |
| " # group the drawn polygons by centroid so each footprint is processed separately\n", | |
| " centroids = get_all_centroids(geom)\n", | |
| " centroid_clusters = cluster_centroids(centroids)\n", | |
| " centroid_clusters.each do |ccluster|\n", | |
| " cluster = ccluster[1] # only the set of latlons\n", | |
| " sub_geom = get_polys_for_centroid_cluster(cluster, centroids, geom)\n", | |
| " next if sub_geom.size == 0\n", | |
| " # cluster the vertices of every polygon in this group\n", | |
| " original_points = get_all_poly_points(sub_geom)\n", | |
| " next if original_points.nil?\n", | |
| " clusters = cluster_points(original_points)\n", | |
| " next unless validate_clusters(clusters, original_points)\n", | |
| " # one mean vertex per point cluster\n", | |
| " mean_poly = get_mean_poly(clusters)\n", | |
| " next if mean_poly == {}\n", | |
| " # work out the order in which the mean vertices connect\n", | |
| " connections = connect_clusters(clusters, original_points)\n", | |
| " next if connections == {}\n", | |
| " poly = connect_mean_poly(mean_poly, connections)\n", | |
| " next if poly.nil? || poly.count == 0\n", | |
| " output.push(poly)\n", | |
| " end\n", | |
| " output\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 47, | |
| "text": [ | |
| ":calculate_polygonfix_consensus" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 47 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "consensus = calculate_polygonfix_consensus(geomstr)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 48, | |
| "text": [ | |
| "[[[-73.98632407188416, 40.73557147326963], [-73.98622447129412, 40.73570855047738], [-73.98618267156186, 40.73569075660959], [-73.98621819913387, 40.73564040507624], [-73.98620896786451, 40.735635064416286], [-73.9862696826458, 40.73554911699269], [-73.98632407188416, 40.73557147326963]]]" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 48 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "The consensus polygons can then be serialized as a GeoJSON `FeatureCollection`, one `Feature` per polygon:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "geo_json = {:type => \"FeatureCollection\", :features => consensus.map { |f| {:type => \"Feature\", :properties => { :id => 1 }, :geometry => { :type => \"Polygon\", :coordinates =>[f] } } } }.to_json\n", | |
| "puts geo_json" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": [ | |
| "{\"type\":\"FeatureCollection\",\"features\":[{\"type\":\"Feature\",\"properties\":{\"id\":1},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98632407188416,40.73557147326963],[-73.98622447129412,40.73570855047738],[-73.98618267156186,40.73569075660959],[-73.98621819913387,40.73564040507624],[-73.98620896786451,40.735635064416286],[-73.9862696826458,40.73554911699269],[-73.98632407188416,40.73557147326963]]]}}]}\n" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 64 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now let's plot the resulting GeoJSON on the original map (purple):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "IRuby.html '<iframe src=\"http://jsfiddle.net/mgiraldo/m4XeU/1/embedded/result/\" width=500 height=400></iframe>'" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "html": [ | |
| "<iframe src=\"http://jsfiddle.net/mgiraldo/m4XeU/1/embedded/result/\" width=500 height=400></iframe>" | |
| ], | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 65, | |
| "text": [ | |
| "\"<iframe src=\\\"http://jsfiddle.net/mgiraldo/m4XeU/1/embedded/result/\\\" width=500 height=400></iframe>\"" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 65 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Voil\u00e0! The mean polygon looks good!" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# Conclusion\n", | |
| "\n", | |
| "This is a first step towards finding geometric consensus from a list of user contributions to a given starting geometry and a map. It is a work in progress and hopefully other ideas can be added to improve this algorithm.\n", | |
| "\n", | |
| "This code is part of NYPL Labs' [Building Inspector](http://buildinginspector.nypl.org/). Explore and fork the [GitHub repository](https://github.com/NYPL/building-inspector).\n", | |
| "\n", | |
| "This notebook was created by [Mauricio Giraldo Arteaga](https://twitter.com/mgiraldo)." | |
| ] | |
| } | |
| ], | |
| "metadata": {} | |
| } | |
| ] | |
| } |
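The GeoJSON step in the notebook gives every feature the same `id` and relies on the consensus ring already being closed. Below is a minimal sketch of a slightly more defensive serializer. It assumes the consensus result is an array of `[lon, lat]` rings, as in the output shown in the notebook. The helper name `consensus_to_geojson` and the sample coordinates are hypothetical and not part of the original code.

```ruby
require 'json'

# Hypothetical helper (not in the notebook): wraps each consensus polygon in a
# GeoJSON Feature with a sequential id and makes sure the outer ring is closed.
def consensus_to_geojson(polygons)
  features = polygons.each_with_index.map do |ring, index|
    # A GeoJSON linear ring must end on the same position it starts with.
    closed = ring.first == ring.last ? ring : ring + [ring.first]
    {
      type: "Feature",
      properties: { id: index + 1 },
      geometry: { type: "Polygon", coordinates: [closed] }
    }
  end
  { type: "FeatureCollection", features: features }.to_json
end

# Made-up rings for illustration: one open triangle, one already closed.
sample = [
  [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]],
  [[2.0, 2.0], [3.0, 2.0], [3.0, 3.0], [2.0, 2.0]]
]
geojson = consensus_to_geojson(sample)
puts geojson
```

With the notebook's own data, the call would presumably be `consensus_to_geojson(consensus)`. Sequential ids keep the features distinguishable when one overlay yields several footprints.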
http://nbviewer.ipython.org/urls/gist.githubusercontent.com/domitry/e087d69315075bebe3b1/raw/5110b04d5591c91b2bc269ed41d647bdec682f00/polygonfix%20writeup.ipynb