Last active
February 12, 2016 16:25
Name reconciliation in Python
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# Reconciling plant taxonomic names from Python\n", | |
| "\n", | |
| "\n", | |
| "## Introduction - \"strings to things\"\n", | |
| "\n", | |
| "RBG Kew host a set of [services](http://data1.kew.org/reconciliation/) conforming to the OpenRefine reconciliation service API. These allow OpenRefine users to reconcile plant names in a project against RBG Kew resources, turning the \"string\" representation of a plant name into an actionable \"thing\" with a resolvable identifier.\n", | |
| "\n", | |
| "The API is simple JSON over HTTP, so it can be accessed from any programming language.\n", | |
| "\n", | |
| "This notebook explains how to interact with the API from Python. It's organised as follows:\n", | |
| "1. Explain the components of the service and resolve each of the resources by hand\n", | |
| "1. Repeat this in Python\n", | |
| "1. Finally loop over a dataset in a `pandas` dataframe and reconcile each of the names, storing the identifier in a new column" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## The components of the service - resolving by hand\n", | |
| "We're interested in reconciling our names against IPNI (the [International Plant Names Index](http://www.ipni.org)). \n", | |
| "\n", | |
| "The details for this particular service are at http://data1.kew.org/reconciliation/about/IpniName. That page tells us that the endpoint for the reconciliation service is http://data1.kew.org/reconciliation/reconcile/IpniName, which will be the starting point for our code.\n", | |
| "\n", | |
| "\n", | |
| "### Service metadata\n", | |
| "Simply resolving the endpoint returns some JSON metadata about the service. This contains all the details we need to build our calls to the service.\n", | |
| "\n", | |
| "```json\n", | |
| "{\n", | |
| " \"name\": \"IPNI Name Reconciliation Service\",\n", | |
| " \"identifierSpace\": \"http://ipni.org/urn:lsid:ipni.org:names:\",\n", | |
| " \"schemaSpace\": \"http://rdf.freebase.com/ns/type.object.id\",\n", | |
| " \"view\": {\n", | |
| " \"url\": \"http://ipni.org/urn:lsid:ipni.org:names:{{id}}\"\n", | |
| " },\n", | |
| " \"preview\": {\n", | |
| " \"url\": \"http://ipni.org/urn:lsid:ipni.org:names:{{id}}\",\n", | |
| " \"width\": 400,\n", | |
| " \"height\": 400\n", | |
| " },\n", | |
| " \"suggest\": {\n", | |
| " \"type\": {\n", | |
| " \"service_url\": \"http://data1.kew.org\",\n", | |
| " \"service_path\": \"/reconciliation/reconcile/IpniName/suggestType\",\n", | |
| " \"flyout_service_url\": \"http://data1.kew.org\",\n", | |
| " \"flyout_service_path\": \"/reconciliation/reconcile/IpniName/flyoutType/${id}\"\n", | |
| " },\n", | |
| " \"property\": {\n", | |
| " \"service_url\": \"http://data1.kew.org\",\n", | |
| " \"service_path\": \"/reconciliation/reconcile/IpniName/suggestProperty\",\n", | |
| " \"flyout_service_url\": \"http://data1.kew.org\",\n", | |
| " \"flyout_service_path\": \"/reconciliation/reconcile/IpniName/flyoutProperty/${id}\"\n", | |
| " },\n", | |
| " \"entity\": {\n", | |
| " \"service_url\": \"http://data1.kew.org\",\n", | |
| " \"service_path\": \"/reconciliation/reconcile/IpniName\",\n", | |
| " \"flyout_service_url\": \"http://data1.kew.org\",\n", | |
| " \"flyout_service_path\": \"/reconciliation/reconcile/IpniName/flyout/${id}\"\n", | |
| " }\n", | |
| " },\n", | |
| " \"defaultTypes\": [\n", | |
| " {\n", | |
| " \"id\": \"/biology/organism_classification/scientific_name\",\n", | |
| " \"name\": \"Scientific name\"\n", | |
| " }\n", | |
| " ]\n", | |
| "}\n", | |
| "```\n", | |
| "\n", | |
| "With this information, we can build a sample call to the service. A call is encoded in JSON:\n", | |
| "```json\n", | |
| "{\n", | |
| " \"query\": \"Melocactus braunii Esteves\"\n", | |
| " ,\"limit\": 3\n", | |
| " ,\"type\": \"/biology/organism_classification/scientific_name\"\n", | |
| " ,\"type_strict\": \"any\"\n", | |
| " ,\"properties\": []\n", | |
| "}\n", | |
| "```\n", | |
| "\n", | |
| "The JSON is URL-encoded (an online encoder is available [here](http://meyerweb.com/eric/tools/dencoder/)) and passed to the service endpoint via the `query` URL parameter:\n", | |
| "\n", | |
| "[Click to run the query](http://data1.kew.org/reconciliation/reconcile/IpniName?query=%7B%0A%20%20%20%20%22query%22%3A%20%20%22Melocactus%20braunii%20Esteves%22%0A%20%20%20%20%2C%22limit%22%3A%20%203%0A%20%20%20%20%2C%22type%22%3A%20%22%2Fbiology%2Forganism_classification%2Fscientific_name%22%0A%20%20%20%20%2C%22type_strict%22%3A%20%22any%22%0A%20%20%20%20%2C%22properties%22%3A%20%5B%5D%0A%7D)\n", | |
| "\n", | |
| "The service returns this response:\n", | |
| "```json\n", | |
| "{\n", | |
| " \"result\": [\n", | |
| " {\n", | |
| " \"id\": \"60432529-2\",\n", | |
| " \"name\": \"Cactaceae Melocactus braunii Esteves\",\n", | |
| " \"type\": [\n", | |
| " {\n", | |
| " \"id\": \"/biology/organism_classification/scientific_name\",\n", | |
| " \"name\": \"Scientific name\"\n", | |
| " }\n", | |
| " ],\n", | |
| " \"score\": 100,\n", | |
| " \"match\": true\n", | |
| " }\n", | |
| " ]\n", | |
| "}\n", | |
| "```\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Running the reconciliation in Python\n", | |
| "Now we'll run through the steps above, but in Python. We'll use the `json` library to read and write JSON, and `urllib` to make the HTTP requests." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 1, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "{\n", | |
| " \"defaultTypes\": [\n", | |
| " {\n", | |
| " \"id\": \"/biology/organism_classification/scientific_name\",\n", | |
| " \"name\": \"Scientific name\"\n", | |
| " }\n", | |
| " ],\n", | |
| " \"identifierSpace\": \"http://ipni.org/urn:lsid:ipni.org:names:\",\n", | |
| " \"name\": \"IPNI Name Reconciliation Service\",\n", | |
| " \"preview\": {\n", | |
| " \"height\": 400,\n", | |
| " \"url\": \"http://ipni.org/urn:lsid:ipni.org:names:{{id}}\",\n", | |
| " \"width\": 400\n", | |
| " },\n", | |
| " \"schemaSpace\": \"http://rdf.freebase.com/ns/type.object.id\",\n", | |
| " \"suggest\": {\n", | |
| " \"entity\": {\n", | |
| " \"flyout_service_path\": \"/reconciliation/reconcile/IpniName/flyout/${id}\",\n", | |
| " \"flyout_service_url\": \"http://data1.kew.org\",\n", | |
| " \"service_path\": \"/reconciliation/reconcile/IpniName\",\n", | |
| " \"service_url\": \"http://data1.kew.org\"\n", | |
| " },\n", | |
| " \"property\": {\n", | |
| " \"flyout_service_path\": \"/reconciliation/reconcile/IpniName/flyoutProperty/${id}\",\n", | |
| " \"flyout_service_url\": \"http://data1.kew.org\",\n", | |
| " \"service_path\": \"/reconciliation/reconcile/IpniName/suggestProperty\",\n", | |
| " \"service_url\": \"http://data1.kew.org\"\n", | |
| " },\n", | |
| " \"type\": {\n", | |
| " \"flyout_service_path\": \"/reconciliation/reconcile/IpniName/flyoutType/${id}\",\n", | |
| " \"flyout_service_url\": \"http://data1.kew.org\",\n", | |
| " \"service_path\": \"/reconciliation/reconcile/IpniName/suggestType\",\n", | |
| " \"service_url\": \"http://data1.kew.org\"\n", | |
| " }\n", | |
| " },\n", | |
| " \"view\": {\n", | |
| " \"url\": \"http://ipni.org/urn:lsid:ipni.org:names:{{id}}\"\n", | |
| " }\n", | |
| "}\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "import json\n", | |
| "import urllib.parse\n", | |
| "import urllib.request\n", | |
| "serviceEndpoint=\"http://data1.kew.org/reconciliation/reconcile/IpniName\"\n", | |
| "\n", | |
| "# Resolve endpoint to get JSON metadata about the service:\n", | |
| "httpstream = urllib.request.urlopen(serviceEndpoint)\n", | |
| "serviceMetadata = json.loads(httpstream.read().decode('utf-8'))\n", | |
| "print(json.dumps(serviceMetadata, sort_keys=True, indent=4))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 2, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "{\n", | |
| " \"limit\": 3,\n", | |
| " \"properties\": [],\n", | |
| " \"query\": \"Melocactus braunii Esteves\",\n", | |
| " \"type\": \"/biology/organism_classification/scientific_name\",\n", | |
| " \"type_strict\": \"any\"\n", | |
| "}\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# Build a query object\n", | |
| "\n", | |
| "# First define a template that can be re-used with different query values:\n", | |
| "query_limit=3\n", | |
| "queryTemplate={'limit':query_limit, 'type':serviceMetadata['defaultTypes'][0]['id'],'type_strict':'any','properties':[]}\n", | |
| "\n", | |
| "query=queryTemplate.copy()\n", | |
| "plantname=\"Melocactus braunii Esteves\"\n", | |
| "query['query']=plantname\n", | |
| "print(json.dumps(query, sort_keys=True, indent=4))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 3, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "{\n", | |
| " \"result\": [\n", | |
| " {\n", | |
| " \"id\": \"60432529-2\",\n", | |
| " \"match\": true,\n", | |
| " \"name\": \"Cactaceae Melocactus braunii Esteves\",\n", | |
| " \"score\": 100.0,\n", | |
| " \"type\": [\n", | |
| " {\n", | |
| " \"id\": \"/biology/organism_classification/scientific_name\",\n", | |
| " \"name\": \"Scientific name\"\n", | |
| " }\n", | |
| " ]\n", | |
| " }\n", | |
| " ]\n", | |
| "}\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# Encode the query object and pass to the service, print the results\n", | |
| "encodedData=urllib.parse.urlencode({'query':json.dumps(query)}) \n", | |
| "httpstream = urllib.request.urlopen(serviceEndpoint+'?'+encodedData)\n", | |
| "serviceResponse = json.loads(httpstream.read().decode('utf-8'))\n", | |
| "print(json.dumps(serviceResponse, sort_keys=True, indent=4))" | |
| ] | |
| }, | |
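| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "The `id` in each result is only the local part of the identifier. One way to turn it into a URL is to substitute it into the `view` URL template from the service metadata. This is a minimal sketch rather than part of the service itself: the template and the ID are copied from the outputs above instead of being fetched, and `build_view_url` is a hypothetical helper name.\n", | |
| "\n", | |
| "```python\n", | |
| "# Template copied from the 'view' entry of the service metadata printed above\n", | |
| "view_template = 'http://ipni.org/urn:lsid:ipni.org:names:{{id}}'\n", | |
| "\n", | |
| "def build_view_url(template, identifier):\n", | |
| "    # Hypothetical helper: substitute the identifier for the {{id}} placeholder\n", | |
| "    return template.replace('{{id}}', identifier)\n", | |
| "\n", | |
| "# ID copied from the reconciliation response above\n", | |
| "print(build_view_url(view_template, '60432529-2'))\n", | |
| "```\n", | |
| "\n", | |
| "In this notebook the same template is available as `serviceMetadata['view']['url']`, so it does not need to be hard-coded." | |
| ] | |
| }, | |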
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Reconciling a complete dataset\n", | |
| "We'll grab some names from GBIF and save them in a `pandas` dataframe." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 4, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>gbifKey</th>\n", | |
| " <th>scientificName</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>0</th>\n", | |
| " <td>1229925413</td>\n", | |
| " <td>Carnegiea gigantea (Engelm.) Britton & Rose</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>1</th>\n", | |
| " <td>1229925495</td>\n", | |
| " <td>Carnegiea gigantea (Engelm.) Britton & Rose</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>2</th>\n", | |
| " <td>1229928509</td>\n", | |
| " <td>Ferocactus echidne (DC.) Britton & Rose</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>3</th>\n", | |
| " <td>1233595756</td>\n", | |
| " <td>Opuntia karwinskiana Salm-Dyck</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>4</th>\n", | |
| " <td>1229930376</td>\n", | |
| " <td>Isolatocereus dumortieri (Schiedw.) Backeb.</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>5</th>\n", | |
| " <td>1229612330</td>\n", | |
| " <td>Selenicereus spinulosus (DC.) Britt. & Rose</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>6</th>\n", | |
| " <td>1233595444</td>\n", | |
| " <td>Cylindropuntia imbricata (Haw.) F.M. Knuth</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>7</th>\n", | |
| " <td>1229614235</td>\n", | |
| " <td>Opuntia basilaris Engelm. & Bigelow</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>8</th>\n", | |
| " <td>1227770854</td>\n", | |
| " <td>Mammillaria tetrancistra Engelm.</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>9</th>\n", | |
| " <td>1229928525</td>\n", | |
| " <td>Cylindropuntia imbricata (Haw.) F.M. Knuth</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>10</th>\n", | |
| " <td>1229930352</td>\n", | |
| " <td>Isolatocereus dumortieri (Schiedw.) Backeb.</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>11</th>\n", | |
| " <td>1229612739</td>\n", | |
| " <td>Opuntia basilaris var. basilaris</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>12</th>\n", | |
| " <td>1233600722</td>\n", | |
| " <td>Echinocereus engelmannii (Parry ex Engelm.) Lem.</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>13</th>\n", | |
| " <td>1229615289</td>\n", | |
| " <td>Opuntia basilaris Engelm. & Bigelow</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>14</th>\n", | |
| " <td>1229924356</td>\n", | |
| " <td>Cylindropuntia bigelovii (Engelm.) F.M.Knuth</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>15</th>\n", | |
| " <td>1229924383</td>\n", | |
| " <td>Opuntia basilaris Engelm. & Bigelow</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>16</th>\n", | |
| " <td>1229609723</td>\n", | |
| " <td>Cylindropuntia ramosissima (Engelm.) F.M.Knuth</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>17</th>\n", | |
| " <td>1229613556</td>\n", | |
| " <td>Opuntia basilaris Engelm. & Bigelow</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>18</th>\n", | |
| " <td>1229929113</td>\n", | |
| " <td>Mammillaria grahamii Engelm.</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>19</th>\n", | |
| " <td>1233596831</td>\n", | |
| " <td>Opuntia littoralis (Engelm.) Cockerell</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " gbifKey scientificName\n", | |
| "0 1229925413 Carnegiea gigantea (Engelm.) Britton & Rose\n", | |
| "1 1229925495 Carnegiea gigantea (Engelm.) Britton & Rose\n", | |
| "2 1229928509 Ferocactus echidne (DC.) Britton & Rose\n", | |
| "3 1233595756 Opuntia karwinskiana Salm-Dyck\n", | |
| "4 1229930376 Isolatocereus dumortieri (Schiedw.) Backeb.\n", | |
| "5 1229612330 Selenicereus spinulosus (DC.) Britt. & Rose\n", | |
| "6 1233595444 Cylindropuntia imbricata (Haw.) F.M. Knuth\n", | |
| "7 1229614235 Opuntia basilaris Engelm. & Bigelow\n", | |
| "8 1227770854 Mammillaria tetrancistra Engelm.\n", | |
| "9 1229928525 Cylindropuntia imbricata (Haw.) F.M. Knuth\n", | |
| "10 1229930352 Isolatocereus dumortieri (Schiedw.) Backeb.\n", | |
| "11 1229612739 Opuntia basilaris var. basilaris\n", | |
| "12 1233600722 Echinocereus engelmannii (Parry ex Engelm.) Lem.\n", | |
| "13 1229615289 Opuntia basilaris Engelm. & Bigelow\n", | |
| "14 1229924356 Cylindropuntia bigelovii (Engelm.) F.M.Knuth\n", | |
| "15 1229924383 Opuntia basilaris Engelm. & Bigelow\n", | |
| "16 1229609723 Cylindropuntia ramosissima (Engelm.) F.M.Knuth\n", | |
| "17 1229613556 Opuntia basilaris Engelm. & Bigelow\n", | |
| "18 1229929113 Mammillaria grahamii Engelm.\n", | |
| "19 1233596831 Opuntia littoralis (Engelm.) Cockerell" | |
| ] | |
| }, | |
| "execution_count": 4, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "import pandas as pd\n", | |
| "# Fetch Cactaceae occurrence records from GBIF and collect their keys and names\n", | |
| "gbifOccurrenceSearchUrl=\"http://api.gbif.org/v1/occurrence/search?scientificName=Cactaceae\"\n", | |
| "httpstream = urllib.request.urlopen(gbifOccurrenceSearchUrl)\n", | |
| "gbifResults = json.loads(httpstream.read().decode('utf-8'))\n", | |
| "gbifDict={'gbifKey':[],'scientificName':[]}\n", | |
| "for result in gbifResults['results']:\n", | |
| "    gbifDict['gbifKey'].append(result['key'])\n", | |
| "    gbifDict['scientificName'].append(result['scientificName'])\n", | |
| "df=pd.DataFrame(gbifDict)\n", | |
| "df" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now we'll loop over these names and fire each against the service, storing the ID in a new column." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 5, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>gbifKey</th>\n", | |
| " <th>scientificName</th>\n", | |
| " <th>ipniId</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>0</th>\n", | |
| " <td>1229925413</td>\n", | |
| " <td>Carnegiea gigantea (Engelm.) Britton & Rose</td>\n", | |
| " <td>[47644-2]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>1</th>\n", | |
| " <td>1229925495</td>\n", | |
| " <td>Carnegiea gigantea (Engelm.) Britton & Rose</td>\n", | |
| " <td>[47644-2]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>2</th>\n", | |
| " <td>1229928509</td>\n", | |
| " <td>Ferocactus echidne (DC.) Britton & Rose</td>\n", | |
| " <td>[133138-1]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>3</th>\n", | |
| " <td>1233595756</td>\n", | |
| " <td>Opuntia karwinskiana Salm-Dyck</td>\n", | |
| " <td>[136712-1]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>4</th>\n", | |
| " <td>1229930376</td>\n", | |
| " <td>Isolatocereus dumortieri (Schiedw.) Backeb.</td>\n", | |
| " <td>[130623-2]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>5</th>\n", | |
| " <td>1229612330</td>\n", | |
| " <td>Selenicereus spinulosus (DC.) Britt. & Rose</td>\n", | |
| " <td>[232396-2]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>6</th>\n", | |
| " <td>1233595444</td>\n", | |
| " <td>Cylindropuntia imbricata (Haw.) F.M. Knuth</td>\n", | |
| " <td>[73870-2]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>7</th>\n", | |
| " <td>1229614235</td>\n", | |
| " <td>Opuntia basilaris Engelm. & Bigelow</td>\n", | |
| " <td>[281611-2, 136376-1]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>8</th>\n", | |
| " <td>1227770854</td>\n", | |
| " <td>Mammillaria tetrancistra Engelm.</td>\n", | |
| " <td>[151799-2]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>9</th>\n", | |
| " <td>1229928525</td>\n", | |
| " <td>Cylindropuntia imbricata (Haw.) F.M. Knuth</td>\n", | |
| " <td>[73870-2]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>10</th>\n", | |
| " <td>1229930352</td>\n", | |
| " <td>Isolatocereus dumortieri (Schiedw.) Backeb.</td>\n", | |
| " <td>[130623-2]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>11</th>\n", | |
| " <td>1229612739</td>\n", | |
| " <td>Opuntia basilaris var. basilaris</td>\n", | |
| " <td>[]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>12</th>\n", | |
| " <td>1233600722</td>\n", | |
| " <td>Echinocereus engelmannii (Parry ex Engelm.) Lem.</td>\n", | |
| " <td>[132348-1]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>13</th>\n", | |
| " <td>1229615289</td>\n", | |
| " <td>Opuntia basilaris Engelm. & Bigelow</td>\n", | |
| " <td>[281611-2, 136376-1]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>14</th>\n", | |
| " <td>1229924356</td>\n", | |
| " <td>Cylindropuntia bigelovii (Engelm.) F.M.Knuth</td>\n", | |
| " <td>[73844-2]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>15</th>\n", | |
| " <td>1229924383</td>\n", | |
| " <td>Opuntia basilaris Engelm. & Bigelow</td>\n", | |
| " <td>[281611-2, 136376-1]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>16</th>\n", | |
| " <td>1229609723</td>\n", | |
| " <td>Cylindropuntia ramosissima (Engelm.) F.M.Knuth</td>\n", | |
| " <td>[73887-2, 131411-1]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>17</th>\n", | |
| " <td>1229613556</td>\n", | |
| " <td>Opuntia basilaris Engelm. & Bigelow</td>\n", | |
| " <td>[281611-2, 136376-1]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>18</th>\n", | |
| " <td>1229929113</td>\n", | |
| " <td>Mammillaria grahamii Engelm.</td>\n", | |
| " <td>[280351-2, 134635-1]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>19</th>\n", | |
| " <td>1233596831</td>\n", | |
| " <td>Opuntia littoralis (Engelm.) Cockerell</td>\n", | |
| " <td>[175247-2, 1039267-2]</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " gbifKey scientificName \\\n", | |
| "0 1229925413 Carnegiea gigantea (Engelm.) Britton & Rose \n", | |
| "1 1229925495 Carnegiea gigantea (Engelm.) Britton & Rose \n", | |
| "2 1229928509 Ferocactus echidne (DC.) Britton & Rose \n", | |
| "3 1233595756 Opuntia karwinskiana Salm-Dyck \n", | |
| "4 1229930376 Isolatocereus dumortieri (Schiedw.) Backeb. \n", | |
| "5 1229612330 Selenicereus spinulosus (DC.) Britt. & Rose \n", | |
| "6 1233595444 Cylindropuntia imbricata (Haw.) F.M. Knuth \n", | |
| "7 1229614235 Opuntia basilaris Engelm. & Bigelow \n", | |
| "8 1227770854 Mammillaria tetrancistra Engelm. \n", | |
| "9 1229928525 Cylindropuntia imbricata (Haw.) F.M. Knuth \n", | |
| "10 1229930352 Isolatocereus dumortieri (Schiedw.) Backeb. \n", | |
| "11 1229612739 Opuntia basilaris var. basilaris \n", | |
| "12 1233600722 Echinocereus engelmannii (Parry ex Engelm.) Lem. \n", | |
| "13 1229615289 Opuntia basilaris Engelm. & Bigelow \n", | |
| "14 1229924356 Cylindropuntia bigelovii (Engelm.) F.M.Knuth \n", | |
| "15 1229924383 Opuntia basilaris Engelm. & Bigelow \n", | |
| "16 1229609723 Cylindropuntia ramosissima (Engelm.) F.M.Knuth \n", | |
| "17 1229613556 Opuntia basilaris Engelm. & Bigelow \n", | |
| "18 1229929113 Mammillaria grahamii Engelm. \n", | |
| "19 1233596831 Opuntia littoralis (Engelm.) Cockerell \n", | |
| "\n", | |
| " ipniId \n", | |
| "0 [47644-2] \n", | |
| "1 [47644-2] \n", | |
| "2 [133138-1] \n", | |
| "3 [136712-1] \n", | |
| "4 [130623-2] \n", | |
| "5 [232396-2] \n", | |
| "6 [73870-2] \n", | |
| "7 [281611-2, 136376-1] \n", | |
| "8 [151799-2] \n", | |
| "9 [73870-2] \n", | |
| "10 [130623-2] \n", | |
| "11 [] \n", | |
| "12 [132348-1] \n", | |
| "13 [281611-2, 136376-1] \n", | |
| "14 [73844-2] \n", | |
| "15 [281611-2, 136376-1] \n", | |
| "16 [73887-2, 131411-1] \n", | |
| "17 [281611-2, 136376-1] \n", | |
| "18 [280351-2, 134635-1] \n", | |
| "19 [175247-2, 1039267-2] " | |
| ] | |
| }, | |
| "execution_count": 5, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "# add a new column to store the IDs\n", | |
| "df['ipniId']=[None] * len(df)\n", | |
| "\n", | |
| "# loop and check name against service, populate new column with ID\n", | |
| "for (index,row) in df.iterrows():\n", | |
| "    query=queryTemplate.copy()\n", | |
| "    query['query']=df.at[index,'scientificName']\n", | |
| "    encodedData=urllib.parse.urlencode({'query':json.dumps(query)})\n", | |
| "    httpstream = urllib.request.urlopen(serviceEndpoint+'?'+encodedData)\n", | |
| "    serviceResponse = json.loads(httpstream.read().decode('utf-8'))\n", | |
| "    ids=[]\n", | |
| "    for result in serviceResponse['result']:\n", | |
| "        ids.append(result['id'])\n", | |
| "    df.at[index,'ipniId']=ids\n", | |
| "df" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now we have a new column, populated with IPNI IDs for these plant names. \n", | |
| "\n", | |
| "Note that some names return multiple matches - this is due to duplication in the underlying dataset. We're working on resolving that (as of February 2016)." | |
| ] | |
| }, | |
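| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Several names in the dataframe are repeated (`Opuntia basilaris Engelm. & Bigelow` appears four times), so the loop above sends identical queries more than once. One possible way to avoid that is to look up each distinct name once and reuse the answer. The sketch below is an illustration under stated assumptions: `lookup_ids` is a hypothetical stand-in for the HTTP call made in the loop, and it returns canned IDs copied from the output above so that the example runs offline.\n", | |
| "\n", | |
| "```python\n", | |
| "import pandas as pd\n", | |
| "\n", | |
| "# Canned results copied from the output above; a stand-in for the live service\n", | |
| "canned = {\n", | |
| "    'Carnegiea gigantea (Engelm.) Britton & Rose': ['47644-2'],\n", | |
| "    'Opuntia basilaris var. basilaris': [],\n", | |
| "}\n", | |
| "calls = []\n", | |
| "\n", | |
| "def lookup_ids(name):\n", | |
| "    # Hypothetical stand-in for the HTTP request made inside the loop\n", | |
| "    calls.append(name)\n", | |
| "    return canned.get(name, [])\n", | |
| "\n", | |
| "names = pd.Series([\n", | |
| "    'Carnegiea gigantea (Engelm.) Britton & Rose',\n", | |
| "    'Carnegiea gigantea (Engelm.) Britton & Rose',\n", | |
| "    'Opuntia basilaris var. basilaris',\n", | |
| "])\n", | |
| "\n", | |
| "# One lookup per distinct name, then map the cached answers back onto every row\n", | |
| "cache = {name: lookup_ids(name) for name in names.unique()}\n", | |
| "ids = names.map(cache)\n", | |
| "\n", | |
| "# Number of lookups made, followed by the IDs for each row\n", | |
| "print(len(calls), list(ids))\n", | |
| "```\n", | |
| "\n", | |
| "Applied to this notebook, the last step would read `df['ipniId'] = df['scientificName'].map(cache)`, with the cache filled by real calls to the service." | |
| ] | |
| }, | |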
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Where next\n", | |
| "\n", | |
| "The next edition of this notebook will use the IDs to retrieve some associated linked data." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Useful links\n", | |
| "\n", | |
| "- Kew home page for the services (including user documentation): http://data1.kew.org/reconciliation/\n", | |
| "- Presentations about these services:\n", | |
| " - [Strings to things - a user friendly framework for data reconciliation](http://www.slideshare.net/nickyn/829-tdwg2015nicolsonkewstringstothings)\n", | |
| "- GitHub codebase:\n", | |
| " - [Reconciliation & Matching Framework](https://github.com/RBGKew/Reconciliation-and-Matching-Framework)\n", | |
| " - [String Transformers](https://github.com/RBGKew/String-Transformers)\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Contact\n", | |
| "\n", | |
| "[Nicky Nicolson](http://bit.ly/kew-science-nicky) at Kew or [@nickynicolson](http://twitter.com/nickynicolson) on Twitter." | |
| ] | |
| } | |
| ], | |
| "metadata": { | |
| "kernelspec": { | |
| "display_name": "Python 3", | |
| "language": "python", | |
| "name": "python3" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.5.1" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 0 | |
| } |