>>> import itertools
>>> import string
>>> from elasticsearch import Elasticsearch, helpers
>>> es = Elasticsearch()
>>> # k is a generator expression that produces
... # a series of dictionaries containing test data.
... # The test data are just letter permutations
... # created with itertools.permutations.
... #
... # We then reference k as the iterator that's
... # consumed by the elasticsearch.helpers.bulk method.
>>> k = ({'_type':'foo', '_index':'test','letters':''.join(letters)}
...      for letters in itertools.permutations(string.letters,2))
>>> # calling k.next() shows examples
... # (while consuming the generator, of course)
>>> # each dict contains a doc type, index, and data (at minimum)
>>> k.next()
{'_type': 'foo', 'letters': 'ab', '_index': 'test'}
>>> k.next()
{'_type': 'foo', 'letters': 'ac', '_index': 'test'}
>>> # create our test index
>>> es.indices.create('test')
{u'acknowledged': True}
>>> helpers.bulk(es,k)
(2650, [])
>>> # check to make sure we got what we expected...
>>> es.count(index='test')
{u'count': 2650, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}}
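The transcript above is Python 2 (string.letters, k.next()). A rough Python 3 equivalent, assuming a recent elasticsearch-py client where the _type field is no longer used, might look like this (the index name "test" is kept from the transcript):

```python
import itertools
import string


def make_actions(index="test"):
    """Yield one bulk action dict per 2-letter permutation of the ASCII letters."""
    return (
        {"_index": index, "letters": "".join(pair)}
        for pair in itertools.permutations(string.ascii_letters, 2)
    )


def bulk_index():
    # Assumes elasticsearch-py is installed and a node is reachable locally.
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch()
    es.indices.create(index="test")
    return helpers.bulk(es, make_actions())
```

Note that consuming two actions with next() before calling bulk, as the transcript does, is why the reported count is 2650 rather than the full 52 x 51 = 2652 permutations.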
Would this work for a python list of JSON documents?
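For what it's worth, helpers.bulk accepts any iterable of action dicts, so a plain Python list of documents works the same way as the generator above — a minimal sketch (the index name and sample docs are illustrative):

```python
# A plain list of JSON-like documents.
docs = [{"letters": "ab"}, {"letters": "ac"}]

# Wrap each document in a bulk action dict; any iterable works, not just generators.
actions = [{"_index": "test", "_source": doc} for doc in docs]

# helpers.bulk(es, actions)  # same call as with the generator
```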
@zyxwu But it's hellishly slow.
I have a use case where I am updating/adding documents using the bulk API, and right after I fire the bulk call I check the ES count using the count API. The problem is that there is some delay after the bulk call before the correct count is reflected. How do I make sure the correct count is reflected right after I fire the bulk call?
@tusharkale197 Elasticsearch is a near-real-time database, not an exact real-time one. Once you index a document, the changes become visible within the index.refresh_interval period, which defaults to 1s; you can change that setting accordingly, or manually refresh the index using the refresh API.
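A sketch of that advice using elasticsearch-py (the helper name bulk_then_count is mine; the refresh call forces newly indexed documents to become visible before counting):

```python
def bulk_then_count(es, actions, index="test"):
    """Bulk-index, force a refresh, then return the document count.

    Assumes an elasticsearch-py client; without the explicit refresh,
    count() may lag behind by up to index.refresh_interval (1s by default).
    """
    from elasticsearch import helpers

    helpers.bulk(es, actions)
    es.indices.refresh(index=index)  # make the new docs searchable now
    return es.count(index=index)["count"]
```

Alternatively, the bulk API accepts refresh="wait_for" (passed through as helpers.bulk(es, actions, refresh="wait_for")), so the call only returns once the changes are visible.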
but when I index documents one by one, everything's fine
res = es.index(index=INDEX, doc_type=DOC_TYPE, id=ind, body=JS_message)