Unique key error when exporting Scrapy data to Elasticsearch

Time: 2015-08-05 15:34:31

Tags: elasticsearch scrapy

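I am trying to use the scrapy-elasticsearch pipeline (here: https://github.com/knockrentals/scrapy-elasticsearch) to put data into Elasticsearch, but I get the error below. I know it is related to the ELASTICSEARCH_UNIQ_KEY value, which is currently set to 'url', but I don't know what it should be set to.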

A similar post here recommends a solution that involves creating a field for the unique key, but I don't understand what that means.

Here is my error message:

2015-08-05 11:34:40 [scrapy] ERROR: Error processing {'link': [u'http://www.meetup.com/Search-Meetup-Karlsruhe/events/192357732/'],
 'title': [u'Suchen in der vernetzten Welt']}
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapyelasticsearch/scrapyelasticsearch.py", line 70, in process_item
    self.index_item(item)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapyelasticsearch/scrapyelasticsearch.py", line 52, in index_item
    local_id = hashlib.sha1(item[uniq_key]).hexdigest()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/item.py", line 56, in __getitem__
    return self._values[key]
KeyError: 'url'

1 answer:

Answer 0: (score: 0)

  

"A similar post here recommends a solution that involves creating a field for the unique key, but I don't understand what that means."

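Declare a field in your Item with the same name you configured in ELASTICSEARCH_UNIQ_KEY. Your items only have 'link' and 'title' fields, so looking up 'url' raises the KeyError.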

import scrapy

class DemoItem(scrapy.Item):
    url = scrapy.Field()  # field name must match ELASTICSEARCH_UNIQ_KEY ('url')

class DemoSpider(scrapy.Spider):

    name = 'demo'
    start_urls = ['http://www.example.com']

    def parse(self, response):

        demoItem = DemoItem()
        demoItem['url'] = response.url
        yield demoItem
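
To see why the field has to exist, here is a minimal sketch of the lookup-then-hash step that appears in the traceback (`hashlib.sha1(item[uniq_key]).hexdigest()`). It is not the pipeline's actual code: `make_local_id` is a hypothetical helper, the item is a plain dict instead of a scrapy.Item, taking the first element of a list value is my own assumption, and it is written for Python 3 (hence the explicit encode).

```python
import hashlib


def make_local_id(item, uniq_key):
    """Hypothetical stand-in for the pipeline's id step: look up the key, hash the value."""
    value = item[uniq_key]  # raises KeyError when the item has no such field
    if isinstance(value, (list, tuple)):
        value = value[0]  # assumption: extracted values are lists, use the first one
    return hashlib.sha1(value.encode('utf-8')).hexdigest()


UNIQ_KEY = 'url'  # the ELASTICSEARCH_UNIQ_KEY value from the question

# Same shape as the item in the error message: 'link' and 'title', but no 'url'.
scraped = {
    'link': ['http://www.meetup.com/Search-Meetup-Karlsruhe/events/192357732/'],
    'title': ['Suchen in der vernetzten Welt'],
}

try:
    make_local_id(scraped, UNIQ_KEY)
    missing_key = None
except KeyError as exc:
    missing_key = exc.args[0]  # the name of the field that was not found

# Once the item carries a 'url' field, the same lookup succeeds.
fixed = dict(scraped, url=scraped['link'][0])
local_id = make_local_id(fixed, UNIQ_KEY)
```

Either direction should work in principle: add a 'url' field to the item as shown above, or point ELASTICSEARCH_UNIQ_KEY at a field the item already has. Note that in your items 'link' holds a list rather than a string, so the value may need to be reduced to a single string before it can be hashed.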