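I am trying to use the scrapy elasticsearch pipeline (here: https://github.com/knockrentals/scrapy-elasticsearch) to put data into elasticsearch. However, I get the error below. I know it is related to the ELASTICSEARCH_UNIQ_KEY value, which is currently set to 'url', but I don't know what it should be set to.
A similar post here recommends a solution that involves creating a field for the unique key, but I don't understand what that means.
Here is my error message: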
2015-08-05 11:34:40 [scrapy] ERROR: Error processing {'link': [u'http://www.meetup.com/Search-Meetup-Karlsruhe/events/192357732/'],
 'title': [u'Suchen in der vernetzten Welt']}
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapyelasticsearch/scrapyelasticsearch.py", line 70, in process_item
    self.index_item(item)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapyelasticsearch/scrapyelasticsearch.py", line 52, in index_item
    local_id = hashlib.sha1(item[uniq_key]).hexdigest()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/item.py", line 56, in __getitem__
    return self._values[key]
KeyError: 'url'
Answer 0 (score: 0)
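A similar post here recommends a solution that involves creating a field for the unique key, but I don't understand what that means.
Declare a field in your Item whose name matches the value you configured in ELASTICSEARCH_UNIQ_KEY. The KeyError: 'url' is raised because the scraped item only has link and title fields, so the pipeline cannot look up item['url']. For example: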
import scrapy


class DemoItem(scrapy.Item):
    url = scrapy.Field()  # ELASTICSEARCH_UNIQ_KEY


class DemoSpider(scrapy.Spider):
    name = 'demo'
    start_urls = ['http://www.example.com']

    def parse(self, response):
        demoItem = DemoItem()
        demoItem['url'] = response.url
        yield demoItem
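To see why the error appears, here is a minimal sketch that mimics the hashing step shown in the traceback (hashlib.sha1(item[uniq_key]).hexdigest()). It makes several assumptions: make_local_id is a hypothetical helper and not part of scrapy-elasticsearch, a plain dict stands in for the scrapy Item, and the handling of list values and text encoding is my own addition for illustration.

```python
import hashlib

# Stand-in for the value configured in settings.py (assumed for this sketch).
ELASTICSEARCH_UNIQ_KEY = 'url'


def make_local_id(item, uniq_key=ELASTICSEARCH_UNIQ_KEY):
    """Hypothetical helper: derive a document id from the unique-key field."""
    value = item[uniq_key]  # raises KeyError when the item lacks this field
    if isinstance(value, (list, tuple)):
        # Assumption: fields filled from selectors hold lists; use the first entry.
        value = value[0]
    if not isinstance(value, bytes):
        value = value.encode('utf-8')  # hashlib needs bytes, not text
    return hashlib.sha1(value).hexdigest()


# The item from the error message: it has 'link' and 'title' but no 'url'.
scraped = {
    'link': ['http://www.meetup.com/Search-Meetup-Karlsruhe/events/192357732/'],
    'title': ['Suchen in der vernetzten Welt'],
}

try:
    make_local_id(scraped)
    missing_key = None
except KeyError as exc:
    missing_key = exc.args[0]  # the field name the lookup failed on

# Pointing the unique key at a field the item really has avoids the KeyError.
id_from_link = make_local_id(scraped, uniq_key='link')
```

So besides adding a url field, another option might be to set ELASTICSEARCH_UNIQ_KEY to a field the item already has, such as 'link'. Note that link holds a list in the error message above, and the pipeline line in the traceback passes the value straight to hashlib.sha1, so a single string value is probably the safer choice there.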