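I have been following this tutorial (http://blog.florian-hopf.de/2014/07/scrapy-and-elasticsearch.html) and using this Scrapy Elasticsearch pipeline (https://github.com/knockrentals/scrapy-elasticsearch). I am able to export the scraped data from Scrapy to a JSON file, and I have an Elasticsearch server up and running on localhost.
However, when I try to send the scraped data to Elasticsearch through the pipeline, I get the following error: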
2015-08-05 21:21:53 [scrapy] ERROR: Error processing {'link': [u'http://www.meetup.com/Search-Meetup-Karlsruhe/events/221907250/'],
'title': [u'Alles rund um Elasticsearch']}
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/twisted/internet/defer.py", line 588, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapyelasticsearch/scrapyelasticsearch.py", line 70, in process_item
self.index_item(item)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapyelasticsearch/scrapyelasticsearch.py", line 52, in index_item
local_id = hashlib.sha1(item[uniq_key]).hexdigest()
TypeError: must be string or buffer, not list
My Scrapy items.py file looks like this:
from scrapy.item import Item, Field
class MeetupItem(Item):
    title = Field()
    link = Field()
    description = Field()
And my settings.py file (only the part I believe is relevant) looks like this:
from scrapy import log
ITEM_PIPELINES = [
    'scrapyelasticsearch.scrapyelasticsearch.ElasticSearchPipeline',
]
ELASTICSEARCH_SERVER = 'localhost' # If not 'localhost' prepend 'http://'
ELASTICSEARCH_PORT = 9200 # If port 80 leave blank
ELASTICSEARCH_USERNAME = ''
ELASTICSEARCH_PASSWORD = ''
ELASTICSEARCH_INDEX = 'meetups'
ELASTICSEARCH_TYPE = 'meetup'
ELASTICSEARCH_UNIQ_KEY = 'link'
ELASTICSEARCH_LOG_LEVEL = log.DEBUG
Any help would be greatly appreciated!
Answer 0 (score: 2)
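As you can see in the error message, Error processing {'link': [u'http://www.meetup.com/Search-Meetup-Karlsruhe/events/221907250/'], 'title': [u'Alles rund um Elasticsearch']}, the link and title fields of your item are lists (the square brackets around the values indicate this). The pipeline passes item[uniq_key] straight to hashlib.sha1(), which expects a string, hence the TypeError.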
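To make the failure concrete, here is a minimal sketch that reproduces the hashing step outside Scrapy. The `item` dict is a plain stand-in for the scraped item, and `hash_unique_key` is a hypothetical helper, not part of the pipeline. The `.encode('utf-8')` call is an assumption for Python 3, where `hashlib` only accepts bytes; on Python 2.7, as in the traceback, a plain string can be hashed directly.

```python
import hashlib

# Plain dict standing in for the scraped item from the error message.
item = {
    'link': ['http://www.meetup.com/Search-Meetup-Karlsruhe/events/221907250/'],
    'title': ['Alles rund um Elasticsearch'],
}


def hash_unique_key(value):
    """Return the SHA-1 hex digest of a single string value (hypothetical helper)."""
    return hashlib.sha1(value.encode('utf-8')).hexdigest()


# Hashing the list itself fails with a TypeError, as in the traceback.
try:
    hashlib.sha1(item['link'])
    list_hash_failed = False
except TypeError:
    list_hash_failed = True

# Hashing the first element of the list works.
local_id = hash_unique_key(item['link'][0])
```

The takeaway is that the field configured as ELASTICSEARCH_UNIQ_KEY has to hold a single string by the time the item reaches the pipeline.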
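This comes from how you extract the data in Scrapy. You did not post that code with your question, but you should use response.xpath().extract()[0] to get the first result from the list. In that case, of course, you should be prepared for empty result sets so that you do not hit an IndexError.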
Update:
For the case where nothing is extracted, you can guard against it like this:
linkSelection = response.xpath().extract()
item['link'] = linkSelection[0] if linkSelection else ""
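or something similar, depending on your data and fields. None may also work as the fallback value when the list is empty.
The basic idea is to separate the XPath extraction from the selection of the list element: first extract the list, then pick an element from it only if the list actually contains one.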
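The same idea can be wrapped in a small helper so that every field is handled consistently. This is only a sketch: `first_or_default` is a hypothetical name, and the lists below are plain Python lists standing in for the results of an extract() call, so the example runs without Scrapy.

```python
def first_or_default(values, default=""):
    """Return the first element of a list, or `default` if the list is empty."""
    return values[0] if values else default


# Plain lists shaped like the results of an extract() call.
link_selection = ['http://www.meetup.com/Search-Meetup-Karlsruhe/events/221907250/']
empty_selection = []

item = {}
# A non-empty result yields its first element.
item['link'] = first_or_default(link_selection)
# An empty result falls back to the chosen default instead of raising IndexError.
item['description'] = first_or_default(empty_selection, default=None)
```

Whether an empty string or None is the better default depends on how the field is indexed. For the field used as the unique key, an empty or missing value is probably better handled by skipping the item, since it cannot be hashed into a meaningful ID.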