I'm new to Python, and especially to Scrapy. I want to write a spider that gives me all the comments on a Reddit page. It finds the comments, but it doesn't save them to a .csv file. Here is my spider:
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.loader import ItemLoader
from reddit.items import RedditItem

class TestSpider(CrawlSpider):
    name = "test"
    allowed_domains = ["www.reddit.com"]
    start_urls = ['https://www.reddit.com/r/FIFA/comments/7pulch/introduction_community_update/']

    def parse(self, response):
        selector_list = response.xpath('//div[contains(@data-type, "comment")]')
        for selector in selector_list:
            item = RedditItem()
            item['comment_text'] = selector.xpath('.//div[contains(@class, "usertext-body may-blank-within md-container ")]/div').extract()
            item['comment_author'] = selector.xpath('./@data-author').extract()
            item['comment_id'] = selector.xpath('./@id').extract()
            yield item
Here is an example of the error I get for every item:
2018-03-01 13:10:23 [scrapy.core.scraper] ERROR: Error processing
{'comment_author': [u'Vision322'],
'comment_id': [u'thing_t1_dsk7a5t'],
'comment_text': [u'<div class="md"><p>hello and welcome!\nhow are you? </p>\n</div>']}
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/Users/Torben/reddit/reddit/pipelines.py", line 11, in process_item
item['title'] = ''.join(item['title']).upper()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/item.py", line 59, in __getitem__
return self._values[key]
KeyError: 'title'
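For context, the traceback points at line 11 of my reddit/pipelines.py, which contains the statement shown in the frame above. The sketch below is a minimal reconstruction of that pipeline (the full file isn't shown here, so the class name is a guess); it reproduces the KeyError using a plain dict in place of RedditItem, since none of the fields my spider sets is called 'title':

```python
# Hypothetical reconstruction of reddit/pipelines.py based on the traceback.
# process_item uppercases a 'title' field, but the spider only ever sets
# comment_text, comment_author, and comment_id, so the lookup fails.
class RedditPipeline(object):
    def process_item(self, item, spider):
        item['title'] = ''.join(item['title']).upper()  # raises KeyError: 'title'
        return item

# A plain dict standing in for RedditItem, with the fields the spider sets:
item = {'comment_author': ['Vision322'], 'comment_id': ['thing_t1_dsk7a5t']}
try:
    RedditPipeline().process_item(item, spider=None)
    raised = False
except KeyError:
    raised = True
```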
Can anyone tell me what the problem is?