scrapy KeyError

Date: 2018-03-01 13:17:46

Tags: python scrapy web-crawler scrapy-spider reddit

I'm new to Python, and to Scrapy in particular. I want to write a spider that gives me all the comments on a reddit page. It finds the comments, but it doesn't save them to the .csv file. Here is my spider:

    import scrapy
    from scrapy.spiders import CrawlSpider, Rule
    from scrapy.loader import ItemLoader
    from reddit.items import RedditItem


    class TestSpider(CrawlSpider):
        name = "test"
        allowed_domains = ["www.reddit.com"]
        start_urls = ['https://www.reddit.com/r/FIFA/comments/7pulch/introduction_community_update/']

        def parse(self, response):
            selector_list = response.xpath('//div[contains(@data-type, "comment")]')

            for selector in selector_list:
                item = RedditItem()
                item['comment_text'] = selector.xpath('.//div[contains(@class, "usertext-body may-blank-within md-container ")]/div').extract()
                item['comment_author'] = selector.xpath('./@data-author').extract()
                item['comment_id'] = selector.xpath('./@id').extract()

                yield item
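
For reference, the spider assumes an items.py roughly along these lines (a minimal sketch containing only the fields that parse() fills in; my actual file may contain more):

    import scrapy


    class RedditItem(scrapy.Item):
        # Only the fields set by TestSpider.parse() are sketched here.
        comment_text = scrapy.Field()
        comment_author = scrapy.Field()
        comment_id = scrapy.Field()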

Here is an example of the error I get at every step:

    2018-03-01 13:10:23 [scrapy.core.scraper] ERROR: Error processing
    {'comment_author': [u'Vision322'],
     'comment_id': [u'thing_t1_dsk7a5t'],
     'comment_text': [u'<div class="md"><p>hello and welcome!\nhow are you?</p>\n</div>']}
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "/Users/Torben/reddit/reddit/pipelines.py", line 11, in process_item
        item['title'] = ''.join(item['title']).upper()
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/item.py", line 59, in __getitem__
        return self._values[key]
    KeyError: 'title'
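
Based on the traceback, process_item in my pipelines.py runs something like the following (a sketch reconstructed around the line shown at pipelines.py line 11; the class name and the rest of the method are assumptions):

    class RedditPipeline(object):
        # Hypothetical class name; only the line from the traceback is known.
        def process_item(self, item, spider):
            # This is the line the traceback points at: it looks up item['title'],
            # but the spider only ever sets comment_text, comment_author and comment_id.
            item['title'] = ''.join(item['title']).upper()
            return item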

Can anyone tell me what the problem is?

0 answers