Scrapy CrawlSpider attribute not found

Time: 2017-07-31 20:49:55

Tags: python scrapy

Using Scrapy 1.4.0 and a modified template I found online, I get the following error:

  

AttributeError: module 'scrapy' has no attribute 'CrawlSpider'

The log doesn't seem to show anything else of interest.

Code:

import scrapy
from scrapy.spiders import Rule
from scrapy.linkextractors import LinkExtractor

class TechcrunchSpider(scrapy.CrawlSpider):
    #name of the spider
    name = 'stltoday'

    #list of allowed domains
    allowed_domains = ['http://graphics.stltoday.com/']

    #starting url for scraping
    start_urls = ['http://graphics.stltoday.com/apps/payrolls/salaries/2_1/']

    rules = [
    Rule(LinkExtractor(
        allow=['/apps/payrolls/salaries/.*/$']),
        callback='parse',
        follow=True),
    ]

    #setting the location of the output csv file
    custom_settings = {
        'FEED_URI' : 'tmp/stltoday.csv'
    }

    def parse(self, response):
        #Remove XML namespaces
        response.selector.remove_namespaces()

        #Extract article information
        name = response.xpath("//th::text").extract()
        allother = response.xpath('//table[@class="table--department"]//td').extract()


        for item in zip(name,allother):
            scraped_info = {
                'name' : item[0],
                'allother' : item[1]
            }

            yield scraped_info

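1 Answer:

Answer 0 (score: 1)

The error means that the top-level scrapy module has no CrawlSpider attribute, so the line class TechcrunchSpider(scrapy.CrawlSpider): fails. CrawlSpider lives in scrapy.spiders. As noted in the comments, this is a result of the Scrapy documentation having changed. A quick fix is to change the class declaration to: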

class TechcrunchSpider(scrapy.spiders.CrawlSpider):

That should solve the problem!
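
If it is unclear which module actually exposes a name, a quick check like the one below can help before subclassing. This is only a minimal sketch: find_attribute is a hypothetical helper written for illustration, not part of Scrapy, and it silently skips any module that is not installed.

```python
import importlib


def find_attribute(attr, module_names):
    """Return the module names (from module_names) that expose attr."""
    found = []
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Module is not installed or cannot be imported; skip it.
            continue
        if hasattr(module, attr):
            found.append(name)
    return found


if __name__ == "__main__":
    # Check where CrawlSpider is available in the installed Scrapy version.
    print(find_attribute("CrawlSpider", ["scrapy", "scrapy.spiders"]))
```

Whichever module the check reports is the one to reference in the class declaration; importing the name directly, for example from scrapy.spiders import CrawlSpider, and then subclassing CrawlSpider is an equivalent spelling of the fix above.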