Scrapy: all scrapers fail. Syntax error in one spider

Date: 2018-01-12 09:15:17

Tags: python web-scraping scrapy syntax-error

When one of my scrapers has an error, sometimes all of my scrapers fail. Example: I have a spider with a syntax error.

This spider is missing a comma:

import scrapy


class MySpiderWithSyntaxError(scrapy.Spider):
    name = "my_spider_with_syntax_error"

    start_urls = [
        'http://www.website.com'
    ]

    def parse(self response):  # <-- missing comma between "self" and "response"
        for url in response.css('a.p::attr(href)').extract():
            print(url)

The spider MySpiderWithSyntaxError will fail, as expected. But if I run the other spider, the one without a syntax error (code below), the error I get still points at the broken line:

def parse(self response):

The spider without the syntax error:

import scrapy


class MySpiderWithoutSyntaxError(scrapy.Spider):
    name = "my_spider_without_syntax_error"

    start_urls = [
        'http://www.website.com'
    ]

    def parse(self, response):
        for url in response.css('a.p::attr(href)').extract():
            print(url)

Question: is it possible to catch errors like this, so that only the spider with the syntax error fails while the other spiders keep working?

1 Answer:

Answer 0 (score: 1)

If you are working inside a Scrapy project, all spider modules are loaded even when you run a single spider (with scrapy crawl <spidername>), because Scrapy has to import every module listed in SPIDER_MODULES to discover the spider names. So if any of those modules contains a syntax error, the whole run fails.
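
One way around this (not part of the original answer, just a minimal sketch) is to skip the project's spider discovery and run the working spider from a standalone script with CrawlerProcess, importing only its module. The module name good_spider used here is a hypothetical placeholder for wherever MySpiderWithoutSyntaxError is defined:

# run_good_spider.py -- minimal sketch: run one spider without scanning SPIDER_MODULES,
# so the module containing the syntax error is never imported.
from scrapy.crawler import CrawlerProcess

# Import only the spider we actually want to run ("good_spider" is a hypothetical module name).
from good_spider import MySpiderWithoutSyntaxError

process = CrawlerProcess(settings={
    # No SPIDER_MODULES here, so Scrapy does not walk the project package
    # and never touches the broken spider file.
    "LOG_LEVEL": "INFO",
})
process.crawl(MySpiderWithoutSyntaxError)
process.start()  # blocks until the crawl finishes

Depending on the Scrapy version, setting SPIDER_LOADER_WARN_ONLY = True in the project settings should also turn the failed import into a warning, so scrapy crawl my_spider_without_syntax_error keeps working and only the broken spider is skipped.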