Question

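Hello, I am trying to build a simple crawler with Scrapy.

The code works fine in the Scrapy shell, but when I run it from the console, nothing is written to the JSON file.

I am running it from the project's top-level directory: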

scrapy crawl filemare -o filemare.json


import scrapy


class FilemareSpider(scrapy.Spider):
    name = "filemare"
    allowed_domains = ['https://filemare.com/']
    start_urls = ["https://filemare.com/en-us/search/firmware%20download/632913359"]

    def parse(self, response):
        items = response.xpath('//div[@class="f"]/text()').extract()
        #items = response.css('div.f::text').extract()

        for url in items:
            print(url)
            yield url

Answer 1

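The parse method must return dicts, Scrapy Item objects, or Request objects (see the documentation). In your case, you are yielding a plain string. If you run the spider, you will see an error in the output.

Change the relevant part of the code as follows: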

...
def parse(self, response):
    items = response.xpath('//div[@class="f"]/text()').extract()

    for url in items:
        print(url)
        yield {'url': url}
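
The same idea can be pulled out into a small helper, which makes it easy to check outside of Scrapy. This is only a minimal sketch: the name `to_items`, the whitespace stripping, the skipping of empty strings, and the example.com sample values are assumptions added for illustration and are not part of the original spider.

```python
def to_items(texts):
    """Wrap each extracted string in a dict so it can be exported as an item."""
    for text in texts:
        url = text.strip()  # assumption: surrounding whitespace is not wanted
        if url:  # assumption: whitespace-only text nodes are skipped
            yield {"url": url}


if __name__ == "__main__":
    # Hypothetical sample of what .extract() might return.
    extracted = ["  http://example.com/a.bin\n", "   ", "http://example.com/b.bin"]
    for item in to_items(extracted):
        print(item)
```

Inside the spider, parse could then end with `yield from to_items(items)` instead of yielding the raw strings.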
