Question

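I wrote a simple spider to scrape details from a website. When I run it from the console I get output, but if I use -o filename.json to write it to a file, the file contains only a [. What should I do?

My spider looks like this: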

import scrapy
from tutorial.items import TutorialItem

class ChillumSpider(scrapy.Spider):
    name = "chillum"
    allowed_domains = ["flipkart.com"]
    start_urls = ["http://www.flipkart.com/search?q=brown+jacket&as=offas-show=off&otracker=start"]

    def parse(self, response):
        title = response.xpath('//a[@class="fk-display-block"]/text()').extract()
        print title

My console output looks like this:

[u"\n Asst JKT8810 Full Sleeve Self Design Men's", u' ', u"\n Justanned Full Sleeve Solid Men's Bomber", u' ', u"\n Pepe Sleeveless Solid Men's", u' ', u"\n Platinum Studio Sleeveless Solid Men's Nehru", u' ', u"\n Yepme Sleeveless Solid Men's", u' ', u'\n Love Leather', u" Full Sleeve Solid Men's Puleather Ja...\n", u"\n Justanned Full Sleeve Solid Men's Bomber", u' ', u"\n Oceanic Full Sleeve Self Design Men's", u' ', u"\n Dooda Full Sleeve Solid Men's", u' ', u"\n Bare Skin Full Sleeve Self Design Men's", u' ', u"\n Asst Full Sleeve Solid Women's", u' ', u"\n Locomotive Sleeve Men's", u' ', u"\n Justanned Full Sleeve Solid Women's", u' ', u' ', u"\n Wrangler Sleeveless Men's", u' ', u"\n TSX Sleeveless Solid Men's Bomber", u' ']

But when I run scrapy crawl spider_name -o filename.json, I don't get the same output in the file.

Answer 1

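This is because you need to return Item instances from the callback. The -o feed export only writes what parse yields or returns; print only sends text to the console, so the file ends up with nothing but the opening [: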

import scrapy
from tutorial.items import TutorialItem

class ChillumSpider(scrapy.Spider):
    name = "chillum"
    allowed_domains = ["flipkart.com"]
    start_urls = ["http://www.flipkart.com/search?q=brown+jacket&as=offas-show=off&otracker=start"]

    def parse(self, response):
        titles = response.xpath('//a[@class="fk-display-block"]/text()').extract()
        for title in titles:
            item = TutorialItem()
            item['title'] = title
            yield item
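
The console output in the question also contains whitespace-only strings and titles with leading newlines, and each of those would become its own exported item. The following is a minimal sketch, not part of the original answer, of one way to clean the extracted strings before yielding them. The helper name iter_title_items and the sample list are hypothetical. The sketch yields plain dicts, which Scrapy 1.0 and later accept from callbacks alongside Item instances.

```python
def iter_title_items(raw_titles):
    """Yield one dict per non-empty title, with surrounding whitespace removed."""
    for raw in raw_titles:
        title = raw.strip()
        if title:  # skip entries that were only whitespace
            yield {"title": title}


# Hypothetical sample shaped like the console output in the question.
sample = ["\n Justanned Full Sleeve Solid Men's Bomber", " ", "\n Love Leather"]
cleaned = list(iter_title_items(sample))
print(cleaned)
```

Inside parse, this could be used as "for item in iter_title_items(titles): yield item". To keep using TutorialItem instead of a dict, build it with TutorialItem(title=title); either way, title has to be declared as a scrapy.Field() on TutorialItem in tutorial/items.py, otherwise the assignment raises a KeyError.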
