Problem exporting scraping results

Time: 2015-02-26 08:41:44

Tags: python web-crawler scrapy

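I wrote a simple spider to scrape details from a website. When I run it in the console I get output, but if I send it to a file with -o filename.json, the file contains nothing but a single [. What should I do?

My spider looks like this: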

import scrapy
from tutorial.items import TutorialItem

class ChillumSpider(scrapy.Spider):
    name = "chillum"
    allowed_domains = ["flipkart.com"]
    start_urls = ["http://www.flipkart.com/search?q=brown+jacket&as=offas-show=off&otracker=start"]

    def parse(self, response):
        title = response.xpath('//a[@class="fk-display-block"]/text()').extract()
        print(title)

My console output looks like this:

  

[u"\n Asst JKT8810 Full Sleeve Self Design Men's", u' ', u"\n Justanned Full Sleeve Solid Men's Bomber", u' ', u"\n Pepe Sleeveless Solid Men", u' ', u"\n Platinum Studio Sleeveless Solid Men's Nehru ", u' ', u"\n Yepme Sleeveless Solid Men", u' ', u'\n Love Leather ', u"Full Sleeve Solid Men's Puleather Ja ...\n", u"\n Justanned Full Sleeve Solid Men's Bomber", u' ', u"\n Oceanic Full Sleeve Self Design Men's", u' ', u"\n Dooda Full Sleeve Solid Men", u' ', u"\n Bareskin Full Sleeve Self Design Men's", u' ', u"\n Asst Full Sleeve Solid Women", u' ', u"\n Locomotive Men's Sleeve", u' ', u"\n Justanned Full Sleeve Solid Women's", u' ', u' ', u"\n Wrangler Sleeveless Men's", u' ', u"\n TSX Sleeveless Solid Men's Bomber", u' ']

But when I run scrapy crawl spider_name -o filename.json, I don't get the same output in the file.

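1 Answer:

Answer 0: (score: 0)

This is because the -o feed export only writes what the spider's callback returns or yields, and your parse method only prints the titles. You need to return Item instances: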

import scrapy
from tutorial.items import TutorialItem

class ChillumSpider(scrapy.Spider):
    name = "chillum"
    allowed_domains = ["flipkart.com"]
    start_urls = ["http://www.flipkart.com/search?q=brown+jacket&as=offas-show=off&otracker=start"]

    def parse(self, response):
        titles = response.xpath('//a[@class="fk-display-block"]/text()').extract()
        for title in titles:
            item = TutorialItem()
            item['title'] = title
            yield item
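
The console output in the question includes many whitespace-only entries and titles with leading newlines, and all of those would end up in the exported file as separate items. Below is a minimal sketch of one way to tidy the extracted strings before building items. The helper name clean_titles is hypothetical and not part of Scrapy; it uses only plain Python, so it can be tried without running a crawl.

```python
def clean_titles(raw_titles):
    """Strip surrounding whitespace and drop entries that end up empty.

    raw_titles is assumed to be the list of strings returned by
    response.xpath(...).extract() in the spider above.
    """
    cleaned = []
    for raw in raw_titles:
        title = raw.strip()
        if title:  # skip text nodes that contained only whitespace
            cleaned.append(title)
    return cleaned


if __name__ == "__main__":
    # Sample values modeled on the console output in the question.
    sample = ["\n Pepe Sleeveless Solid Men", " ", "\n Asst Full Sleeve Solid Women", " "]
    for title in clean_titles(sample):
        print(title)
```

In the spider, the loop would then read for title in clean_titles(titles): so that each yielded item carries a trimmed title. This also assumes TutorialItem declares a title field (title = scrapy.Field() in tutorial/items.py); assigning to an undeclared field raises a KeyError.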