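I wrote a simple spider to scrape details from a website. When I run it, I see the output in the console. But when I use `-o filename.json` to write it to a file, the file contains only a `[`. What should I do?

My spider looks like this: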
```python
import scrapy
from tutorial.items import TutorialItem


class ChillumSpider(scrapy.Spider):
    name = "chillum"
    allowed_domains = ["flipkart.com"]
    start_urls = ["http://www.flipkart.com/search?q=brown+jacket&as=offas-show=off&otracker=start"]

    def parse(self, response):
        title = response.xpath('//a[@class="fk-display-block"]/text()').extract()
        print(title)
```
The output in my console looks like this:

```
[u"\nAssistant JKT8810 Full Sleeve Self Design Men's ", u' ', u"\nJustanned Full Sleeve Solid Men's Bomber", u' ', u"\nPepe Sleeveless Solid Men", u' ', u"\nPlatinum Studio Sleeveless Solid Men's Nehru ", u' ', u"\nYepme Sleeveless Solid Men", u' ', u'\nLove Leather ', u"Full Sleeve Solid Men's Puleather Ja...\n", u"\nJustanned Full Sleeve Solid Men's Bomber", u' ', u"\nOceanic Full Sleeve Self Design Men's", u' ', u"\nDooda Full Sleeve Solid Men", u' ', u"\nBare Skin Full Sleeve Self Design Men's", u' ', u"\nAsst Full Sleeve Solid Women", u' ', u"\nLocomotive Men's Sleeve", u' ', u"\nJustanned Full Sleeve Solid Women's", u' ', u' ', u"\nWrangler Sleeveless Men's", u' ', u"\nTSX Sleeveless Solid Men's Bomber", u' ']
```
But when I run `scrapy crawl spider_name -o filename.json`, the file does not contain the same output.
Answer 0 (score: 0)
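This is because your `parse()` method only prints the titles to the console and never returns anything to Scrapy. The `-o` feed export writes only the items that your callback yields or returns, so you need to return `Item` instances: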
```python
import scrapy
from tutorial.items import TutorialItem  # assumes TutorialItem declares title = scrapy.Field()


class ChillumSpider(scrapy.Spider):
    name = "chillum"
    allowed_domains = ["flipkart.com"]
    start_urls = ["http://www.flipkart.com/search?q=brown+jacket&as=offas-show=off&otracker=start"]

    def parse(self, response):
        titles = response.xpath('//a[@class="fk-display-block"]/text()').extract()
        # Yield one item per title. Scrapy hands every yielded item to the
        # feed exporter, which is what -o filename.json writes to disk.
        for title in titles:
            item = TutorialItem()
            item['title'] = title
            yield item
```
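The console output in the question also contains whitespace-only entries (`u' '`) and titles wrapped in newlines, and the loop above would export those as they are. As a minimal sketch, you could strip and filter the titles before yielding them. `clean_title_items` is a hypothetical helper, not part of Scrapy. It yields plain dicts, which Scrapy 1.0 and later accept from callbacks in place of `Item` objects:

```python
import json
from typing import Dict, Iterable, Iterator


def clean_title_items(raw_titles: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one {"title": ...} dict per non-blank title.

    Hypothetical helper: strips the surrounding newlines/spaces seen in the
    console output and skips whitespace-only text nodes such as u' '.
    """
    for raw in raw_titles:
        title = raw.strip()
        if title:  # drop entries that were only whitespace
            yield {"title": title}


if __name__ == "__main__":
    # A few sample values taken from the console output in the question.
    sample = ["\nJustanned Full Sleeve Solid Men's Bomber", " ", "\nPepe Sleeveless Solid Men", " "]
    items = list(clean_title_items(sample))
    # The same list of objects a JSON feed export would hold
    # (Scrapy's own exporter may format the whitespace differently).
    print(json.dumps(items, indent=2))
```

Inside the spider's `parse()`, you would then loop with `for item in clean_title_items(titles): yield item`. If you prefer to keep using the item class, yield `TutorialItem(title=item["title"])` instead.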