Empty .json file

Date: 2018-12-02 17:52:52

Tags: scrapy web-crawler

I wrote this short spider to extract the titles from the Hacker News front page (http://news.ycombinator.com/):

import scrapy

class HackerItem(scrapy.Item): #declaring the item
    hackertitle = scrapy.Field()


class HackerSpider(scrapy.Spider):
    name = 'hackernewscrawler'
    allowed_domains = ['news.ycombinator.com'] # website we chose
    start_urls = ['http://news.ycombinator.com/']

    def parse(self, response):
        sel = scrapy.Selector(response)  # selector to help us extract the titles
        item = HackerItem()  # the item declared above

        # xpath of the titles
        item['hackertitle'] = sel.xpath("//tr[@class='athing']/td[3]/a[@href]/text()").extract()

        # printing titles using print statement.
        print(item['hackertitle'])

However, when I run the spider with scrapy crawl hackernewscrawler -o hntitles.json -t json

I get an empty .json file with nothing in it.

1 Answer:

Answer 0 (score: 0)

You should change the print statement to a yield statement:

yield item

Then run:

scrapy crawl hackernewscrawler -o hntitles.json -t json