I wrote this short spider to extract the titles from the Hacker News front page (http://news.ycombinator.com/).
import scrapy


class HackerItem(scrapy.Item):  # declaring the item
    hackertitle = scrapy.Field()


class HackerSpider(scrapy.Spider):
    name = 'hackernewscrawler'
    allowed_domains = ['news.ycombinator.com']  # website we chose
    start_urls = ['http://news.ycombinator.com/']

    def parse(self, response):
        sel = scrapy.Selector(response)  # selector to help us extract the titles
        item = HackerItem()  # the item declared above
        # xpath of the titles
        item['hackertitle'] = sel.xpath(
            "//tr[@class='athing']/td[3]/a[@href]/text()").extract()
        # printing titles using print statement.
        print(item['hackertitle'])
However, when I run the code with scrapy crawl hackernewscrawler -o hntitles.json -t json
I get an empty .json file with nothing in it.
Answer 0 (score: 0)
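You should change the print statement to a yield statement, so that the item is handed back to Scrapy instead of only being written to the console:
yield item
Then run:
scrapy crawl hackernewscrawler -o hntitles.json -t json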
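Why printing leaves the output file empty can be illustrated without Scrapy at all. The sketch below is only an analogy, not Scrapy's actual code: export_items is a hypothetical stand-in for a feed exporter, the two parse functions are simplified versions of the callback in the question, and the sample titles are made up. The point it tries to show is that an exporter can only serialize what the callback returns or yields, and a callback that just prints returns None.

```python
import json


def parse_with_print(titles):
    # Mirrors the question's callback: the item is printed, nothing is returned.
    item = {'hackertitle': titles}
    print(item['hackertitle'])


def parse_with_yield(titles):
    # Mirrors the suggested fix: the item is handed back to the caller.
    item = {'hackertitle': titles}
    yield item


def export_items(callback, titles):
    # Hypothetical stand-in for a feed exporter: it serializes only what
    # the callback returns or yields, and treats None as "no items".
    result = callback(titles)
    items = list(result) if result is not None else []
    return json.dumps(items)


sample_titles = ['First story', 'Second story']  # made-up sample data

# The printing callback shows the titles on the console but exports nothing.
print(export_items(parse_with_print, sample_titles))
# The yielding callback produces one item for the exporter to write.
print(export_items(parse_with_yield, sample_titles))
```

With the printing callback the exported JSON is an empty list, which corresponds to the empty hntitles.json described in the question; with the yielding callback the item ends up in the output.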