I'm trying to scrape the commentary from the ESPNcricinfo website with Scrapy, but my output (items.csv) is blank. These are my files:
cricinfo.py (the spider file)
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from crictest.items import CrictestItem

class MySpider(BaseSpider):
    name = "cricinfo"
    allowed_domains = ["espncricinfo.com/"]
    start_urls = ["http://www.espncricinfo.com/champions-league-twenty20-2014/engine/match/763595.html?innings=1;view=commentary/"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        rows = hxs.select('//td[@class="battingComms" and b]')
        for row in rows:
            item = CrictestItem()
            item['overnum'] = row.select('b/text()').extract()[0]
            item['overnumtext'] = row.select('b/following-sibling::text()').extract()[0]
            yield item
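To separate "the XPath is wrong" from "the fetched page doesn't contain those cells", it may help to run the same extraction logic offline against a saved copy of the HTML. The sketch below is a stdlib-only stand-in (it uses `html.parser` instead of Scrapy selectors, so it is a swapped-in technique, not Scrapy's behavior). It roughly mirrors `//td[@class="battingComms" and b]` with `b/text()` and `b/following-sibling::text()`; as an approximation it collects all text after `</b>` up to `</td>`, not just the first sibling text node. The names `BattingCommsParser` and `parse_rows` and the sample HTML are hypothetical, not ESPNcricinfo's real markup.

```python
from html.parser import HTMLParser


class BattingCommsParser(HTMLParser):
    """Hypothetical stand-in for the XPath logic in parse().

    Collects {"overnum", "overnumtext"} dicts from <td class="battingComms">
    cells that contain a <b>, roughly like //td[@class="battingComms" and b]
    combined with b/text() and b/following-sibling::text().
    """

    def __init__(self):
        super().__init__()
        self.rows = []        # one dict per matching cell
        self._in_td = False   # currently inside a battingComms <td>
        self._in_b = False    # currently inside that cell's first <b>
        self._b_text = None   # text of the first <b>, None if no <b> seen yet
        self._after = None    # text after </b> up to </td> (approximation)

    def handle_starttag(self, tag, attrs):
        if tag == "td" and dict(attrs).get("class") == "battingComms":
            self._in_td, self._b_text, self._after = True, None, None
        elif tag == "b" and self._in_td and self._b_text is None:
            self._in_b, self._b_text = True, ""

    def handle_endtag(self, tag):
        if tag == "b" and self._in_b:
            self._in_b, self._after = False, ""
        elif tag == "td" and self._in_td:
            self._in_td = False
            if self._b_text is not None:  # the "and b" condition
                self.rows.append({
                    "overnum": self._b_text.strip(),
                    "overnumtext": (self._after or "").strip(),
                })

    def handle_data(self, data):
        if self._in_b:
            self._b_text += data
        elif self._in_td and self._after is not None:
            self._after += data


def parse_rows(html):
    """Return the extracted rows; an empty list means nothing matched."""
    parser = BattingCommsParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical markup, only to exercise the logic.
sample = ('<table><tr><td class="battingComms"><b>19.6</b> Bowler to Batsman, FOUR</td></tr>'
          '<tr><td class="battingComms">no over number here</td></tr></table>')
print(parse_rows(sample))
# [{'overnum': '19.6', 'overnumtext': 'Bowler to Batsman, FOUR'}]
print(parse_rows("<div id='commInnings'></div>"))
# []
```

If `parse_rows` returns an empty list for the page you actually downloaded, the spider's loop would likewise never yield an item, which matches the blank items.csv.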
items.py
import scrapy

class CrictestItem(scrapy.Item):
    overnum = scrapy.Field()
    overnumtext = scrapy.Field()
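The exported file can only contain what the spider yields. Assuming items.csv was produced by a feed export such as `scrapy crawl cricinfo -o items.csv`, each yielded `CrictestItem` becomes one row. The sketch below uses the stdlib `csv.DictWriter` rather than Scrapy's own exporter (whose header handling may differ), and `export_rows` is a hypothetical name; it just makes the dependency explicit: zero items means zero data rows, so a blank file points back at `parse()` rather than at items.py.

```python
import csv
import io

# The two fields declared on CrictestItem in items.py.
FIELDS = ["overnum", "overnumtext"]


def export_rows(rows, fh):
    """Write a header plus one CSV line per item dict; return the data-row count."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


buf = io.StringIO()
print(export_rows([], buf))          # 0 data rows
print(buf.getvalue().splitlines())   # ['overnum,overnumtext']

buf = io.StringIO()
print(export_rows([{"overnum": "19.6", "overnumtext": "FOUR"}], buf))  # 1
print(buf.getvalue().splitlines())   # ['overnum,overnumtext', '19.6,FOUR']
```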
Answer 0 (score: 0)
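The problem is your XPath.
You can try this one in Chrome's developer console: $x('//*[@id="commInnings"]/div[2]/div/div')
Then rewrite the corresponding line in your code. With your current expression, rows = hxs.select('//td[@class="battingComms" and b]'), I can't get any output in the console.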
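Two different things can leave the file empty. If the row XPath matches nothing, `rows` is empty and the loop never yields. If it does match but an inner XPath such as `b/following-sibling::text()` returns nothing for some row, `.extract()[0]` raises `IndexError` and the rest of `parse()` is aborted. A defensive pattern is to take the first result or a default; newer Scrapy versions offer this on selector lists as `.extract_first()` and `.get()`. The plain-Python sketch below only illustrates the pattern on lists of strings; `first_or_default` and `build_item` are hypothetical helpers, not Scrapy API.

```python
def first_or_default(values, default=""):
    """Return the first extracted string, or default when the XPath matched nothing."""
    return values[0] if values else default


def build_item(overnum_texts, following_texts):
    """Hypothetical helper: build the item dict from two extract() results."""
    return {
        "overnum": first_or_default(overnum_texts).strip(),
        "overnumtext": first_or_default(following_texts).strip(),
    }


try:
    [][0]  # what .extract()[0] does when the XPath returned an empty list
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range

print(build_item(["19.6"], [" Bowler to Batsman, FOUR "]))
print(build_item([], []))  # {'overnum': '', 'overnumtext': ''}
```

Empty fields in the output then tell you which XPath needs fixing, instead of the spider silently producing nothing.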