No output when scraping with Scrapy

Time: 2017-01-31 13:28:33

Tags: python scrapy-spider

I am trying to scrape commentary from the espncricinfo website with Scrapy, but my output (items.csv) comes out blank. These are my files.

cricinfo.py (the spider file)

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from crictest.items import CrictestItem


class MySpider(BaseSpider):
    name = "cricinfo"
    allowed_domains = ["espncricinfo.com/"]
    start_urls = ["http://www.espncricinfo.com/champions-league-twenty20-2014/engine/match/763595.html?innings=1;view=commentary/"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        rows = hxs.select('//td[@class="battingComms" and b]')
        for row in rows:
            item = CrictestItem()
            item['overnum'] = row.select('b/text()').extract()[0]
            item['overnumtext'] = row.select('b/following-sibling::text()').extract()[0]
            yield item

items.py

import scrapy


class CrictestItem(scrapy.Item):
    overnum = scrapy.Field()
    overnumtext = scrapy.Field()

1 answer:

Answer 0 (score: 0)

The problem is your XPath.

You can try this one in the Chrome console: $x('//*[@id="commInnings"]/div[2]/div/div')

In your code, rewrite this line: rows = hxs.select('//td[@class="battingComms" and b]'). I could not get any output from it in the console.
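One way to check what the spider's expression selects, without running a crawl, is to evaluate it against a small HTML fragment. The sketch below is only an illustration. The markup in SAMPLE is invented (the real page layout is an assumption), extract_rows is a hypothetical helper name, and the standard library's xml.etree.ElementTree stands in for Scrapy's selector. ElementTree supports only a subset of XPath, so the `and` condition is written as two chained predicates and `b/following-sibling::text()` is read through the `tail` attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the real page layout is an assumption.
SAMPLE = """
<table>
  <tr><td class="battingComms"><b>0.1</b> Bowler to Batsman, no run</td></tr>
  <tr><td class="battingComms">End of over summary</td></tr>
  <tr><td class="other"><b>0.2</b> not a commentary cell</td></tr>
</table>
"""


def extract_rows(markup):
    """Return (overnum, overnumtext) pairs for battingComms cells that contain a <b>."""
    root = ET.fromstring(markup)
    results = []
    # ElementTree's XPath subset has no `and`, so the two conditions become
    # chained predicates: class equals battingComms, and a <b> child exists.
    for cell in root.findall(".//td[@class='battingComms'][b]"):
        bold = cell.find("b")
        overnum = (bold.text or "").strip()
        # bold.tail is the text node right after </b>, which is what
        # b/following-sibling::text() would return first.
        overnumtext = (bold.tail or "").strip()
        results.append((overnum, overnumtext))
    return results


rows = extract_rows(SAMPLE)
print(rows)
```

If a fragment copied from the real page yields an empty list here, the expression does not match that markup. Against the live page, the same kind of check can be done interactively with `scrapy shell <url>` and `response.xpath(...)`.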