Scrapy XPath result differs from the browser console XPath result

Date: 2015-08-29 11:40:15

Tags: python xpath scrapy

The XPath:

//ol[@class="breadcrumb container"]/li[not(contains(@class,"first")) and not(contains(@class,"last"))]/a/span/text()

HTML:

<ol class="breadcrumb container">
    <li class="first"><a href="http://example.com/index.php?route=common/home"><span>Home</span></a></li>
    <li><a href="http://example.com/books"><span>Books</span></a></li>
    <li class="last"><a href="http://example.com/books?product_id=193" class="last"><span>My Vision : Challenges in the Race for Excellence - Mohammed Bin Rashid Al Maktoum</span></a></li>
</ol>

Python code:

# Placeholder list with room for up to six breadcrumb categories
categories = ['NO DATA', 'NO DATA', 'NO DATA', 'NO DATA', 'NO DATA', 'NO DATA']
catIndex = 0
# sel is the page selector; skip the li items marked "first" and "last"
for cat in sel.xpath('//ol[@class="breadcrumb container"]/li[not(contains(@class,"first")) and not(contains(@class,"last"))]/a/span/text()').extract():
    categories[catIndex] = cat
    catIndex += 1

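The desired result is "Books". When I check the XPath in the Firebug console it returns the correct result, but when I run the spider it returns all 3 li elements instead of excluding the ones with class="first" and class="last".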

I tried the command scrapy view http://example.com to see how the spider sees the page, but everything looks the same and the XPath returns the correct result there.

Scrapy view

When I try the XPath in the Scrapy shell, it returns the wrong result: all 3 li elements.

Scrapy Shell

What could be the problem?

1 answer:

Answer 0: (score: 0)

I opened the scrapy view http://example.com output in Internet Explorer and saw that the li elements have no class attribute.
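One way to make the same check without relying on any browser is to list the class attribute of each li in the raw markup. The following is only a minimal sketch using Python's standard html.parser module. RAW_HTML, LiClassCollector and li_classes are hypothetical names, and the sample markup assumes the server sends the breadcrumb without class attributes, as observed above.

```python
from html.parser import HTMLParser

# Hypothetical raw markup: assumes the server sends the breadcrumb
# without class attributes on the <li> elements.
RAW_HTML = """
<ol class="breadcrumb container">
    <li><a href="http://example.com/index.php?route=common/home"><span>Home</span></a></li>
    <li><a href="http://example.com/books"><span>Books</span></a></li>
    <li><a href="http://example.com/books?product_id=193"><span>My Vision</span></a></li>
</ol>
"""


class LiClassCollector(HTMLParser):
    """Record the class attribute (or None) of every <li> start tag."""

    def __init__(self):
        super().__init__()
        self.li_classes = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "li":
            self.li_classes.append(dict(attrs).get("class"))


def li_classes(html):
    """Return the class attribute of each <li> in the given markup."""
    collector = LiClassCollector()
    collector.feed(html)
    return collector.li_classes


print(li_classes(RAW_HTML))
```

In a spider, the decoded response body would be passed instead of RAW_HTML. If every entry comes back as None, the not(contains(@class, ...)) predicates in the XPath cannot exclude anything, which would explain why all 3 li elements match.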

This means that the scrapy view output, when opened in Chrome or Firefox, does not show the REAL code the spider sees.
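If the class attributes really are missing from the raw response, one possible workaround is to select the breadcrumb items by position rather than by class. The sketch below is an assumption, not something tested against the real page: it uses the standard xml.etree.ElementTree module on a hypothetical, well-formed copy of the markup (RAW_HTML) and drops the first and last li with a slice. middle_breadcrumbs is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw markup without class attributes on the <li> elements.
RAW_HTML = """<ol class="breadcrumb container">
    <li><a href="http://example.com/index.php?route=common/home"><span>Home</span></a></li>
    <li><a href="http://example.com/books"><span>Books</span></a></li>
    <li><a href="http://example.com/books?product_id=193"><span>My Vision</span></a></li>
</ol>"""


def middle_breadcrumbs(markup):
    """Return the span text of every <li> except the first and the last."""
    root = ET.fromstring(markup)   # root is the <ol> element
    items = root.findall("li")     # direct <li> children, in document order
    return [li.findtext("a/span") for li in items[1:-1]]


print(middle_breadcrumbs(RAW_HTML))
```

In XPath 1.0 the same idea can be written with the predicate li[position() > 1 and position() < last()], which does not depend on class attributes being present in the response.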