Question

The XPath:

//ol[@class="breadcrumb container"]/li[not(contains(@class,"first")) and not(contains(@class,"last"))]/a/span/text()

HTML:

<ol class="breadcrumb container">
    <li class="first"><a href="http://example.com/index.php?route=common/home"><span>Home</span></a></li>
    <li><a href="http://example.com/books"><span>Books</span></a></li>
    <li class="last"><a href="http://example.com/books?product_id=193" class="last"><span>My Vision : Challenges in the Race for Excellence - Mohammed Bin Rashid Al Maktoum</span></a></li>
</ol>

Python code:

categories = ['NO DATA', 'NO DATA', 'NO DATA', 'NO DATA', 'NO DATA', 'NO DATA']
catIndex = 0
for cat in sel.xpath('//ol[@class="breadcrumb container"]/li[not(contains(@class,"first")) and not(contains(@class,"last"))]/a/span/text()').extract():
    categories[catIndex] = cat
    catIndex += 1

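The wanted result is "Books". When I check the XPath in the Firebug console it returns the correct result, but when I run the spider it returns all 3 li elements, without excluding the ones with class="first" and class="last".

I tried the command scrapy view http://example.com to see how the spider sees the page, but everything looks the same and the XPath returns the correct result there.

When I try the XPath in the Scrapy shell, it returns the wrong result: all 3 li elements.

What could be the problem?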

Answer 1

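I opened the output of scrapy view http://example.com in Internet Explorer and saw that the li elements have no class attribute.

This means that the scrapy view output opened in Chrome or Firefox does not show the REAL code the spider sees, likely because those browsers run the page's scripts, which can add the classes after the page loads.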
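
If the class attributes really are missing from the HTML the spider downloads, one option is to drop the first and last breadcrumb by position instead of by class. The sketch below is only an illustration using the standard library, not Scrapy itself: RAW_HTML is a hypothetical, shortened copy of the markup without the li classes, and the helper names are made up. In Scrapy, the same idea might be written as //ol[@class="breadcrumb container"]/li[position() > 1 and position() < last()]/a/span/text(), which I have not tested against the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup as the spider might receive it: the <li> elements
# carry no class attributes (the last title is shortened for the example).
RAW_HTML = """
<ol class="breadcrumb container">
    <li><a href="http://example.com/index.php?route=common/home"><span>Home</span></a></li>
    <li><a href="http://example.com/books"><span>Books</span></a></li>
    <li><a href="http://example.com/books?product_id=193"><span>My Vision</span></a></li>
</ol>
"""


def class_filtered(markup):
    """Mimic the original filter: skip <li> whose class contains first/last."""
    root = ET.fromstring(markup.strip())
    return [
        li.findtext("a/span")
        for li in root.findall("li")
        if "first" not in li.get("class", "") and "last" not in li.get("class", "")
    ]


def middle_breadcrumbs(markup):
    """Drop the first and last <li> by position, ignoring class attributes."""
    root = ET.fromstring(markup.strip())
    items = root.findall("li")
    return [li.findtext("a/span") for li in items[1:-1]]


# Without classes, the class-based filter excludes nothing.
print(class_filtered(RAW_HTML))      # ['Home', 'Books', 'My Vision']
# The positional filter still isolates the category.
print(middle_breadcrumbs(RAW_HTML))  # ['Books']
```

The positional version does not depend on attributes that may only exist after the browser has processed the page, so it should behave the same in the browser console and in the spider.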
