XPath:
//ol[@class="breadcrumb container"]/li[not(contains(@class,"first")) and not(contains(@class,"last"))]/a/span/text()
HTML:
<ol class="breadcrumb container">
<li class="first"><a href="http://example.com/index.php?route=common/home"><span>Home</span></a></li>
<li><a href="http://example.com/books"><span>Books</span></a></li>
<li class="last"><a href="http://example.com/books?product_id=193" class="last"><span>My Vision : Challenges in the Race for Excellence - Mohammed Bin Rashid Al Maktoum</span></a></li>
</ol>
Python code:
# Pre-fill six category slots with a placeholder value
categories = ['NO DATA'] * 6
catIndex = 0
# Copy each extracted breadcrumb label into the next slot
for cat in sel.xpath('//ol[@class="breadcrumb container"]/li[not(contains(@class,"first")) and not(contains(@class,"last"))]/a/span/text()').extract():
    categories[catIndex] = cat
    catIndex += 1
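The wanted result is "Books". When I check the XPath in the Firebug console it returns the correct result, but when I run the spider it returns all 3 li elements, without excluding the ones with class="first" and class="last".
I tried the command scrapy view http://example.com to see how the spider sees the page, but everything looks the same and the XPath returns the correct result there.
When I try the XPath in the Scrapy shell, it returns the wrong result: all 3 li elements.
What could be the problem?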
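Answer 0: (score: 0)
I opened the scrapy view http://example.com output in Internet Explorer and saw that the li elements have no class attribute.
This means that the scrapy view output, when opened in Chrome or Firefox, does not show the REAL code that the spider sees.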
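If the class attributes really are missing from the markup the spider receives (possibly because a script adds them in the browser, which is only a guess), one workaround is to select the breadcrumb items by position instead of by class. The following is a minimal sketch of that idea using only the Python standard library. The RAW_HTML string and the helper names li_classes and middle_breadcrumbs are hypothetical and made up for illustration: the markup is the breadcrumb from the question with the li classes stripped and the last label shortened, not the actual response from the site.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup as the spider might receive it: the same breadcrumb,
# but without class attributes on the <li> elements (an assumption based on
# what the answer observed, not the real server response).
RAW_HTML = """
<ol class="breadcrumb container">
<li><a href="http://example.com/index.php?route=common/home"><span>Home</span></a></li>
<li><a href="http://example.com/books"><span>Books</span></a></li>
<li><a href="http://example.com/books?product_id=193"><span>My Vision</span></a></li>
</ol>
"""


def li_classes(markup):
    """Return the class attribute of every <li> (None when it is absent)."""
    root = ET.fromstring(markup.strip())
    return [li.get("class") for li in root.findall("li")]


def middle_breadcrumbs(markup):
    """Return the labels of all breadcrumb items except the first and last."""
    root = ET.fromstring(markup.strip())
    items = root.findall("li")
    # Slice by position, so the result does not depend on class attributes.
    return [li.findtext("a/span") for li in items[1:-1]]


print(li_classes(RAW_HTML))          # shows whether any class attribute exists
print(middle_breadcrumbs(RAW_HTML))  # labels between the first and last item
```

In the spider itself, the same positional idea could be written as the XPath //ol[@class="breadcrumb container"]/li[position() > 1 and position() < last()]/a/span/text(). That expression is standard XPath 1.0 but has not been tested against the real site.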