Scrapy XPath does not return a value

Time: 2016-10-05 08:10:03

Tags: python xpath scrapy scrapy-spider

I have a website from which I want to save the values of two span elements.

Here is the relevant part of my HTML:

<div class="box-search-product-filter-row">

    <span class="result-numbers" sth-bind="model.navigationSettings.showFilter">

    <span class="number" sth-bind="span1"></span>

    <span class="result" sth-bind="span2"></span>

    </span>

</div>

I created a spider:

from scrapy.spiders import Spider
from scrapy.selector import Selector

class MySpdier(Spider):

    name = "list"
    allowed_domains = ["example.com"]
    start_urls = [
        "https://www.example.com"]

    def parse(self, response):
        sel = Selector(response)
        divs = sel.xpath("//div[@class='box-search-product-filter-row']")


        for div in divs:
            sth = div.xpath("/span[class='result']/text()").extract()

            print sth

When I run the spider, it only prints:

[]

Can someone help me get the values from my two span elements (class "number" and class "result")?

2 answers:

Answer 0: (score: 1)

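You forgot the @ in your XPath "/span[class='result']/text()": without it, class is read as a child element name rather than an attribute. Also, the span you are looking for is not a direct child of the div, so you need to use .// instead of /. Source: http://www.w3schools.com/xsl/xpath_syntax.asp

The complete, correct XPath would be ".//span[@class='result']", with '/text()' appended if you only want to select the text. Note that the nodes in your example contain no text, so the text() version will still return nothing here.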
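The two mistakes described above can be seen in isolation with a small offline check. This is only a sketch: it uses Python's built-in xml.etree.ElementTree (which supports a subset of XPath) as a stand-in for Scrapy's selectors, and it assumes the HTML fragment from the question is well-formed enough to parse as XML. The variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, parsed as XML (an assumption that holds
# for this snippet, not for arbitrary HTML).
HTML = """
<div class="box-search-product-filter-row">
    <span class="result-numbers" sth-bind="model.navigationSettings.showFilter">
        <span class="number" sth-bind="span1"></span>
        <span class="result" sth-bind="span2"></span>
    </span>
</div>
"""

div = ET.fromstring(HTML)

# 1) Without "@", the predicate looks for a child *element* named <class>
#    whose text is "result". No such element exists, so nothing matches.
no_at = div.findall("span[class='result']")

# 2) "@class" tests the attribute, but a bare step only inspects direct
#    children of the div, and the target span is nested one level deeper.
direct_only = div.findall("span[@class='result']")

# 3) ".//" searches all descendants of the current node.
descendants = div.findall(".//span[@class='result']")

print(len(no_at), len(direct_only), len(descendants))
# The matched span exists but is empty, so it has no text to extract.
print(descendants[0].get("sth-bind"), descendants[0].text)
```

Only the third expression finds the span, and its text is empty. That is consistent with the remark above that appending /text() yields nothing for this particular markup.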

Answer 1: (score: 0)

This will work for you.

Edit:

from scrapy.spiders import Spider
from scrapy.selector import Selector

class MySpdier(Spider):

    name = "list"
    allowed_domains = ["example.com"]
    start_urls = [
        "https://www.example.com"]

    def parse(self, response):
        sel = Selector(response)
        divs = sel.xpath("//div[@class='box-search-product-filter-row']")

        for div in divs:
            # ".//" searches every descendant of this div; "@class" tests the attribute
            sth = div.xpath(".//span[@class='result']/text()").extract()
            print(sth)