Question

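I want to use XPath to extract data from a web page, but I get nothing back. How can I extract the data?

I tried the code below, but it doesn't return anything.

I tried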

house.xpath('.//span[@class = "icon icon-pin"]/text()').extract_first()

and

house.xpath('.//span[@class = "ann info-item"]/text()').extract_first()

But I got nothing.

This is the HTML I want to extract from:

<span class = "ann-info-item">
     <span class = "icon icon-pin">
         ::before
       </span>
       " San Jorge "
      </span>

I want to extract "San Jorge", but I get nothing.

Answer 1

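You should select the inner span and then take the text that follows it, so the expression would look something like house.xpath('.//span[@class="icon icon-pin"]/following-sibling::text()').get()

In the shell I can get the data this way: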

>>> from scrapy import Selector
>>> txt = """<span class = "ann-info-item">
...      <span class = "icon icon-pin">
...          ::before
...        </span>
...        " San Jorge "
...       </span>"""
>>> sel = Selector(text=txt)
>>> sel.xpath('//span[@class="icon icon-pin"]/following-sibling::text()').get()
u'\n       " San Jorge "\n      '
>>> sel.xpath('//span[@class="icon icon-pin"]/following-sibling::text()').get().strip()
u'" San Jorge "'
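The difference between the question's /text() and the following-sibling::text() used here can be reproduced with only the Python standard library. This is a sketch that swaps in xml.etree.ElementTree instead of Scrapy: ElementTree has no following-sibling axis and no text() step, but it stores the text inside an element as .text and the text immediately after it as .tail. It only works because this particular snippet happens to be well-formed XML; most real-world HTML is not.

```python
import xml.etree.ElementTree as ET

# Snippet from the question. It happens to be well-formed XML, so the
# stdlib parser accepts it (real pages usually need an HTML parser).
SNIPPET = """<span class = "ann-info-item">
     <span class = "icon icon-pin">
         ::before
       </span>
       " San Jorge "
      </span>"""

root = ET.fromstring(SNIPPET)  # root is the outer "ann-info-item" span
icon = root.find("span[@class='icon icon-pin']")

# .text is the text *inside* the icon span, which is what the question's
# .../span[@class="icon icon-pin"]/text() selects. Here it is only
# "::before", which on a live page is likely a CSS pseudo-element shown by
# the browser's dev tools rather than a real text node.
print(repr(icon.text.strip()))   # '::before'

# .tail is the text node that *follows* the icon span, i.e. what
# following-sibling::text() selects.
print(repr(icon.tail.strip()))   # '" San Jorge "'
```

The surrounding double quotes survive stripping because they are literal characters in the snippet; if they are really just dev-tools decoration around the text node, they will not appear in the page source.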

Answer 2

Try the following:

.//span[@class = "ann-info-item"]/text()[2]

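It seems you left out the "-" in your second XPath query ("ann info-item" instead of "ann-info-item"). Also, the span with class ann-info-item has two text nodes; [2] selects the second one.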
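Both points can be sketched with xml.etree.ElementTree swapped in for Scrapy. The helper direct_text_nodes is hypothetical (not part of any library): it collects an element's direct text children in document order, roughly what XPath's text() step returns. The wrapping <div> is also an assumption, added only so find() can search for the span as a descendant.

```python
import xml.etree.ElementTree as ET

# Hypothetical <div> wrapper around the question's snippet so that the
# "ann-info-item" span is a descendant find() can match.
SNIPPET = """<div><span class = "ann-info-item">
     <span class = "icon icon-pin">
         ::before
       </span>
       " San Jorge "
      </span></div>"""


def direct_text_nodes(elem):
    """Rough ElementTree analogue of XPath's elem/text(): the text before the
    first child, then the tail after each child, skipping absent ones."""
    nodes = [elem.text] + [child.tail for child in elem]
    return [n for n in nodes if n is not None]


root = ET.fromstring(SNIPPET)

# The question's second query used "ann info-item" (no hyphen): no match.
print(root.find(".//span[@class='ann info-item']"))  # None

item = root.find(".//span[@class='ann-info-item']")
texts = direct_text_nodes(item)
print(len(texts))         # 2: whitespace before the icon span, then the name
print(texts[1].strip())   # " San Jorge "  (XPath text()[2] is 1-based; Python index 1)
```

The position depends on the markup: if there were no whitespace before the icon span, the outer span would have only one text node and text()[2] would select nothing.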
