Question

This is my first question. I'm trying to scrape data from a web page with Scrapy.

<dl class="pairing">
     <dt class="attribute" title="Maridaje">Maridaje:</dt>
     <dd>
</dl>
<dl>
     <dt class="attribute" title="Vol. de alcohol">Vol. De Alcohol:</dt>
     <dd>14%</dd>
</dl>

As you can see, several elements share the same class name. I only want the text from one of them. How do I specify which one I mean?

I tried:

item['maridaje'] = response.xpath('.//*[@class="attribute"]/text()').extract()

But this only gives me the label text of every element with that class name.

Thanks a lot!

Answer 1

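Several options:

By index in XPath (1-based): (.//*[@class="attribute"])[1]/text()
The parentheses matter: without them, [1] is evaluated per parent, so with this markup it would still match both dt elements.

If the first element is the one you want, use extract_first():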

response.xpath('.//*[@class="attribute"]/text()').extract_first()

By index in Python (0-based), to get the second match:

response.xpath('.//*[@class="attribute"]/text()').extract()[1]

By checking the parent: .//dl[@class="pairing"]/dt[@class="attribute"]/text()
By checking the title attribute: .//*[@class="attribute" and @title="Maridaje"]/text()
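
To try the selection logic outside a Scrapy project, here is a minimal sketch using Python's standard-library xml.etree.ElementTree instead of Scrapy's selectors. ElementTree implements only a subset of XPath (no text() and no "and"), so the expressions are adapted: the text is read through .text, positions are picked in Python, and the two attribute conditions are written as chained predicates. The wrapping <div> and the empty <dd></dd> are assumptions added only so the sample parses as XML; they are not part of the original page.

```python
import xml.etree.ElementTree as ET

# Cleaned-up copy of the question's markup. The wrapping <div> and the empty
# <dd></dd> are assumptions, added only so the snippet parses as XML.
SAMPLE = """
<div>
  <dl class="pairing">
    <dt class="attribute" title="Maridaje">Maridaje:</dt>
    <dd></dd>
  </dl>
  <dl>
    <dt class="attribute" title="Vol. de alcohol">Vol. De Alcohol:</dt>
    <dd>14%</dd>
  </dl>
</div>
"""

root = ET.fromstring(SAMPLE)

# Every element with class="attribute", in document order.
all_labels = [el.text for el in root.findall('.//*[@class="attribute"]')]

# Option: pick by position in Python (0-based).
first_label = all_labels[0]
second_label = all_labels[1]

# Option: narrow the match by its parent element.
by_parent = root.find('.//dl[@class="pairing"]/dt[@class="attribute"]').text

# Option: narrow the match by its title attribute
# (chained predicates stand in for XPath's "and").
by_title = root.find('.//*[@class="attribute"][@title="Maridaje"]').text

print(all_labels)    # ['Maridaje:', 'Vol. De Alcohol:']
print(first_label)   # Maridaje:
print(second_label)  # Vol. De Alcohol:
print(by_parent)     # Maridaje:
print(by_title)      # Maridaje:
```

Selecting by parent or by title is usually more robust than selecting by position, since it keeps working if the page adds or reorders attribute rows.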

How do I get the right XPath when elements share the same name? Scrapy
