Question

I have this node to parse:

<h3 class="atag">
    <a href="http://www.example.com">
        <span class="btag">text to be ignored</span>
    </a>
    <span class="ctag">text to be checked</span>
</h3>

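I need to extract "http://www.example.com" without picking up the "text to be ignored" part. I also need to check that the ctag span contains "text to be checked".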

I came up with this, but it doesn't seem to do the job:

response.xpath("//h3/a/@*[not(self::span)]").extract()

Any thoughts on this?

Answer 1

If you only need the href from the a tag, use @href. To also check that the ctag span contains some text, I think you can use something like this:

'//h3[contains(span[@class="ctag"]/text(), "text to be checked")]/a/@href'

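This checks whether the given h3 block contains a ctag span with the text "text to be checked". If it does, the expression returns "http://www.example.com"; otherwise the result is empty.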
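To try the expression outside a spider, here is a minimal sketch using lxml, the library Scrapy selectors are built on. The PAGE string and the helper name href_if_ctag_contains are made up for illustration, and the search text is passed as an XPath variable ($needle) rather than hard-coded; inside Scrapy you would call response.xpath(...) with the same expression.

```python
from lxml import html

# Sample markup from the question, wrapped in a full document (an assumption
# made here so the snippet parses the same way a real page would).
PAGE = """<html><body>
<h3 class="atag">
    <a href="http://www.example.com">
        <span class="btag">text to be ignored</span>
    </a>
    <span class="ctag">text to be checked</span>
</h3>
</body></html>"""


def href_if_ctag_contains(page, needle):
    """Return the href values of <a> children of any <h3> whose ctag span
    contains `needle`; return an empty list when no <h3> qualifies."""
    tree = html.fromstring(page)
    # $needle is an XPath variable; lxml fills it from the keyword argument.
    query = '//h3[contains(span[@class="ctag"]/text(), $needle)]/a/@href'
    return tree.xpath(query, needle=needle)


print(href_if_ctag_contains(PAGE, "text to be checked"))
print(href_if_ctag_contains(PAGE, "some other text"))
```

Selecting @href returns only the attribute value, so the "text to be ignored" span inside the link never shows up in the result.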

Answer 2

Do you mean an XPath like this?

//h3/a[following-sibling::span[@class='ctag' and .='text to be checked']]/@href

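The XPath above selects an <a> element that is followed by a sibling <span class="ctag"> whose value is exactly "text to be checked", then returns the href attribute of that <a> element.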
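Note that .='...' is an exact comparison against the span's string value, so stray whitespace in the markup would make it fail. The sketch below, again using lxml, compares the exact form with a normalize-space() variant. The TEMPLATE string, the hrefs helper, and the padded sample are assumptions added for illustration, not part of the question.

```python
from lxml import html

# The question's markup with a placeholder for the ctag text, so the same
# page can be rendered with and without surrounding whitespace.
TEMPLATE = """<html><body>
<h3 class="atag">
    <a href="http://www.example.com">
        <span class="btag">text to be ignored</span>
    </a>
    <span class="ctag">{ctag_text}</span>
</h3>
</body></html>"""

# Exact match on the span's string value.
EXACT = "//h3/a[following-sibling::span[@class='ctag' and .=$expected]]/@href"
# Same check, but leading/trailing whitespace is trimmed first.
TRIMMED = ("//h3/a[following-sibling::span[@class='ctag' "
           "and normalize-space(.)=$expected]]/@href")


def hrefs(page, query, expected="text to be checked"):
    """Run `query` against `page`, binding $expected to the wanted text."""
    return html.fromstring(page).xpath(query, expected=expected)


clean = TEMPLATE.format(ctag_text="text to be checked")
padded = TEMPLATE.format(ctag_text="  text to be checked ")

print(hrefs(clean, EXACT))     # span text matches exactly
print(hrefs(padded, EXACT))    # padding breaks the exact comparison
print(hrefs(padded, TRIMMED))  # normalize-space() tolerates the padding
```

If the real pages are as clean as the sample in the question, the exact form is enough; the normalize-space() variant is only worth using when the span text may carry extra whitespace.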

XPath: checking for specific text in a node
