Question

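When I use contains to check whether some data is present in an element's text(), it works for plain data, but not when the element content contains carriage returns, new lines, or tags. How can I make //td[contains(text(), "")] work in this case? Thanks!

XML: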

<table>
  <tr>
    <td>
      Hello world <i> how are you? </i>
      Have a wonderful day.
      Good bye!
    </td>
  </tr>
  <tr>
    <td>
      Hello NJ <i>, how are you?
      Have a wonderful day.</i>
    </td>
  </tr>
</table>

Python:

>>> import lxml.html as lh  # assumed: the session's `lh` alias refers to lxml.html
>>> tdout=open('tdmultiplelines.htm', 'r')
>>> tdouthtml=lh.parse(tdout)
>>> tdout.close()
>>> tdouthtml
<lxml.etree._ElementTree object at 0x2aaae0024368>
>>> tdouthtml.xpath('//td/text()')
['\n      Hello world ', '\n      Have a wonderful day.\n      Good bye!\n    ', '\n      Hello NJ ', '\n    ']
>>> tdouthtml.xpath('//td[contains(text(),"Good bye")]')
[]  ##-> But *Good bye* is already in the `td` contents, though as a list.
>>> tdouthtml.xpath('//td[text() = "\n      Hello world "]')
[<Element td at 0x2aaae005c410>]

Answer 1

Use:

//td[text()[contains(.,'Good bye')]]

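Explanation:

The cause of the problem is not that a text node's string value is a multi-line string. The real cause is that the td element has more than one text-node child.

In the provided expression: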

//td[contains(text(),"Good bye")]

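the first argument passed to the function contains() is a node-set containing more than one text node.

According to the XPath 1.0 specification (in XPath 2.0 this simply raises a type error), when a function that requires a string argument is passed a node-set instead, only the string value of the first node in the node-set (in document order) is used.

In this specific case, the first text node of the passed node-set has the string value: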

" Hello world "

This string does not contain "Good bye", so the comparison fails and the desired td element is not selected.
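The first-node rule can be checked directly from Python. The following is a minimal sketch, assuming lxml is installed. It inlines the question's XML as a string and parses it with lxml.etree instead of reading tdmultiplelines.htm through lxml.html, and the variable names are illustrative.

```python
from lxml import etree

# The question's XML, inlined so the sketch does not depend on tdmultiplelines.htm.
SOURCE = """<table>
  <tr>
    <td>
      Hello world <i> how are you? </i>
      Have a wonderful day.
      Good bye!
    </td>
  </tr>
  <tr>
    <td>
      Hello NJ <i>, how are you?
      Have a wonderful day.</i>
    </td>
  </tr>
</table>"""

root = etree.fromstring(SOURCE)
first_td = root.xpath('//td')[0]

# string() performs the same node-set-to-string conversion that contains()
# applies to its first argument: only the first text node is used.
first_text = first_td.xpath('string(text())')
print(repr(first_text))

# The original predicate therefore only tests the first text node of each td.
broken = root.xpath('//td[contains(text(), "Good bye")]')
print(broken)  # []

# Testing every text node separately finds the td whose second text node matches.
fixed = root.xpath('//td[text()[contains(., "Good bye")]]')
print(len(fixed))  # 1
```

The printed first_text is the whitespace-padded "Hello world" node, which is why a search for "Hello" would succeed while a search for "Good bye" does not.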

XSLT-based verification:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="/">
    <xsl:copy-of select="//td[text()[contains(.,'Good bye')]]"/>
  </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<table>
  <tr>
    <td>
      Hello world <i> how are you? </i>
      Have a wonderful day.
      Good bye!
    </td>
  </tr>
  <tr>
    <td>
      Hello NJ <i>, how are you?
      Have a wonderful day.</i>
    </td>
  </tr>
</table>

the XPath expression is evaluated and the selected nodes (only one in this case) are copied to the output:

<td>
      Hello world <i> how are you? </i>
      Have a wonderful day.
      Good bye!
    </td>
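The same verification can be run without a separate XSLT processor. This is a sketch only, assuming lxml is installed and using its etree.XSLT class. The constant names XML_SOURCE and XSLT_SOURCE are illustrative, and both documents are inlined as strings.

```python
from lxml import etree

# The XML document from the question.
XML_SOURCE = """<table>
  <tr>
    <td>
      Hello world <i> how are you? </i>
      Have a wonderful day.
      Good bye!
    </td>
  </tr>
  <tr>
    <td>
      Hello NJ <i>, how are you?
      Have a wonderful day.</i>
    </td>
  </tr>
</table>"""

# The verification stylesheet from this answer.
XSLT_SOURCE = """<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:template match="/">
    <xsl:copy-of select="//td[text()[contains(.,'Good bye')]]"/>
  </xsl:template>
</xsl:stylesheet>"""

# Compile the stylesheet, then apply it to the parsed document.
transform = etree.XSLT(etree.fromstring(XSLT_SOURCE))
result = transform(etree.fromstring(XML_SOURCE))

# Serialize the result tree according to the stylesheet's xsl:output settings.
output = str(result)
print(output)
```

Only the first td is copied to the result, since it is the only one with a text-node child containing "Good bye".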

Answer 2

Use . instead of text():

tdouthtml.xpath('//td[contains(.,"Good bye")]')
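Note that the two answers are not equivalent. The string value of . is the concatenation of all descendant text nodes, so it also covers text inside the i element, whereas the text() predicate from Answer 1 looks only at direct text-node children. A minimal sketch of the difference, assuming lxml is installed; the helper labels() is hypothetical and the XML is inlined rather than read from the question's file:

```python
from lxml import etree

# The question's XML, inlined so the sketch is self-contained.
SOURCE = """<table>
  <tr>
    <td>
      Hello world <i> how are you? </i>
      Have a wonderful day.
      Good bye!
    </td>
  </tr>
  <tr>
    <td>
      Hello NJ <i>, how are you?
      Have a wonderful day.</i>
    </td>
  </tr>
</table>"""

root = etree.fromstring(SOURCE)


def labels(expr):
    """Hypothetical helper: label each selected td by its trimmed first text node."""
    return [td.xpath('normalize-space(text())') for td in root.xpath(expr)]


# "how are you" sits inside <i>, so no direct text-node child of td contains it.
child_only = labels('//td[text()[contains(., "how are you")]]')
print(child_only)  # []

# The string value of td includes descendant text, so both cells match.
whole_cell = labels('//td[contains(., "how are you")]')
print(whole_cell)  # ['Hello world', 'Hello NJ']

# normalize-space() collapses whitespace runs, so a phrase can match across a line break.
across_lines = labels('//td[contains(normalize-space(.), "day. Good bye")]')
print(across_lines)  # ['Hello world']
```

Which form to prefer depends on whether text inside child elements should count as part of the cell.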

How do I search for content in multi-line text with XPath in Python?
