Question

I have the following in an HTML document:

<div class="prompt input_prompt xh-highlight">
   <bdi class="">In</bdi>
   "&nbsp;[&nbsp;]:"
</div>

To find this case (that is, the expression [ ]:), I tried the following, but none of them worked:

//div/bdi/parent::*/text()="&nbsp;[&nbsp;]:"
//div/bdi/parent::*[contains(text(), " ")]
//div/bdi/parent::*[contains(text(), "&nbsp;")]
//div[contains(text(), "&nbsp;[&nbsp;]:")]
//div[contains(text(), "[ ]")]
//div[contains(text(), "[&nbsp;]")]
//div[contains(text(), "\u00a0]:")]

How do I handle this correctly?

Answer 1

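In XPath itself, the only way to represent the character xA0 in a string literal is as the character itself. So you can search for //div[contains(., "[§]")], where § stands for the character xA0. The drawback, of course, is that it is not obvious to your readers that the character in question is xA0 rather than an ordinary space.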
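A small Python illustration of that readability problem (Python is chosen only for demonstration; the question does not name a host language). The two hypothetical expressions below print identically, and only inspecting the code points reveals which one contains U+00A0:

```python
import unicodedata

# Hypothetical expressions: the first contains U+00A0 (written with the
# \u00a0 escape so it stays visible in source), the second an ordinary space.
with_nbsp = '//div[contains(., "[\u00a0]")]'
with_space = '//div[contains(., "[ ]")]'

# On most terminals these two lines look exactly the same.
print(with_nbsp)
print(with_space)

# Comparing character by character exposes the difference.
differing = [
    (unicodedata.name(a), unicodedata.name(b))
    for a, b in zip(with_nbsp, with_space)
    if a != b
]
print(differing)  # [('NO-BREAK SPACE', 'SPACE')]
```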

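XPath is usually embedded in a host language, and the host language will probably provide another way to write this character. For example, if the host language is XML-based (such as XSLT), you can write it as //div[contains(., "[&#xa0;]")], while in JavaScript you can write it as '//div[contains(., "[\u00a0]")]'.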
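One way to check the XML-host behavior without an XSLT processor is sketched below with Python's standard xml.etree.ElementTree: parse a hypothetical xsl:if element whose test attribute spells the character as &#xa0;, then look at the string the parser hands over. This only shows the XML parsing step, not an actual XSLT transformation.

```python
import xml.etree.ElementTree as ET

# Hypothetical XSLT fragment: the XPath lives in the test attribute and
# spells the no-break space as the XML character reference &#xa0;.
XSLT_SNIPPET = """<xsl:if xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    test='contains(., "[&#xa0;]")'/>"""

elem = ET.fromstring(XSLT_SNIPPET)

# The XML parser resolves &#xa0; before any XPath engine sees the string,
# so the attribute value already holds the real U+00A0 character.
test_expr = elem.get("test")
print(ascii(test_expr))  # 'contains(., "[\xa0]")'
```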

So how you write the expression depends on the conventions of your host language.
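If Python is the host language (an assumption for illustration), its \u00a0 string escape plays the same role as the JavaScript one. The sketch below uses the standard-library xml.etree.ElementTree, which implements only a small XPath subset with no contains(), so it substitutes the exact-text predicate [.='text'] (Python 3.7+). The sample markup is a hypothetical XML-compatible version of the question's div, with &#160; in place of &nbsp;, which XML does not predefine.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML-compatible sample: &#160; replaces &nbsp;.
SAMPLE = (
    "<body>"
    '<div class="prompt input_prompt"><bdi>In</bdi>&#160;[&#160;]:</div>'
    '<div class="prompt input_prompt"><bdi>In</bdi>&#160;[1]:</div>'
    "</body>"
)

root = ET.fromstring(SAMPLE)

# Python resolves \u00a0 when it builds the string, so the path carries U+00A0.
# [.='...'] compares the element's full text: the <bdi> text plus what follows it.
matches = root.findall(".//div[.='In\u00a0[\u00a0]:']")

# The same path typed with plain spaces finds nothing.
misses = root.findall(".//div[.='In [ ]:']")

print(len(matches), len(misses))  # 1 0
```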

Answer 2

This works properly:

First, select all the elements that match the following XPath expression:

//div/bdi/parent::*[contains(text(), "]:")]

Then loop over them, get their text, and do the comparison in a general-purpose language such as Python:

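for element in elements:
    # Treat U+00A0 as an ordinary space before comparing
    if '[ ]' in element.text.replace('\u00a0', ' '):
        print(element.text)  # replace with your own handling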
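The same select-then-filter approach can be tried without a browser, using Python's standard xml.etree.ElementTree as a stand-in for whatever library produced elements (an assumption; the answer does not say which one). ElementTree's path subset has no parent:: axis, but '..' selects the parent, so './/div/bdi/..' plays the role of //div/bdi/parent::*. The helper name empty_prompts and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; &#160; stands in for &nbsp;, which XML does not predefine.
SAMPLE = (
    "<body>"
    '<div class="prompt input_prompt"><bdi>In</bdi>&#160;[&#160;]:</div>'
    '<div class="prompt input_prompt"><bdi>In</bdi>&#160;[3]:</div>'
    "<div>[ ]: no bdi child, so never selected</div>"
    "</body>"
)


def empty_prompts(root):
    """Hypothetical helper: <div> parents of <bdi> whose text contains '[ ]'
    once U+00A0 is treated as an ordinary space."""
    found = []
    # './/div/bdi/..' selects every <div> that has a <bdi> child.
    for div in root.findall(".//div/bdi/.."):
        # In ElementTree, div.text is only the text before the first child
        # (None here), so join itertext(), which includes the <bdi> tail.
        text = "".join(div.itertext()).replace("\u00a0", " ")
        if "[ ]" in text:
            found.append(div)
    return found


root = ET.fromstring(SAMPLE)
hits = empty_prompts(root)
print(len(hits))  # 1
```

Filtering in the host language keeps the XPath simple and sidesteps how each host spells U+00A0, at the cost of selecting more nodes than needed.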

How to find an element in an HTML document with XPath?
