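I am very new to XPath. I am trying to extract some information from a website of legal regulations. For now I just want to get everything from the paragraph that contains "Article 1." up to the <b> tag that contains another string, "PRIME MINISTER".
<p>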
<b> <span> Article 1. </span> </b>
<span>
To approve the master plan on development
of tourism in Northern Central Vietnam
with the following principal contents:
</span>
</p>
<p>
<span>
1. Development viewpoints
</span>
</p>
<p>
<span>To realize general viewpoints of the strategy for and master plan on development of Vietnam’s tourism through 2020.
</span>
</p>
<p>
<span>PRIME MINISTER: Nguyen Tan Dung</span>
</p>
<p>
<span>
<b> PRIME MINISTER </b>
</span>
</p>
<p>
<b> <span> Article 2. </span> </b>
<span>
.................
</span>
</p>
<p>
<span> PRIME MINISTER: Nguyen Tan Dung</span>
</p>
As the expected output, I should get a list like:
[
'Article 1.' ,
'To approve the master plan on development of tourism in Northern
Central Vietnam with the following principal contents: ',
'1. Development viewpoints' ,
'To realize general viewpoints of the strategy for and master plan on
development of Vietnam’s tourism through 2020.' ,
'PRIME MINISTER: Nguyen Tan Dung',
'PRIME MINISTER'
]
The first item in the list is "Article 1." and the last item is the "PRIME MINISTER" inside the <b> tag.
Answer 0 (score: 3)
"Until" and "between" queries are quite difficult in XPath, even in versions later than 1.0.
Starting with the later versions and working down, in XPath 3.1 you can do the following:
(: assumes each predicate selects exactly one p; append [1] otherwise :)
let $first := p[contains(., 'Article 1')],
    $last := p[contains(., 'PRIME MINISTER')]
return ($first, p[. >> $first and . << $last], $last)
In XPath 2.0 there is no let, but for works just as well; it only reads a little oddly.
But in 1.0, (a) we can't bind variables, and (b) we don't have the << and >> operators, which makes it considerably harder.
The simplest expression is probably
p[(.|preceding-sibling::p)[contains(., 'Article 1')] and
(.|following-sibling::p)[contains(., 'PRIME MINISTER')]]
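Unfortunately, without an incredibly smart optimizer, this is likely to be very inefficient on large input documents: the contains() test will be executed about (N^2)/2 times, where N is the number of paragraphs. If you are constrained to XPath 1.0, you may be better off using XPath to find the "start" and "end" nodes and then using the host language to find all the nodes in between.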
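As a rough illustration of that host-language approach, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. Its XPath subset has no contains(), so the text tests are done in Python as well; with a full XPath 1.0 engine you would locate the two boundary nodes with XPath instead. The SAMPLE string is a shortened copy of the question's markup wrapped in a root element, and the names text_of and paragraphs_between are hypothetical helpers, not part of any library.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's markup, wrapped in a root element (an assumption).
SAMPLE = """<html>
<p><b> <span> Article 1. </span> </b><span> To approve the master plan </span></p>
<p><span> 1. Development viewpoints </span></p>
<p><span>PRIME MINISTER: Nguyen Tan Dung</span></p>
<p><span><b> PRIME MINISTER </b></span></p>
<p><b> <span> Article 2. </span> </b><span> ... </span></p>
</html>"""


def text_of(element):
    """Return the element's full text with runs of whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


def paragraphs_between(root, start_marker, end_marker):
    """Return sibling <p> elements from the start paragraph to the end paragraph, inclusive.

    Raises StopIteration if either marker is not found.
    """
    paragraphs = root.findall("p")  # direct children, in document order
    texts = [text_of(p) for p in paragraphs]  # each paragraph is read only once
    # Index of the first paragraph containing the start marker.
    start = next(i for i, t in enumerate(texts) if start_marker in t)
    # Index of the first paragraph at or after it that contains the end marker.
    end = next(i for i in range(start, len(texts)) if end_marker in texts[i])
    return paragraphs[start:end + 1]


root = ET.fromstring(SAMPLE)
selected = [text_of(p) for p in paragraphs_between(root, "Article 1", "PRIME MINISTER")]
print(selected)
```

Each paragraph's text is computed once, so the work grows linearly with the number of paragraphs rather than quadratically.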
Answer 1 (score: 0)
This XPath expression:
//p[descendant-or-self::p and (following-sibling::p/descendant::b)]
should produce the expected output, at least on the HTML you posted.
Answer 2 (score: 0)
Here is an XPath that matches the exact requirement in the OP.
//span[normalize-space(.)='Article 1.']/ancestor::p|//p[//span[normalize-space(.)='Article 1.']]/following::*[count(following-sibling::p/span/b[normalize-space(.)='PRIME MINISTER'])=1]
Answer 3 (score: 0)
A simple XPath 1.0 expression:
/*/p[starts-with(normalize-space(), 'Article 1.')]
[1]
| /*/p[starts-with(normalize-space(), 'Article 1.')]
[1]/following-sibling::p
[not(preceding-sibling::p[starts-with(normalize-space(), 'PRIME MINISTER')])
and
following-sibling::p[starts-with(normalize-space(), 'PRIME MINISTER')]
and not(starts-with(normalize-space(), 'PRIME MINISTER'))
]
When evaluated against this XML document:
<html>
<p>
<b> <span> Article 1. </span> </b>
<span>
To approve the master plan on development
of tourism in Northern Central Vietnam
with the following principal contents:
</span>
</p>
<p>
<span>
1. Development viewpoints
</span>
</p>
<p>
<span>To realize general viewpoints of the strategy for and master plan on development of Vietnam’s tourism through 2020.
</span>
</p>
<p>
<span>PRIME MINISTER: Nguyen Tan Dung</span>
</p>
<p>
<span>
<b> PRIME MINISTER </b>
</span>
</p>
<p>
<b> <span> Article 2. </span> </b>
<span>
.................
</span>
</p>
<p>
<span> PRIME MINISTER: Nguyen Tan Dung</span>
</p>
</html>
it selects exactly the wanted <p> elements.
Verification:
This XSLT transformation evaluates the XPath expression and outputs all nodes selected by this evaluation:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"/*/p[starts-with(normalize-space(), 'Article 1.')]
[1]
| /*/p[starts-with(normalize-space(), 'Article 1.')]
[1]/following-sibling::p
[not(preceding-sibling::p[starts-with(normalize-space(), 'PRIME MINISTER')])
and
following-sibling::p[starts-with(normalize-space(), 'PRIME MINISTER')]
and not(starts-with(normalize-space(), 'PRIME MINISTER'))
]
"/>
</xsl:template>
</xsl:stylesheet>
When applied to the same XML document (above), it produces the wanted result:
<p>
<b>
<span> Article 1. </span>
</b>
<span>
To approve the master plan on development
of tourism in Northern Central Vietnam
with the following principal contents:
</span>
</p>
<p>
<span>
1. Development viewpoints
</span>
</p>
<p>
<span>To realize general viewpoints of the strategy for and master plan on development of Vietnam’s tourism through 2020.
</span>
</p>
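and it is displayed by the browser as expected:
Article 1. To approve the master plan on development of tourism in Northern Central Vietnam with the following principal contents:
1. Development viewpoints
To realize general viewpoints of the strategy for and master plan on development of Vietnam’s tourism through 2020.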
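For comparison, the selection rule of the XPath expression in this answer can be approximated in a host language. The sketch below uses Python's standard xml.etree.ElementTree and itertools; DOC is a shortened version of the XML document above, and norm and article_body are hypothetical helper names. It is not an exact equivalent: if no paragraph starting with "PRIME MINISTER" follows, this version returns everything to the end of the document, whereas the XPath expression would select only the "Article 1." paragraph.

```python
import xml.etree.ElementTree as ET
from itertools import dropwhile, takewhile

# Shortened version of the XML document used in this answer (an assumption).
DOC = """<html>
<p><b> <span> Article 1. </span> </b><span> To approve the master plan </span></p>
<p><span> 1. Development viewpoints </span></p>
<p><span>To realize general viewpoints.</span></p>
<p><span>PRIME MINISTER: Nguyen Tan Dung</span></p>
<p><span><b> PRIME MINISTER </b></span></p>
<p><b> <span> Article 2. </span> </b><span> ... </span></p>
</html>"""


def norm(element):
    """Rough equivalent of XPath normalize-space() applied to the element's string value."""
    return " ".join("".join(element.itertext()).split())


def article_body(root, start="Article 1.", stop="PRIME MINISTER"):
    """Paragraphs from the first one starting with `start` up to, but excluding,
    the first later one starting with `stop`."""
    paragraphs = root.findall("p")
    # Skip everything before the first paragraph that starts with `start`.
    from_start = dropwhile(lambda p: not norm(p).startswith(start), paragraphs)
    # Keep paragraphs until one starts with `stop`.
    return list(takewhile(lambda p: not norm(p).startswith(stop), from_start))


body = [norm(p) for p in article_body(ET.fromstring(DOC))]
print(body)
```

Both passes walk the paragraph list once, so no sibling axis has to be re-evaluated for every candidate paragraph.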