I have to search in an "ordered" XML file in which my text is spread over several nodes, like this:
<root>
<div id="1">Hello</div>
<div id="2">Hel</div>
<div id="3">lo dude</div>
<div id="4">H</div>
<div id="5">el</div>
<div id="6">lo</div>
</root>
The search has to be run on the continuous text:
HelloHello dudeHello
But I need to be able to retrieve the node attributes. For example, for a search for 'll' I would like to get the nodes:
<div id="1">Hello</div>
<div id="2">Hel</div>
<div id="3">lo dude</div>
<div id="5">el</div>
<div id="6">lo</div>
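or at least the ids.
Does anyone know how to do this in XPath or in any other way?
I find this a bit challenging and have no (simple) idea for now. Thanks for your help.
EDIT: To be precise, the key information is that the text has to be concatenated before the search!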
Answer 0 (score: 0)
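Your updated requirement makes the problem considerably more complex, because the "element break" can occur at any point inside the search token, and a token can even span several elements. I don't think you can write such a query in XPath < 3.0 (if it can be done in XPath alone at all). I used XQuery, which extends XPath. The code runs fine in BaseX and should also work in other XQuery engines (it may need XQuery 3.0; I didn't check).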
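The code got rather complex, but I think I put enough comments in it to make it understandable. It expects the text to continue in the directly following sibling elements, but with minor tweaks it could also be used to traverse arbitrary XML structures (think of HTML with <span/> and other markup).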
(: functx dependencies :)
declare namespace functx = "http://www.functx.com";
declare function functx:is-node-in-sequence
  ( $node as node()? ,
    $seq as node()* ) as xs:boolean {
  some $nodeInSeq in $seq satisfies $nodeInSeq is $node
} ;
declare function functx:distinct-nodes
  ( $nodes as node()* ) as node()* {
  for $seq in (1 to count($nodes))
  return $nodes[$seq][not(functx:is-node-in-sequence(
    .,$nodes[position() < $seq]))]
} ;
declare function local:search( $elements as item()*, $pattern as xs:string) as item()* {
  functx:distinct-nodes(
    for $element in $elements
    return ($element[contains(./text(), $pattern)], local:start-search($element, $pattern))
  )
};
declare function local:start-search( $element as item(), $pattern as xs:string) as item()* {
  let $splits := (
    (: all possible prefixes of search token :)
    for $i in 1 to string-length($pattern) - 1
    (: check whether element text ends with prefix :)
    where ends-with($element/text(), substring($pattern, 1, $i))
    return $i
  )
  (: go on for all matching prefixes :)
  for $split in $splits
  return
    (: recursive call to next element :)
    let $continue := local:continue-search($element/following-sibling::*[1], substring($pattern, $split+1))
    where not(empty($continue))
    return ($element, $continue)
};
declare function local:continue-search( $element as item()*, $pattern as xs:string) as item()* {
  if (empty($element)) then () else
  (: case a) text node contains whole remaining token :)
  if (starts-with($element/text(), $pattern))
  then ($element)
  (: case b) text node is part of token :)
  else if (starts-with($pattern, $element/text()))
  then
    (: recursive call to next element :)
    let $continue := local:continue-search($element/following-sibling::*[1], substring($pattern, 1+string-length($element/text())))
    where not(empty($continue))
    return ($element, $continue)
  (: token not found :)
  else ()
};
let $token := 'll'
return local:search(//div, $token)
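Since the question also allows "any other way", here is a rough cross-check outside XQuery. It is not the method used above: it concatenates the text of the root's direct children, remembers which character range each element contributed, and maps every match back to the elements it overlaps. It is a minimal Python sketch using the standard library's xml.etree.ElementTree; the names SAMPLE and find_spanning_ids are made up for illustration, and it assumes (as in the example) that only the text directly inside each child element matters.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
SAMPLE = """<root>
<div id="1">Hello</div>
<div id="2">Hel</div>
<div id="3">lo dude</div>
<div id="4">H</div>
<div id="5">el</div>
<div id="6">lo</div>
</root>"""


def find_spanning_ids(xml_text, pattern):
    """Return the ids of child elements that overlap an occurrence of
    pattern in the concatenated text of the root's direct children."""
    root = ET.fromstring(xml_text)
    spans = []   # (element, start offset, end offset) in the joined text
    pieces = []
    offset = 0
    for child in root:
        text = child.text or ""
        spans.append((child, offset, offset + len(text)))
        pieces.append(text)
        offset += len(text)
    joined = "".join(pieces)

    hits = set()
    start = joined.find(pattern)
    while start != -1:
        end = start + len(pattern)
        for index, (_, lo, hi) in enumerate(spans):
            # Half-open ranges [lo, hi) and [start, end) overlap.
            if lo < end and start < hi:
                hits.add(index)
        # Advance by one character so overlapping occurrences are found too.
        start = joined.find(pattern, start + 1)
    return [spans[i][0].get("id") for i in sorted(hits)]


print(find_spanning_ids(SAMPLE, "ll"))  # prints ['1', '2', '3', '5', '6']
```

The match is a plain substring search, so the pattern is taken literally, and the result keeps document order because the element indices are sorted before the ids are looked up.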
Answer 1 (score: 0)
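In XPath 2 you can use tokenize to count how often the search text occurs, and then test for each node whether leaving that node out of the text reduces the number of occurrences. If the count drops, the node has to be part of the result. This is not very fast.
Assuming that only the text in the direct child nodes matters, as in the example, it looks like this: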
for $searched in "ll"
return //*/(
  for $matches in count(tokenize(string-join(*, ""), $searched)) - 1
  return *[$matches > count(tokenize(concat(" ", string-join(preceding-sibling::*, "")), $searched)) +
                      count(tokenize(concat(" ", string-join(following-sibling::*, "")), $searched)) - 2]
)
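To make the counting idea easier to follow, here is the same logic as a small Python sketch. It is an illustration under assumptions, not a translation verified against an XPath engine: the names SAMPLE and ids_by_counting are made up, the text is taken only from the root's direct children, and str.count is used instead of tokenize, so the search text is treated literally (the second argument of tokenize is a regular expression).

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
SAMPLE = """<root>
<div id="1">Hello</div>
<div id="2">Hel</div>
<div id="3">lo dude</div>
<div id="4">H</div>
<div id="5">el</div>
<div id="6">lo</div>
</root>"""


def ids_by_counting(xml_text, pattern):
    """Return the ids of child elements whose removal lowers the number
    of occurrences of pattern in the concatenated text."""
    root = ET.fromstring(xml_text)
    texts = [child.text or "" for child in root]
    # Occurrences in the full concatenated text.
    total = "".join(texts).count(pattern)
    result = []
    for index, child in enumerate(root):
        # Count separately in the text before and after this element, so
        # that dropping the element cannot glue a new match together.
        before = "".join(texts[:index]).count(pattern)
        after = "".join(texts[index + 1:]).count(pattern)
        if total > before + after:
            result.append(child.get("id"))
    return result


print(ids_by_counting(SAMPLE, "ll"))  # prints ['1', '2', '3', '5', '6']
```

One caveat of this sketch: str.count counts non-overlapping occurrences, so for search texts that can overlap themselves (for example 'aa' in 'aaa') some nodes may be missed. Every element also triggers two more passes over the joined text, which is why the approach gets slow on long documents.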