Using XPath to find nodes until a condition is met

Time: 2011-07-24 04:51:28

Tags: xml xpath

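What expression should I use to find all the td nodes that follow the td containing the text Foo or Bar, stopping before the next <td colspan="4">, whose text is unknown? Thanks.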

<td colspan="4">Foo || Bar</td>
<td rowspan="4">TEXT1</td>
<td valign="top">TEXT2</td>
<td valign="top">TEXT3</td>
...
<td colspan="4">VARIABLE</td>
...

Update:

use strict; 
use warnings;
use autodie;
use utf8;
use WWW::Mechanize;
use HTML::TreeBuilder::XPath;

# WWW::Mechanize needs an absolute URL, including the scheme
my $url = 'http://www.perl.org';

my $mech = WWW::Mechanize->new;
$mech->agent_alias( 'Windows Mozilla' );
$mech->get( $url );

my $tree = HTML::TreeBuilder::XPath->new;

# parse_content() parses the document and signals end-of-input in one call
$tree->parse_content( $mech->content );

# Select every td that follows a Foo/Bar cell and precedes a colspan="4" cell
for my $node ($tree->findnodes('//td[
                            preceding-sibling::td
                            [contains(., "Foo") or contains(., "Bar")]
                            and following-sibling::td[@colspan="4"]
                            ]')) {

    print $node->as_text, "\n";

}

2 answers:

Answer 0 (score: 1)

You can use this XPath:

//td[
      preceding-sibling::td
            [contains(., 'Foo') or contains(., 'Bar')] 
      and following-sibling::td[@colspan = 4]
]

It returns:

<td rowspan="4">TEXT1</td>
<td valign="top">TEXT2</td>
<td valign="top">TEXT3</td>
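
As a cross-check of the expected result, the same rule ("start after the Foo/Bar cell, stop before the next colspan="4" cell") can be sketched procedurally. This is a minimal Python sketch using only the standard library; it walks the sibling cells itself and does not evaluate XPath. The function name cells_until_next_header and the extra trailing cells (TEXT4, OTHER) are made up for illustration. Note one difference: the predicate above only requires that some colspan="4" sibling follows, whereas this loop stops at the first one.

```python
import xml.etree.ElementTree as ET

# Sample row modelled on the question; TEXT4 and OTHER are hypothetical extras.
ROW = """<tr>
<td colspan="4">Foo || Bar</td>
<td rowspan="4">TEXT1</td>
<td valign="top">TEXT2</td>
<td valign="top">TEXT3</td>
<td colspan="4">VARIABLE</td>
<td valign="top">TEXT4</td>
<td colspan="4">OTHER</td>
</tr>"""


def cells_until_next_header(row):
    """Return the td cells after the first Foo/Bar cell, up to the next colspan=4 cell."""
    selected = []
    started = False
    for td in row.findall("td"):
        text = "".join(td.itertext())
        if not started:
            # Skip cells until the marker cell containing Foo or Bar is seen.
            if "Foo" in text or "Bar" in text:
                started = True
            continue
        if td.get("colspan") == "4":
            # First colspan=4 cell after the marker: stop collecting.
            break
        selected.append(td)
    # If no colspan=4 cell follows, every remaining cell is returned.
    return selected


row = ET.fromstring(ROW)
texts = [td.text for td in cells_until_next_header(row)]
print(texts)
```

Here texts holds the text of the three cells between the two colspan="4" cells; TEXT4 is left out because the loop has already stopped at VARIABLE.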

Answer 1 (score: 0)

With XPath 2.0 and XQuery 1.0, the node-order operators << and >> help to express the condition you have. For example, with XQuery you can write

let $tr := <tr>
<td colspan="4">Foo || Bar</td>
<td rowspan="4">TEXT1</td>
<td valign="top">TEXT2</td>
<td valign="top">TEXT3</td>
.....
<td colspan="4">VARIABLE</td>
</tr>
let $td1 := $tr/td[contains(., 'Foo') or contains(., 'Bar')][1]
let $td2 := $td1/following-sibling::td[@colspan = 4][1]
return $tr/td[. >> $td1 and . << $td2]

to find the 'td' elements between the two other 'td' elements.

Obviously, plain XPath 2.0 has no let ... return, so you need to put everything into a single expression:

$tr/td[. >> $tr/td[contains(., 'Foo') or contains(., 'Bar')][1] and . << $tr/td[contains(., 'Foo') or contains(., 'Bar')][1]/following-sibling::td[@colspan = 4][1]]

where $tr is the context node.
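
The meaning of `. >> $td1 and . << $td2` can be modelled as a comparison of document-order positions. The following is a minimal Python sketch under that assumption, using only the standard library; it is not an XPath 2.0 engine, and the helper name cells_between is hypothetical. Returning an empty list when either marker cell is missing mirrors how a comparison against an empty sequence selects nothing.

```python
import xml.etree.ElementTree as ET

# Sample row from the question, without the elided cells.
ROW = """<tr>
<td colspan="4">Foo || Bar</td>
<td rowspan="4">TEXT1</td>
<td valign="top">TEXT2</td>
<td valign="top">TEXT3</td>
<td colspan="4">VARIABLE</td>
</tr>"""


def cells_between(cells):
    """Keep cells strictly after the first Foo/Bar cell and strictly before the next colspan=4 cell."""
    # Position of $td1: the first cell whose text contains Foo or Bar.
    td1 = next(
        (i for i, td in enumerate(cells)
         if "Foo" in "".join(td.itertext()) or "Bar" in "".join(td.itertext())),
        None,
    )
    if td1 is None:
        return []
    # Position of $td2: the first colspan=4 cell after $td1.
    td2 = next(
        (i for i in range(td1 + 1, len(cells)) if cells[i].get("colspan") == "4"),
        None,
    )
    if td2 is None:
        return []
    # `. >> $td1 and . << $td2` becomes a comparison of positions.
    return [td for i, td in enumerate(cells) if td1 < i < td2]


cells = ET.fromstring(ROW).findall("td")
between_texts = [td.text for td in cells_between(cells)]
print(between_texts)
```

Because both bounds are resolved once, up front, the filter itself stays a simple range check, which is the same shape as the let-based XQuery version.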