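I'm trying to collect data from a website through its built-in search feature. I can't figure out how to press the "Search" button, because it is wrapped in some JavaScript and its id changes every time the page is loaded.
The relevant part of the site's HTML is below.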
<html>
<head>
</head>
<body>
<table>
<tr>
<td>
<td>
<table>
<tr>
<td>
<!-- start of toolbar Main -->
<table>
<tr>
<td>
<table>
<tr class="buttonPad">
</tr>
<tr>
<td nowrap="true" valign="top" class="button"><a id="S7674" accesskey="S" class="button" title="SEARCH" onclick="dispatch('S7674');"><u>S</u>></td>
</tr>
</table>
</td>
<td</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</td>
</tr>
</table>
</body>
</html>
And my code:
use HTML::TreeBuilder::XPath;
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($url);
my @nodes = $tree->findnodes('/html/body/table/tbody/tr/td/table/tbody/tr/td/table/tbody/tr/td/table.buttonSpace/tbosy/tr/a.button')->get_nodelist; # line is modified later.
my $nodecount = scalar(@nodes);
if ($nodecount > 0) { print "we found something\n"; }
else { print "nothing found\n"; }
foreach my $node (@nodes)
{
    print "node is $node\n";
    my $id = $node->findvalue('button');
    print "my id is $id\n";
}
Unfortunately, my code doesn't return any nodes.
Many thanks in advance.
微
Answer 0 (score: 1)
This seems to work:
use strict;
use warnings;
use HTML::TreeBuilder;
use Data::Dumper;
my $html = <<HTML;
<html>
<head>
</head>
<body>
<table>
<tr>
<td>
<td>
<table>
<tr>
<td>
<!-- start of toolbar Main -->
<table>
<tr>
<td>
<table>
<tr class="buttonPad">
</tr>
<tr>
<td nowrap="true" valign="top" class="button"><a id="S7674" accesskey="S" class="button" title="SEARCH" onclick="dispatch('S7674');"><u>S</u>></td>
</tr>
</table>
</td>
<td</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</td>
</tr>
</table>
</body>
</html>
HTML
my $tree = HTML::TreeBuilder->new_from_content( $html );
foreach my $atag ( $tree->look_down( _tag => q{a}, 'class' => 'button', 'title' => 'SEARCH' ) ) {
    print Dumper $atag->attr('id');
}
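Because the id changes on every page load, the same value can also be read from the link's `onclick` handler once the link has been found by its stable attributes. The sketch below assumes the page keeps the `dispatch('<id>')` pattern shown in the question; `search_dispatch_arg` is a hypothetical helper name, and the HTML is a trimmed copy of the question's toolbar. It relies on `look_down` accepting a `qr//` pattern as an attribute value.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Trimmed copy of the toolbar markup from the question.
my $html = <<'HTML';
<table><tr>
<td nowrap="true" valign="top" class="button"><a id="S7674" accesskey="S" class="button" title="SEARCH" onclick="dispatch('S7674');"><u>S</u></a></td>
</tr></table>
HTML

# Hypothetical helper: return the argument passed to dispatch() by the
# SEARCH link, or undef if no such link exists.
sub search_dispatch_arg {
    my ($content) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($content);

    # In scalar context look_down returns the first match; the onclick
    # criterion is a regex rather than an exact string.
    my $link = $tree->look_down(
        _tag    => 'a',
        title   => 'SEARCH',
        onclick => qr/dispatch\(/,
    );

    my $arg;
    if ($link && $link->attr('onclick') =~ /dispatch\('([^']+)'\)/) {
        $arg = $1;    # e.g. S7674 for the sample markup
    }
    $tree->delete;    # release the tree explicitly, as HTML::Element recommends
    return $arg;
}

my $arg = search_dispatch_arg($html);
print defined $arg ? "dispatch argument: $arg\n" : "SEARCH link not found\n";
```

This only reads the id: HTML::TreeBuilder parses HTML and does not run JavaScript, so it cannot call `dispatch()` itself.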
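Answer 1 (score: 0)
You could try a simpler XPath query. You don't need the whole hierarchy in there; that's overkill, and it's hard to get right: your HTML doesn't contain the tbody from your query (nor the tbosy you have ;-) ).
If you identify the element by its button class and title, try this: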
$tree->findnodes( '//td[@class="button"]/a[@class="button" and @title="SEARCH"]')
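A self-contained version of this suggestion with HTML::TreeBuilder::XPath, the module used in the question, might look like the sketch below. `search_button_ids` is a hypothetical helper and the HTML is a trimmed copy of the question's toolbar (without tbody). Besides the shorter XPath, it differs from the question's code in two places: it reads the attribute with `attr('id')`, since `findvalue('button')` looks for a child element named `button`; and it passes HTML text to `parse()`, which treats its argument as markup, so if `$url` holds an address the page has to be downloaded first. Steps such as `table.buttonSpace` and `a.button` are CSS-style selectors; XPath reads them as element names, so they match nothing.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Trimmed copy of the toolbar markup from the question (no <tbody>).
my $html = <<'HTML';
<table><tr><td><table><tr>
<td nowrap="true" valign="top" class="button"><a id="S7674" accesskey="S" class="button" title="SEARCH" onclick="dispatch('S7674');"><u>S</u></a></td>
</tr></table></td></tr></table>
HTML

# Hypothetical helper: collect the id of every SEARCH link in $content.
sub search_button_ids {
    my ($content) = @_;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse($content);    # parse() expects HTML text, not a URL
    $tree->eof;

    # Match on stable attributes (class and title) instead of the full path.
    my @nodes = $tree->findnodes(
        '//td[@class="button"]/a[@class="button" and @title="SEARCH"]');

    # attr('id') reads the attribute of each matched <a> element.
    my @ids = map { $_->attr('id') } @nodes;
    $tree->delete;
    return @ids;
}

my @ids = search_button_ids($html);
print @ids ? "found id(s): @ids\n" : "nothing found\n";
```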