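I'm using Web::Scraper on some nasty nested tables that have no CSS styling. I had to learn XPath, and I keep tripping over it.
Update: I fixed some of the XPath problems; the only question left is about attributes.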
#!perl
use warnings;
use Web::Scraper;
use Data::Dumper;
my $html = do { local $/; <DATA> };
my $scraper = scraper {
    # Wrong! The 'tbody' element does not exist in the source HTML.
    # process ".//[@id='cfg-surface-detail']/center/table/tbody/tr/td[2]/select",
    # I used Chrome to get the XPath, and it inserts tbody elements when rendering bad HTML.
    # Also, I changed the start of the XPath from './/' to '//*',
    # which matches any element anywhere in the document.
    process "//*[@id='cfg-surface-detail']/center/table/tr/td[2]/select",
        'sensorType[]' => 'TEXT';
};
my $res = $scraper->scrape($html);
print Dumper($res);
__DATA__
<html><head><title>...</title></head>
<body>
<form action="/foo" method=post id=cfg-surface-detail name=cfg-surface-detail>
<center>
<table bgcolor="#FFFFFF">
<tr><td>Sensor Type</td><td>
<select name="cfg-sensor-type" >
<option value="1 Fred's Sensor" selected>Fred's Sensor
<option value="2 Other">Other Sensor
</select>
</td></tr>
</table>
</center>
</form>
</body>
</html>
This now outputs:
$VAR1 = {
'sensorType' => [
'Fred\'s Sensor Other Sensor '
]
};
So I'm getting closer. Now, how do I specify the <option> that has the selected attribute?
Update: solved. The XPath is //*[@id="cfg-surface-detail"]/center/table/tr/td[2]/select/option[@selected]
Answer 0 (score: 0)
#!perl
use warnings;
use Web::Scraper;
use Data::Dumper;
my $html = do { local $/; <DATA> };
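my $scraper = scraper {
    # CSS selector: any <select> that is a descendant of the element with id cfg-surface-detail.
    # ('//' is XPath syntax and cannot be mixed into a CSS selector; a space is the CSS descendant combinator.)
    process '#cfg-surface-detail select',
        'sensorType[]' => 'TEXT';
};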
my $res = $scraper->scrape($html);
print Dumper($res);
__DATA__
<html><head><title>...</title></head>
<body>
<form action="/foo" method=post id=cfg-surface-detail name=cfg-surface-detail>
<center>
<table bgcolor="#FFFFFF">
<tr><td>Sensor Type</td><td>
<select name="cfg-sensor-type" >
<option value="1 Fred's Sensor" selected>Fred's Sensor
<option value="2 Other">Other Sensor
</select>
</td></tr>
</table>
</center>
</form>
</body>
</html>
Answer 1 (score: 0)
If it were me, I'd go with CSS. The CSS selector for the selected option is:
'select[name="cfg-sensor-type"] option[selected]'
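As a minimal sketch of how that selector might be plugged into the scraper from the question: Web::Scraper's process accepts either a CSS selector or an XPath expression, so the string can be passed as-is. The HTML below is a trimmed copy of the question's form, and the key name sensorType is just carried over from the question. Using a plain key (no trailing []) asks for a single value rather than a list.

```perl
#!perl
use strict;
use warnings;
use Web::Scraper;

# Trimmed copy of the form from the question.
my $html = <<'HTML';
<html><body>
<form action="/foo" method=post id=cfg-surface-detail name=cfg-surface-detail>
<table>
<tr><td>Sensor Type</td><td>
<select name="cfg-sensor-type">
<option value="1 Fred's Sensor" selected>Fred's Sensor
<option value="2 Other">Other Sensor
</select>
</td></tr>
</table>
</form>
</body></html>
HTML

my $scraper = scraper {
    # CSS selector: the <option> carrying a 'selected' attribute
    # inside the <select> named cfg-sensor-type.
    process 'select[name="cfg-sensor-type"] option[selected]',
        'sensorType' => 'TEXT';
};

my $res = $scraper->scrape($html);
print "Selected: $res->{sensorType}\n";
```

The extracted text may carry trailing whitespace from the source HTML, so trim it before comparing against a fixed string.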
Answer 2 (score: 0)
This answer borrows a bit from each of the previous two:
my $scraper = scraper {
    process '//select[@name="cfg-sensor-type"]/option[@selected]', 'SensorType' => 'TEXT';
};