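I'm new to Ruby and watir-webdriver. I'm trying to extract data from a website, but I can't figure out how to access a specific cell in an HTML table. I can't find any id, name, or class that distinguishes the cell I need, and I suspect the table is generated dynamically. Here is what I have so far: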
require 'watir-webdriver'
browser = Watir::Browser.new :firefox
browser.goto 'http://oh-scioto-auditor.publicaccessnow.com/search.aspx'
browser.text_field(:id => "fldSearchFor").set '011234000'
browser.button(:name => 'btnSearch').click
browser.link(:text => 'Parcel Detail').click
puts browser.table(:id => 'lxT380').exists?
browser.td(:index => 0).each do |data|
  puts data.text
end
When I use Firebug in Firefox to get a unique selector, this is what I get:
#lxT380 > div:nth-child(2) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(2) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(2) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(4) > td:nth-child(1)
It might be useful, but as I said, I'm new to Ruby and don't know what to do with it. Any input would be appreciated.
Answer (score: 2)
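As you mentioned, the problem with this table is that there are no distinguishing attributes. So I think your best option is to locate a cell relative to the label next to it.
For example, the taxpayer address is in this table (nested inside a bunch of other non-descriptive tables):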
<table class=" ui-corner-all" border="1" width="250">
<tbody>
<tr>
<td colspan="2" class="ui-state-default">Property Address:</td>
</tr>
<tr>
<td colspan="2" height="95" valign="top">3069 GEPHART RD</td>
</tr>
<tr>
<td colspan="2" class="ui-state-default">Tax Payer Address:</td>
</tr>
<tr>
<td colspan="2" height="95" valign="top">FAULKNER PATRICK EUGENE +<br>2112 GEPHART RD<br>WHEELERSBURG OH 45694<br>USA </td>
</tr>
</tbody>
</table>
To get the taxpayer address, find the row that contains the "Tax Payer Address:" heading:
tax_payer_address_label = browser.tr(:text => 'Tax Payer Address:')
Get the following row, which is assumed to hold the address:
tax_payer_address = tax_payer_address_label.tr(:xpath => './following-sibling::tr')
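Since Watir returns the first element matching a locator, `./following-sibling::tr` here yields the row immediately after the label row. Outside the browser, the same lookup is just "find the label's index, take the next entry". A minimal pure-Ruby sketch, assuming the row texts have already been collected into an array in document order; `value_after_label` is a hypothetical helper, not part of Watir:

```ruby
# Hypothetical helper (not part of Watir): mimics "label row, then its next
# sibling row" on an array of row texts listed in document order.
def value_after_label(row_texts, label)
  # First row whose trimmed text is exactly the label
  idx = row_texts.index { |text| text.strip == label }
  # Label not found, or it is the last row, so nothing follows it
  return nil if idx.nil? || idx + 1 >= row_texts.length
  row_texts[idx + 1]
end

# Row texts as they appear in the table above
rows = [
  "Property Address:",
  "3069 GEPHART RD",
  "Tax Payer Address:",
  "FAULKNER PATRICK EUGENE +\n2112 GEPHART RD\nWHEELERSBURG OH 45694\nUSA"
]

puts value_after_label(rows, "Tax Payer Address:")
```

The same helper works for the "Property Address:" label, since that table alternates label rows and value rows.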
Finally, get the text of the row (cell):
puts tax_payer_address.text
#=> FAULKNER PATRICK EUGENE +
#=> 2112 GEPHART RD
#=> WHEELERSBURG OH 45694
#=> USA
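Each `<br>` in the cell comes back as a line break in `.text`, so the result can be split into its parts afterwards. A minimal sketch, assuming every address follows the four-line layout shown above (name, street, "CITY ST ZIP", country), which may not hold for every parcel; the variable names are illustrative only:

```ruby
# Text as returned by tax_payer_address.text in the example above
address_text = "FAULKNER PATRICK EUGENE +\n2112 GEPHART RD\nWHEELERSBURG OH 45694\nUSA"

# One entry per line, with surrounding whitespace removed
lines = address_text.lines.map(&:strip).reject(&:empty?)
owner, street, city_line, country = lines

# Assumes the third line looks like "CITY ST 12345"; falls back to the raw line otherwise
m = city_line.match(/\A(.+) ([A-Z]{2}) (\d{5})\z/)
city, state, zip = m ? m.captures : [city_line, nil, nil]

p [city, state, zip] # => ["WHEELERSBURG", "OH", "45694"]
```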
As a complete working script:
require 'watir-webdriver'
browser = Watir::Browser.new :firefox
browser.goto 'http://oh-scioto-auditor.publicaccessnow.com/search.aspx'
browser.text_field(:id => "fldSearchFor").set '011234000'
browser.button(:name => 'btnSearch').click
browser.link(:text => 'Parcel Detail').click
tax_payer_address_label = browser.tr(:text => 'Tax Payer Address:')
tax_payer_address = tax_payer_address_label.tr(:xpath => './following-sibling::tr')
puts tax_payer_address.text
#=> FAULKNER PATRICK EUGENE +
#=> 2112 GEPHART RD
#=> WHEELERSBURG OH 45694
#=> USA
Note that you could also use a single XPath, but it is not as easy to read or write:
puts browser.tr(:xpath => '//tr[normalize-space(.) = "Tax Payer Address:"]/following-sibling::tr').text
#=> FAULKNER PATRICK EUGENE +
#=> 2112 GEPHART RD
#=> WHEELERSBURG OH 45694
#=> USA
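The `normalize-space(.)` call is what makes the comparison tolerate the line breaks around the `<td>` in the row's markup: XPath 1.0 trims the string value and collapses internal runs of whitespace (space, tab, CR, LF) into single spaces. A rough pure-Ruby equivalent, for illustration only; `normalize_space` is a hypothetical name, not a Watir method:

```ruby
# Approximates XPath 1.0 normalize-space(): collapse runs of XPath whitespace
# (space, tab, CR, LF) into one space, then trim one space from each end.
def normalize_space(str)
  str.gsub(/[ \t\r\n]+/, ' ').sub(/\A /, '').sub(/ \z/, '')
end

# String value of the label row in the HTML above: the td text plus the
# newlines between <tr>, <td> and </tr>
raw = "\nTax Payer Address:\n"
puts normalize_space(raw) == "Tax Payer Address:" # prints true
```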