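I'm trying to iterate over each row (tr) element, but the inner loop below isn't working. As far as I can tell, the XPath pattern '*/td' returns no results. I expected the data inside the td tags to be printed to stdout. I'm using Nokogiri.
I pasted this into my Rails console: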
require 'nokogiri'
f = File.open("public/index.html")
doc = Nokogiri::HTML(f)
f.close
doc.xpath('//*[@id="WhoIsOnDutyTableLevel4"]/tbody/tr').each do |row|
puts "row= " + row.to_s
row.xpath('*/td').each do |td|
puts "td= " + td
end
end
This is the console output:
row= <tr id="208894">
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6mdgIY4sPrzAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user1" id="user1" class="details">User 1</a></td>
<td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td>
<td headers="WhoIsOnDutyTableLevel1:header:3">0</td>
</tr>
row= <tr id="207792">
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6AOzsYzBi7dAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user2" id="user2" class="details">User 2</a></td>
<td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td>
<td headers="WhoIsOnDutyTableLevel1:header:3">5</td>
</tr>
=> 0
Here is the HTML I'm parsing:
<table class="duty-report-level1" id="WhoIsOnDutyTableLevel1">
<caption></caption>
<thead>
<tr>
<th id="WhoIsOnDutyTableLevel1:header:1" class="duty-report-lt-header">c</th>
</tr>
</thead>
<tfoot></tfoot>
<tbody>
<tr>
<td headers="WhoIsOnDutyTableLevel1:header:1">
<table class="duty-report-level2" id="WhoIsOnDutyTableLevel2">
<caption></caption>
<thead>
<tr>
<th id="WhoIsOnDutyTableLevel1:header:1">Group Name</th><th id="WhoIsOnDutyTableLevel1:header:2">Group Time Zone</th><th id="WhoIsOnDutyTableLevel1:header:3">Default Devices</th><th id="WhoIsOnDutyTableLevel1:header:4">Supervisors</th>
</tr>
</thead>
<tfoot></tfoot>
<tbody>
<tr>
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/GroupDetails.do;jsessionid=17gaw4aw5pv8s?_data=TJZuNquzHUgWcre8AVcKpAFRUsezgPKzbHn7hwtTf9Ei0C2PJ8QYcKIy8OkorCWT8HDTAzkon1ls%0D%0AefuHC1N%2F0SLQLY8nxBhwesdd7Zeg6NzvCfuzRqLg5g%3D%3D" name="team1" id="team1" class="details">Team 1</a></td><td headers="WhoIsOnDutyTableLevel1:header:2" class="centered-text">US/Pacific</td><td headers="WhoIsOnDutyTableLevel1:header:3" class="centered-text"><img src="/static/images/icon_boolean_false.png" alt="No" border="0"></td><td headers="WhoIsOnDutyTableLevel1:header:4">
<values>
</values><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z7AnuRhH67H6AixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="mgr1" id="mgr1" class="details">Mgr 1</a>
<br>
</td>
</tr>
<tr>
<td headers="WhoIsOnDutyTableLevel1:header:1" class="no-padding" colspan="4">
<table class="duty-report-level3" id="WhoIsOnDutyTableLevel3">
<caption></caption>
<thead>
<tr>
<th id="WhoIsOnDutyTableLevel1:header:1" class="th-left">a</th><th id="WhoIsOnDutyTableLevel1:header:2" class="">b</th>
</tr>
</thead>
<tfoot></tfoot>
<tbody>
<tr>
<td headers="WhoIsOnDutyTableLevel1:header:1" class="no-padding" colspan="2">
<table class="duty-report-level4" id="WhoIsOnDutyTableLevel4">
<caption></caption>
<thead>
<tr>
<th id="WhoIsOnDutyTableLevel1:header:1">Recipient</th><th id="WhoIsOnDutyTableLevel1:header:2">Category</th><th id="WhoIsOnDutyTableLevel1:header:3">Escalation</th>
</tr>
</thead>
<tfoot></tfoot>
<tbody>
<tr id="208894">
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6mdgIY4sPrzAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user1" id="user1" class="details">User 1</a></td><td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td><td headers="WhoIsOnDutyTableLevel1:header:3">0</td>
</tr>
<tr id="207792">
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6AOzsYzBi7dAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user2" id="user2" class="details">User 2</a></td><td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td><td headers="WhoIsOnDutyTableLevel1:header:3">5</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
Answer 0 (score: 5)
You need a small change to your XPath:
doc.xpath('//*[@id="WhoIsOnDutyTableLevel4"]/tbody/tr').each do |row|
# puts "row= " + row.to_s
row.xpath('./td').each do |td|
puts "td= " + td.text
end
end
This outputs:
td= User 1
td= PERSON
td= 0
td= User 2
td= PERSON
td= 5
Using ./td as the XPath for the td cells basically means "start at the current node (the row) and look down from there."
Personally, unless you really need XPath, I recommend CSS selectors. They are more readable and usually simpler:
doc.search('#WhoIsOnDutyTableLevel4 tbody tr').each do |row|
row.search('td').each do |td|
puts "td= " + td.text
end
end
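I recommend using search instead of css or xpath, and at instead of at_css or at_xpath. Both accept either a CSS selector or an XPath expression, so there is no real magic in picking the specific variants; you only have to remember two methods instead of four.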
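Answer 1 (score: 0)
The XPath expression in the inner loop is evaluated relative to each tr, so you want td (which selects the td elements that are children of the context tr) rather than */td (which selects td elements that are grandchildren of it).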
Complete code:
doc.xpath('//*[@id="WhoIsOnDutyTableLevel4"]/tbody/tr').each do |row|
puts "row= " + row.to_s
row.xpath('td').each do |td|
puts "td= " + td
end
end