I'm trying to parse a table with the following markup.
<table>
<tr class="athlete">
<td colspan="2" class="name">Alex</td>
</tr>
<tr class="run">
<td>5.00</td>
<td>10.00</td>
</tr>
<tr class="run">
<td>5.20</td>
<td>10.50</td>
</tr>
<tr class="end"></tr>
<tr class="athlete">
<td colspan="2" class="name">John</td>
</tr>
<tr class="run">
<td>5.00</td>
<td>10.00</td>
</tr>
<tr class="end"></tr>
</table>
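I need to loop through each .athlete table row and collect every sibling .run row below it until I reach the .end row, then repeat for the next athlete, and so on. Some .athlete rows are followed by two .run rows, others by one.
This is what I have so far. I loop through the athletes: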
require 'nokogiri'
require 'open-uri'
url = "http://myurl.com"
# URI.open is the open-uri entry point; a bare open(url) no longer fetches URLs on Ruby 3+
doc = Nokogiri::HTML(URI.open(url))
doc.css(".athlete").each do |athlete|
  # ".name" selects by class; "name" would look for a <name> element
  puts athlete.at_css(".name").text
  # Loop through the sibling .run rows until I reach the .end row
  # output the value of the td's in the .run row
end
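I can't figure out how to get each sibling .run row and stop at the .end row. I feel it would be easier if the table were better structured, but unfortunately I have no control over the markup. Any help would be greatly appreciated!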
Answer 0 (score: 0)
I would process the table as follows:
Find the table to process
table = doc.at_css("table")
Get all rows that are direct children of the table
rows = table.css("> tr")
Group the rows that are bounded by .athlete and .end
grouped = [[]]
rows.each do |row|
if row['class'] == 'athlete' and grouped.last.empty?
grouped.last << row
elsif row['class'] == 'end' and not grouped.last.empty?
grouped.last << row
grouped << []
elsif not grouped.last.empty?
grouped.last << row
end
end
grouped.pop if grouped.last.empty? || grouped.last.last['class'] != 'end'
Process the grouped rows
grouped.each do |group|
puts "BEGIN: >> #{group.first.text} <<"
group[1..-2].each do |row|
puts " #{row.text.split.join(' ')}" # collapse whitespace only; a bare squeeze would also turn "5.00" into "5.0"
end
puts "END: >> #{group.last.text} <<"
end
Answer 1 (score: 0)
Here you go:
require 'nokogiri'
doc = <<DOC
<table>
<tr class="athlete">
<td colspan="2" class="name">Alex</td>
</tr>
<tr class="run">
<td>5.00</td>
<td>10.00</td>
</tr>
<tr class="run">
<td>5.20</td>
<td>10.50</td>
</tr>
<tr class="end"></tr>
<tr class="athlete">
<td colspan="2" class="name">John</td>
</tr>
<tr class="run">
<td>5.00</td>
<td>10.00</td>
</tr>
<tr class="end"></tr>
</table>
DOC
doc = Nokogiri::HTML(doc)
# You can leave out .end if it is always empty and not needed
trs = doc.css('.athlete, .run, .end').to_a
# Splits the rows into [[athlete, run, ..., end], [athlete, run, ..., end], ...]
athletes = trs.slice_before { |elm| elm.attr('class') == 'athlete' }.to_a
athletes.map! do |athlete|
{
name: athlete.shift.at_css('.name').text,
runs: athlete
.select{ |tr| tr.attr('class') == 'run' }
.map{|run| run.text.to_f } # to_f reads only the leading number, i.e. the first <td> of each run
}
end
puts athletes.inspect
#[{:name=>"Alex", :runs=>[5.0, 5.2]}, {:name=>"John", :runs=>[5.0]}]
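To see what `slice_before` is doing in isolation, here is a small sketch on plain strings that stand in for the class attribute of each row (the `row_classes` array is made up for illustration). A new slice starts at every element for which the block returns true.

```ruby
# Plain strings stand in for the class attribute of each <tr>.
row_classes = %w[athlete run run end athlete run end]

# Start a new group whenever an "athlete" entry is seen.
groups = row_classes.slice_before { |cls| cls == 'athlete' }.to_a

groups.each { |group| p group }
```

Because the split is driven only by the .athlete rows, the .end rows are not needed as delimiters with this approach; they simply end up as the last element of each group.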