I have HTML like this:
...
<table>
<tbody>
...
<tr>
<th> head </th>
<td> td1 text</td>
<td> td2 text</td>
...
</tr>
</tbody>
<tfoot>
</tfoot>
</table>
...
I am using Nokogiri with Ruby. I want to iterate over each row and put the text of the th and the corresponding td into a hash.
Answer 0 (score: 3)
require "nokogiri"
# Parse the HTML input
html_data = "...stripped HTML markup code..."
html_doc = Nokogiri::HTML(html_data)
# Iterate over each row in the table
# Note that you may need to make the CSS selector below more specific
result = html_doc.css("table tr").inject({}) do |all, row|
  # Modify if you need to collect only the first td, for example
  all[row.css("th").text] = row.css("td").text
  all # inject uses the block's return value as the next accumulator
end
Answer 1 (score: 1)
I haven't run this code, so I'm not entirely sure, but the general idea should be right:
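require "nokogiri"
html_doc = Nokogiri::HTML("<html> ... </html>")
result = []
# Build one hash per table row
html_doc.xpath("//tr").each do |tr|
  hash = {}
  # element_children returns only element nodes, so text nodes are skipped
  tr.element_children.each do |node|
    # Keyed by tag name: a later td in the same row overwrites an earlier one
    hash[node.node_name] = node.content
  end
  result << hash
end
puts result.inspect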
For details, see the documentation: http://nokogiri.org/Nokogiri/XML/Node.html