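I've been searching for three days trying to get a data scrape working. It looks like I've managed to parse the HTML table, which looks like this: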
<tr class='ds'>
<td class='ds'>Length:</td>
<td class='ds'>1/8"</td>
</tr>
<tr class='ds'>
<td class='ds'>Width:</td>
<td class='ds'>3/4"</td>
</tr>
<tr class='ds'>
<td class='ds'>Color:</td>
<td class='ds'>Red</td>
</tr>
However, I can't seem to print it to CSV correctly.
The Ruby code is as follows:
Specifications = {
  :length => ['Length:','length','Length'],
  :width => ['width:','width','Width','Width:'],
  :Color => ['Color:','color'],
  .......
}.freeze

def specifications
  @specifications ||= xml.css('tr.ds').map{|row| row.css('td.ds').map{|cell| cell.children.to_s } }.map{|record|
    specification = Specifications.detect{|key, value| value.include? record.first }
    [specification.to_s.titleize, record.last] }
end
The CSV prints as a single column (which appears to be the whole array):
[["", nil], ["[:finishtype, [\"finish\", \"finish type:\", \"finish type\", \"finish type\", \"finish type:\"]]", "Metal"], ["", "1/4\""], ["[:length, [\"length:\", \"length\", \"length\"]]", "18\""], ["[:width, [\"width:\", \"width\", \"width\", \"width:\"]]", "1/2\""], ["[:styletype, [\"style:\", \"style\", \"style:\", \"style\"]]"........
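I think the problem is that I'm not specifying which value to return, but my attempts to specify the output haven't worked. Any help would be greatly appreciated!
Answer 0 (score: 0)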
Try changing
[specification.to_s.titleize, record.last]
to
[specification.last.first.titleize, record.last]
detect yields a [key, value] pair, for example [:length, ["Length:", "length", "Length"]], and calling to_s on that pair turns it into "[:length, [\"Length:\", \"length\", \"Length\"]]". With last.first, you extract only its "Length:" part.
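To see why the whole pair ends up in the output, here is a minimal sketch in plain Ruby. It assumes a trimmed-down copy of the question's hash with only two entries (held in a local variable named specifications), and it leaves out titleize, which comes from ActiveSupport, so that it runs without Rails.

```ruby
# Trimmed-down copy of the question's hash (assumption: only two entries kept).
specifications = {
  length: ['Length:', 'length', 'Length'],
  width:  ['width:', 'width', 'Width', 'Width:']
}.freeze

# Hash#detect passes each entry to the block as (key, value) and returns the
# first matching entry as a two-element [key, value] array, or nil if none match.
pair = specifications.detect { |_key, labels| labels.include?('Length:') }

whole_pair = pair.to_s        # the stringified array that leaked into the output
label      = pair.last.first  # first label in the value array
key_name   = pair.first.to_s  # alternative: the hash key as a string

# No match: detect returns nil.
missing = specifications.detect { |_key, labels| labels.include?('Depth:') }

puts whole_pair
puts label
puts key_name
puts missing.inspect
```

The empty strings in the posted output, such as ["", "1/4\""], are consistent with the no-match case, because nil.to_s is an empty string.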
If you run into attributes that match nothing in Specifications, you can drop those rows by changing the code to:
xml.css('tr.ds').map{|row| row.css('td.ds').map{|cell| cell.children.to_s } }.map{|record|
  specification = Specifications.detect{|key, value| value.include? record.first }
  [specification.last.first.titleize, record.last] if specification
}.compact
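Once the method returns clean [label, value] pairs, writing them out is a separate step that the question does not show. The following is a minimal sketch using Ruby's standard csv library. The rows array is hard-coded sample data standing in for the method's return value, and the sketch assumes the single-column output came from writing the whole array as one row instead of one pair per row.

```ruby
require 'csv'

# Sample data standing in for the return value of `specifications` (assumption).
rows = [
  ['Length:', '1/8"'],
  ['Width:', '3/4"'],
  ['Color:', 'Red']
]

# Append each [label, value] pair as its own row, so every pair
# becomes one line with two columns.
csv_text = CSV.generate do |csv|
  rows.each { |row| csv << row }
end

puts csv_text
```

To write a file instead of a string, CSV.open('specs.csv', 'w') accepts the same block; the filename specs.csv is hypothetical. The library quotes and escapes fields that contain a double quote, such as 1/8".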