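I have a Ruby script that iterates over a list of projects. For each project, it walks an HTML table, collects the td text from each row, and adds it to an array.
The problem is that when the table for a particular project is empty, an empty array gets added to my two-dimensional array, which causes an error when I try to use that array to insert data into a SQL database. How can I prevent empty arrays from being appended to the array in the first place?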
require 'nokogiri'
require 'open-uri'

projects.each do |project_id|
  url = "http://myurl.com/InventoryMaster.aspx?Qtr=%s&Client=%s" % [qtr, project_id[1]]
  # URI.open comes from open-uri; a bare open(url) no longer fetches URLs on Ruby 3.0+.
  page = Nokogiri::HTML(URI.open(url))
  table = page.at('my_table')
  rows = Array.new
  table.search('tr').each do |tr|
    cells = Array.new
    tr.search('td').each do |cell|
      cells.push(cell.text.gsub(/\r\n?/, "").strip)
    end
    # add the project id to the cells array, and get rid of other array elements I don't need.
    cells.insert(1, project_id[0])
    cells.slice!(11, 6)
    cells.delete_at(8)
    cells.delete_at(2)
    cells.delete_at(0)
    rows.push(cells)
  end
  # first row in the html table is the headers. get rid of those.
  rows.shift
  # last row in the html table is the footers. get rid of those too.
  rows.pop
  p rows
end
Here is the HTML I am parsing:
<table id="ctl00_MainContent_gvSearchResults" cellspacing="1" cellpadding="1"
border="1" style="color:Black;background-color:LightGoldenrodYellow;border-color:Tan;
border-width:1px;border-style:solid;" rules="cols">
<caption></caption>
<tbody>
<tr style="background-color:Tan;font-weight:bold;">
<!-- I don't need the headers. -->
<th scope="col"></th>
<th scope="col"></th>
<th scope="col"></th>
<th scope="col"></th>
<th scope="col"></th>
<th scope="col"></th>
<th scope="col"></th>
<th scope="col"></th>
<th scope="col"></th>
<th scope="col"></th>
<th scope="col"></th>
<th scope="col"></th>
<th scope="col"></th>
<th scope="col"></th>
<th scope="col"></th>
<th scope="col"></th>
</tr>
<tr style="font-family:arial,tahoma;font-size:Smaller;">
<td>not needed</td>
<td>not needed</td>
<td>needed</td>
<td align="right">needed</td>
<td>needed</td>
<td>needed</td>
<td>needed</td>
<td>needed</td>
<td>not needed</td>
<td>needed</td>
<!-- I don't need any of the remaining td's in this row either. -->
<td align="right"></td>
<td align="right"></td>
<td align="right"></td>
<td align="right"></td>
<td align="right"></td>
<td></td>
</tr>
<!-- This row is the footer, and it isn't needed either. -->
<tr style="background-color:Tan;">
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
Once I have parsed the table, I need to add the project ID, which is part of a key-value pair contained in the projects array.
Answer 0 (score: 2)
Try filtering the projects array before iterating:
projects.reject(&:empty?).each do |project_id|
Now you will only iterate over the non-empty arrays.
Example:
array = [ [1], [], [2, 3] ]
array.reject &:empty? # => [ [1], [2, 3] ]
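If the empty arrays turn up in the collected results rather than in projects itself, the same call could be applied there instead. This is only a minimal sketch: results_per_project and its contents are hypothetical sample data standing in for the two-dimensional array described in the question, not the asker's real values.

```ruby
# Hypothetical per-project results; the second project had an empty table,
# so nothing was left after dropping the header and footer rows.
results_per_project = [
  [["A1", "needed"]],
  [],
  [["C3", "needed"], ["C3", "needed"]]
]

# reject returns a new array and leaves the original untouched.
non_empty = results_per_project.reject(&:empty?)

p non_empty.size            # => 2
p results_per_project.size  # => 3
```

Because reject does not modify its receiver, the unfiltered array is still available if you need it, for example to log which projects returned no rows.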
Answer 1 (score: 0)
You can also use the delete_if method:
array = [ [1], [], [2, 3] ]
array.size # => 3
array.delete_if &:empty? # => [ [1], [2, 3] ]
array.size # => 2
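One thing worth keeping in mind is that delete_if changes the array in place, whereas reject returns a filtered copy. The short sketch below illustrates that difference, along with how reject! behaves when there is nothing to remove. The variable names are made up for illustration.

```ruby
rows = [[1], [], [2, 3]]

# delete_if removes matching elements from rows itself and returns rows.
returned = rows.delete_if(&:empty?)
p returned.equal?(rows)  # => true
p rows                   # => [[1], [2, 3]]

# reject! also mutates, but returns nil when nothing was removed,
# so its return value should not be chained blindly.
second_pass = rows.reject!(&:empty?)
p second_pass            # => nil

# delete_if still returns the array when nothing was removed.
third_pass = rows.delete_if(&:empty?)
p third_pass             # => [[1], [2, 3]]
```

If other code still holds a reference to the original array, the non-mutating reject is usually the safer choice.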