How do I handle empty array elements in a loop?

Time: 2014-01-10 21:37:37

Tags: ruby arrays

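I have a Ruby script that iterates over a list of projects. For each project, it walks an HTML table, collects the td text from each row, and adds it to an array.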

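The problem is that when the table is empty for a particular project, an empty array gets added to my two-dimensional array, and that causes an error when I try to use the array to insert the data into a SQL database. How do I prevent the empty array from being added to the array in the first place?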

require 'nokogiri'
require 'open-uri'

projects.each do |project_id|
  url = "http://myurl.com/InventoryMaster.aspx?Qtr=%s&Client=%s" % [qtr,project_id[1]]

  page = Nokogiri::HTML(open(url))
  table = page.at('my_table')

  rows = Array.new
  table.search('tr').each do |tr|
    cells = Array.new

    tr.search('td').each do |cell|
      cells.push(cell.text.gsub(/\r\n?/, "").strip)
    end
    # add the project id to the cells array, and get rid of other array elements I don't need.
    cells.insert(1, project_id[0])
    cells.slice!(11, 6)
    cells.delete_at(8)
    cells.delete_at(2)
    cells.delete_at(0)
    rows.push(cells)
  end

  # first row in the array in the html table is headers.  get rid of those.
  rows.shift
  # last row in the html table is the footers.  get rid of those too.
  rows.pop

  p rows

end

Here is the HTML I am parsing:

<table id="ctl00_MainContent_gvSearchResults" cellspacing="1" cellpadding="1" 
border="1" style="color:Black;background-color:LightGoldenrodYellow;border-color:Tan;
border-width:1px;border-style:solid;" rules="cols">

<caption></caption>
<tbody>
    <tr style="background-color:Tan;font-weight:bold;">
        <!-- I don't need the headers. -->
        <th scope="col"></th>
        <th scope="col"></th>
        <th scope="col"></th>
        <th scope="col"></th>
        <th scope="col"></th>
        <th scope="col"></th>
        <th scope="col"></th>
        <th scope="col"></th>
        <th scope="col"></th>
        <th scope="col"></th>
        <th scope="col"></th>
        <th scope="col"></th>
        <th scope="col"></th>
        <th scope="col"></th>
        <th scope="col"></th>
        <th scope="col"></th>
    </tr>
    <tr style="font-family:arial,tahoma;font-size:Smaller;">
        <td>not needed</td>
        <td>not needed</td>
        <td>needed</td>
        <td align="right">needed</td>
        <td>needed</td>
        <td>needed</td>
        <td>needed</td>
        <td>needed</td>
        <td>not needed</td>
        <td>needed</td>

        <!-- I don't need any of the remaining td's in this row either. -->
        <td align="right"></td>
        <td align="right"></td>
        <td align="right"></td>
        <td align="right"></td>
        <td align="right"></td>
        <td></td>
    </tr>
    <!-- this row is the footer, and it isn't needed either. -->
    <tr style="background-color:Tan;">
        <td></td>
        <td></td>
        <td></td>
        <td></td>
        <td></td>
        <td></td>
        <td></td>
        <td></td>
        <td></td>
        <td></td>
        <td></td>
        <td></td>
        <td></td>
        <td></td>
        <td></td>
        <td></td>
    </tr>
</tbody>
</table>

Once I have parsed the table, I need to add the project ID, which is part of a key-value pair contained in the projects array.

2 answers:

Answer 0 (score: 2)

Try filtering the projects array before iterating:

projects.reject(&:empty?).each do |project_id|

Now you will only iterate over non-empty arrays.

Example:

array = [ [1], [], [2, 3] ]
array.reject(&:empty?) # => [ [1], [2, 3] ]
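
The same filter can also be applied to the collected results rather than to projects. This is only a minimal sketch: it assumes the rows for each project end up in one outer array, and the name `all_rows` and the sample values are made up for illustration, not taken from the question.

```ruby
# Hypothetical sample data: one entry per project, each entry a list of rows.
# The second project had an empty table, so its entry is an empty array.
all_rows = [
  [["A1", "widget", "10"]],
  [],
  [["C3", "gadget", "7"], ["C3", "gizmo", "2"]]
]

# reject returns a new array without the empty entries;
# all_rows itself is left unchanged.
non_empty = all_rows.reject(&:empty?)

puts non_empty.size
puts all_rows.size
```

Because reject does not modify its receiver, the filtered result has to be assigned to a variable (or chained directly into each) before it is used for the database insert.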

Answer 1 (score: 0)

You can also use the delete_if method:

array = [ [1], [], [2, 3] ]
array.size # => 3
array.delete_if(&:empty?) # => [ [1], [2, 3] ]
array.size # => 2
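
The size change above happens because delete_if modifies the array it is called on, whereas reject returns a filtered copy. The sketch below contrasts the two, along with reject!, using made-up variable names; it is meant as an illustration of the difference, not as the only way to structure the code.

```ruby
# reject is non-destructive: it returns a filtered copy.
original = [[1], [], [2, 3]]
copy = original.reject(&:empty?)

# delete_if is destructive: it removes elements from the receiver
# and returns that same array.
mutated = [[1], [], [2, 3]]
result = mutated.delete_if(&:empty?)

# reject! is also destructive, but it returns nil when nothing was removed,
# so its return value should not be chained blindly.
nothing_removed = [[1], [2, 3]].reject!(&:empty?)

puts original.size
puts mutated.size
puts nothing_removed.inspect
```

If the unfiltered array is still needed later, reject is the safer choice; if not, delete_if avoids creating a second array.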