Parsing 3 table columns with Hpricot

Time: 2011-11-04 22:31:16

Tags: ruby-on-rails hpricot

I have an HTML document that contains a very simple table:

<table>
<tr><th>Country</th><th>Date</th></tr>

<tr>
    <td><b><a href="/calendar/?region=BE">Belgium</a></b></td>
    <td align="right"><a href="/date/04-20/">20 April</a> <a href="/year/2001/">2001</a></td>
    <td>(original release)</td>
</tr>

<tr>
    <td><b><a href="/calendar/?region=BE">Belgium</a></b></td>
    <td align="right"><a href="/date/04-25/">25 April</a> <a href="/year/2001/">2001</a></td>
    <td></td>
</tr>

<tr>
    <td><b><a href="/calendar/?region=FR">France</a></b></td>
    <td align="right"><a href="/date/04-27/">27 April</a> <a href="/year/2001/">2001</a></td>
    <td></td>
</tr>

<tr>
    <td><b><a href="/calendar/?region=CH">Switzerland</a></b></td>
    <td align="right"><a href="/date/05-25/">25 May</a> <a href="/year/2001/">2001</a></td>
    <td>(French speaking region)</td>
</tr>

<tr>
    <td><b><a href="/calendar/?region=CZ">Czech Republic</a></b></td>
    <td align="right"><a href="/date/07-06/">6 July</a> <a href="/year/2001/">2001</a></td>
    <td>(International Film Festival)</td>
</tr>
</table>

The first two columns are easy to parse:

document.search("a[@href*=calendar]").each { |country| countries << country.inner_text }
document.search("td[@align*=right]").each { |date| dates << date.inner_text }

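But I'm having trouble getting the values from the third column. I need all of them in an array, including the blank ones. How can I do this?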

1 answer:

Answer 0: (score: 0)

Answering my own question:

document.search("td[@align*=right]").each { |comment| comments << comment.next.next.inner_text }
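
The double `.next` above presumably steps over the whitespace text node between the two cells, so it depends on how the markup happens to be formatted. A less formatting-sensitive alternative is to walk the table row by row and pick the cells by position, which also keeps the three arrays aligned. The sketch below is only an illustration: it uses REXML, which ships with Ruby, instead of Hpricot, and it assumes the table is well-formed XML, as the sample is. The `inner_text` helper and the shortened two-row table are introduced here for the example and are not part of the original code.

```ruby
require 'rexml/document'

# Shortened copy of the table from the question (two data rows only).
html = <<~HTML
  <table>
  <tr><th>Country</th><th>Date</th></tr>
  <tr>
      <td><b><a href="/calendar/?region=BE">Belgium</a></b></td>
      <td align="right"><a href="/date/04-20/">20 April</a> <a href="/year/2001/">2001</a></td>
      <td>(original release)</td>
  </tr>
  <tr>
      <td><b><a href="/calendar/?region=FR">France</a></b></td>
      <td align="right"><a href="/date/04-27/">27 April</a> <a href="/year/2001/">2001</a></td>
      <td></td>
  </tr>
  </table>
HTML

# Hypothetical helper: concatenates all text below a node, similar in
# spirit to Hpricot's inner_text.
def inner_text(node)
  node.children.map do |child|
    case child
    when REXML::Text    then child.value
    when REXML::Element then inner_text(child)
    else ''
    end
  end.join
end

doc = REXML::Document.new(html)

countries = []
dates = []
comments = []

doc.elements.each('table/tr') do |row|
  cells = row.get_elements('td')
  next if cells.size < 3 # the header row has only <th> cells

  countries << inner_text(cells[0]).strip
  dates     << inner_text(cells[1]).strip
  comments  << inner_text(cells[2]).strip # empty cells yield ""
end

p countries
p comments
```

With this approach an empty third cell still contributes an empty string, so `comments` has one entry per data row.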