Question

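I am using BeautifulSoup 4.4 to scrape a table on a website, and the number of rows in that table keeps changing. In the markup below there are four table rows, but that number changes from day to day.

Main question:

How do I get rid of the IndexError message?

Status: I tried using the number of iterable items as the maximum number of iterations, but that did not solve the actual problem.

Sub-question:

I plan to append the output to a file. Will the IndexError raised while iterating over this table affect the data output, or any other process tied to the iteration, in any way? (I would still like to avoid the error message regardless.)

IndexError message: item_name = strengths.findAll('tr')[x].findAll('td')[0].get_text() IndexError: list index out of range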

<tbody>
    <tr>
        <td>
            <div class="iconize iconize-icon-left">
                <span class="incidents-icon" title="Description"></span>
                Heinz 57 ketchup 
            </div>
        </td>
        <td style="text-align: right;">
            <span class="level">Popular</span>
        </td>
    </tr>
<tr> # same structure as the tr above
<tr> # same structure as the tr above
<tr> # same structure as the tr above
</tbody>

My code so far:

strengths = strengths_div.table.tbody

output = []

iter_length = len(list(strengths)) # Finding out the number of iterable elements

x = 0 # counter 

for tr in strengths:
    while x <= int(iter_length):

        item_name = strengths.findAll('tr')[x].findAll('td')[0].get_text()
        strength_value = strengths.findAll('tr')[x].findAll('td')[1].get_text()
        item_name = item_name.strip()
        strength_value = strength_value.strip()

        x = x + 1

Answer 1

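First, if you are going to use indexes, do not let x reach len(iterable): an iterable of length n has no index n. The largest index is n - 1, so the while loop line should start like this: while x < int(iter_length):. Also, I do not understand the purpose of your outer for loop, since as far as I can see you never use tr anywhere inside it.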
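To make the off-by-one point concrete, here is a minimal sketch that uses a plain Python list as a stand-in for the table rows. The names rows, visited and reached_past_end are made up for this illustration and do not come from the question.

```python
# A plain list standing in for the four <tr> elements (hypothetical data).
rows = ["row 0", "row 1", "row 2", "row 3"]
n = len(rows)

# Valid indexes run from 0 to n - 1, so rows[n] does not exist.
try:
    rows[n]
    reached_past_end = False
except IndexError:
    reached_past_end = True

# With a strict "<" the loop stops at the last valid index, n - 1.
visited = []
x = 0
while x < n:
    visited.append(rows[x])
    x += 1

print(reached_past_end)
print(visited == rows)
```

With "<=" in place of "<", the final pass would evaluate rows[n] and raise the same IndexError as in the question.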

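A good way to avoid index errors is to iterate over the items of the iterable rather than over its indexes. It usually makes the code tidier and easier to read. This is what I would do: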

for items in strengths.findAll('tr'):

    item_name = items.findAll('td')[0].get_text()
    strength_value = items.findAll('td')[1].get_text()
    item_name = item_name.strip()
    strength_value = strength_value.strip()
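For reference, a self-contained version of the same idea might look like the sketch below. The sample HTML is modelled on the table in the question; the second and third rows, the extract_rows helper and the choice of the built-in "html.parser" backend are assumptions made for this example. It also guards against rows that have fewer than two cells, which is another way the [1] lookup could raise IndexError.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; rows two and three are invented.
HTML = """
<table><tbody>
<tr>
  <td><div class="iconize iconize-icon-left">
    <span class="incidents-icon" title="Description"></span>
    Heinz 57 ketchup
  </div></td>
  <td style="text-align: right;"><span class="level">Popular</span></td>
</tr>
<tr>
  <td>Example item</td>
  <td><span class="level">Rare</span></td>
</tr>
<tr><td>Row with a missing cell</td></tr>
</tbody></table>
"""


def extract_rows(tbody):
    """Return (item_name, strength_value) pairs for every complete row."""
    output = []
    for row in tbody.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < 2:
            continue  # skip incomplete rows instead of raising IndexError
        item_name = cells[0].get_text().strip()
        strength_value = cells[1].get_text().strip()
        output.append((item_name, strength_value))
    return output


soup = BeautifulSoup(HTML, "html.parser")
tbody = soup.table.tbody
rows = extract_rows(tbody)

# Iterating over the tag itself also yields the whitespace text nodes
# between the rows, so it counts more children than there are <tr> tags.
child_count = len(list(tbody))
tr_count = len(tbody.find_all("tr"))

print(rows)
print(child_count, tr_count)
```

The last two counts hint at why len(list(strengths)) is not a reliable row count: with this parser the whitespace between the tr tags is kept as text nodes, so the number of children is larger than the number of rows.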

Getting an IndexError when iterating over a variable number of table rows
