List index out of range when scraping

Time: 2016-12-29 10:23:04

Tags: python web-scraping beautifulsoup

soup = BeautifulSoup(driver.page_source)

for each_div in soup.findAll("div", {"class": "trapac-form-view-results tpc-results"}):

    if each_div.findAll("table", {"class": "sticky-enabled table-select-processed tableheader-processed sticky-table"})[1]:
        for child0 in each_div.findAll("table", {"class": "sticky-enabled table-select-processed tableheader-processed sticky-table"})[1]:

            if child0.name == "table":
                print("2")
                child = child0.findChildren()

                for child in child0:
                    if child.name == "tbody":
                        child1 = child.findChildren()

The code above works fine, but when table[1] is not available it raises an IndexError:

IndexError: list index out of range
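The error can be reproduced without a browser or BeautifulSoup. The sketch below is only an illustration: a plain Python list stands in for the result of findAll, and the names matches and message are made up for this example. It shows that an `if` guard written around `[1]` does not help, because the indexing runs before the `if` gets anything to test.

```python
# A plain list stands in for a findAll(...) result that matched only one table.
matches = ["only_table"]
message = None

try:
    # The index lookup is evaluated first, so the exception is raised
    # before the `if` can check truthiness.
    if matches[1]:
        print("has a second table")
except IndexError as exc:
    message = str(exc)
    print("IndexError:", message)
```

So the guard has to check whether a second element exists before indexing, or the indexing itself has to be wrapped in try/except.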

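I tried try/except without success.

How can I write the condition so that, if table[1] does not exist, the loop is skipped entirely and the code moves on to the next item?

Any help is appreciated.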

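2 Answers:

Answer 0: (score: 0)

The findAll method returns a list of all matching elements, so a list of length 0 means no table matched. By checking that the length is greater than 1, you can be sure there are at least two matching table elements (which you need, since you use the second element that findAll found).

Storing the result in a variable also avoids repeated work (that is, calling findAll twice).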

soup = BeautifulSoup(driver.page_source)

for each_div in soup.findAll("div", {"class": "trapac-form-view-results tpc-results"}):
    # Call findAll once and reuse the result
    tables = each_div.findAll("table", {"class": "sticky-enabled table-select-processed tableheader-processed sticky-table"})
    # Only index [1] when at least two tables matched
    if len(tables) > 1:
        for child0 in tables[1]:

            if child0.name == "table":
                print("2")
                child = child0.findChildren()

                for child in child0:
                    if child.name == "tbody":
                        child1 = child.findChildren()
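To see the length check in isolation, here is a minimal sketch that uses plain Python lists as stand-ins for what findAll returns. The helper name second_or_none and the sample data are hypothetical and not part of BeautifulSoup.

```python
def second_or_none(matches):
    """Return the second match, or None when fewer than two matched."""
    # `matches` stands in for the list-like result of findAll(...)
    if len(matches) > 1:
        return matches[1]
    return None


# Plain lists stand in for findAll results on three hypothetical divs.
results_per_div = [["t0", "t1"], ["t0"], []]

for matches in results_per_div:
    table = second_or_none(matches)
    if table is None:
        continue  # fewer than two tables: skip this div
    print(table)
```

Only the first list has a second element, so the loop body runs once and the other two divs are skipped.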

Answer 1: (score: 0)

The Pythonic way is to use try/except (not if, and certainly not len!).

soup = BeautifulSoup(driver.page_source)

for each_div in soup.findAll("div", {"class": "trapac-form-view-results tpc-results"}):

    try:
        # Only the indexing sits inside the try block
        node = each_div.findAll("table", {"class": "sticky-enabled table-select-processed tableheader-processed sticky-table"})[1]
    except IndexError:
        # No second table in this div: skip to the next one
        continue

    for child0 in node:

        if child0.name == "table":
            print("2")
            child = child0.findChildren()

            for child in child0:
                if child.name == "tbody":
                    child1 = child.findChildren()

There are several reasons why this is the best approach.
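The same pattern can be tried without a browser. This is a rough sketch under the assumption that plain lists behave like findAll results when indexed; the function name second_tables and the sample data are made up for illustration. This "try it and handle the failure" style is what the Python glossary calls EAFP.

```python
def second_tables(results_per_div):
    """Collect the second match from each result list, skipping short ones."""
    found = []
    for matches in results_per_div:
        try:
            # Keep the try block narrow: only the lookup that may fail.
            table = matches[1]
        except IndexError:
            continue  # no second table in this div: move on to the next one
        found.append(table)
    return found


print(second_tables([["a0", "a1"], ["b0"], [], ["c0", "c1", "c2"]]))
```

Keeping only the indexing inside the try block is a deliberate choice: an IndexError raised later in the loop body would then still surface instead of being silently swallowed by the `continue`.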