IndexError in a loop using BeautifulSoup

Time: 2017-06-10 12:54:34

Tags: python python-3.x parsing beautifulsoup range

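The all-too-familiar IndexError. Unfortunately, I really haven't found a solution.

I always get the error when the last URL is visited, whether or not the page is empty. It happens regardless of whether the range is 2 or 20.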

    text_file = open("Results-from-{}.txt".format(self.entry_get), "w")

    ### Iterator for end of the url
    multiple_url = []
    for iterator_page in range(15):
            iterator_page = iterator_page + 1
            multiple_url.append("".join([self.sub_url, str(iterator_page)]))

    ### loop for visit all 20 pages ###
    parser = 0
    while parser < len(multiple_url):
        print(multiple_url[parser])
        parser += 1
        with urllib.request.urlopen(multiple_url[parser]) as url:
            soup = BeautifulSoup(url, "html.parser")

    ### html tag parsing
            names = [name.get_text().strip() for name in soup.findAll("div", {"class": "name m08_name"})]
            street = [address.get_text().strip() for address in soup.findAll(itemprop="streetAddress")]
            plz = [address.get_text().strip() for address in soup.findAll(itemprop="postalCode")]
            city = [address.get_text().strip() for address in soup.findAll(itemprop="addressLocality")]

    ### zip and write
            for line in zip(names, street, plz , city):
                print("%s;%s;%s;%s;\n" % line)
                text_file.write("%s;%s;%s;%s;\n" % line)

    ### output of the path main: cwd_out_final
    cwd = os.getcwd()
    cwd_out = "\{}".format(text_file.name)
    cwd_out_final = cwd + cwd_out


    text_file.close()

My error:

Exception in Tkinter callback
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/tkinter/__init__.py", line 1699, in __call__
    return self.func(*args)
  File "/Users/x/PycharmProjects/hackday/parser.py", line 55, in search_complete_inner
    with urllib.request.urlopen(multiple_url[parser]) as url:
IndexError: list index out of range

Thanks!

1 answer:

Answer 0: (score: 0)

You increment parser (parser += 1) before using it as the index in the with statement. On the last pass through the loop, parser then equals len(multiple_url), which is one past the final valid index, so multiple_url[parser] raises the IndexError in question. It also means you never use the first element of the list, because parser is already 1 the first time it is used as an index.
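One way to avoid the problem is to iterate over the list directly instead of keeping a manual index. The following is only a minimal sketch, not the asker's full program: build_urls and visit_all are hypothetical helper names, the base URL is a placeholder, and fetch stands in for the urllib.request.urlopen plus BeautifulSoup step so that the example runs offline.

```python
def build_urls(sub_url, page_count):
    """Return sub_url + "1" through sub_url + str(page_count)."""
    return [f"{sub_url}{page}" for page in range(1, page_count + 1)]


def visit_all(urls, fetch):
    """Call fetch(url) for every URL, first to last, and collect the results."""
    results = []
    for url in urls:  # no manual index, so nothing can be incremented too early
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    # Placeholder base URL (an assumption); fetch is stubbed for illustration.
    urls = build_urls("http://example.com/page=", 3)
    for result in visit_all(urls, fetch=lambda url: "fetched " + url):
        print(result)
```

If the while loop is kept instead, moving parser += 1 to the end of the loop body has the same effect: every index from 0 to len(multiple_url) - 1 is used exactly once.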