How to fix "IndexError: list index out of range" in a proxy script

Time: 2019-06-26 19:31:49

Tags: python

I have a free proxy script here, but it now fails with this error:

Traceback (most recent call last):
  File "proxi.py", line 14, in <module>
    if (td[6].text=="no"): # If you change "no" to "yes" you get https
IndexError: list index out of range

import requests
from bs4 import BeautifulSoup

out = ""
urls = ["http://www.us-proxy.org/","http://free-proxy-list.net/uk-proxy.html","http://free-proxy-list.net/anonymous-proxy.html"]
for url in urls:
    r = requests.get(url)
    data = r.text
    soup = BeautifulSoup(data, "html.parser")
    tr = soup.find_all("tr")
    for t in tr:
        td = t.find_all("td")
        if (td):
            if (td[6].text=="no"): # If you change "no" to "yes" you get https
                out+=(td[0].text+":"+td[1].text+"\n")
f = open("proxy.txt", "w")
f.write(out)
f.close()

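2 answers:

Answer 0: (score: 1)

td does not always have an element at index 6.

The check if (td) only guarantees that the list is non-empty, so any row with between 1 and 6 cells still raises an IndexError when you access td[6].

In this code I print the length of td for every row: https://onlinegdb.com/BkcnRSZgr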

import requests
from bs4 import BeautifulSoup

out = ""
urls = ["http://www.us-proxy.org/","http://free-proxy-list.net/uk-proxy.html","http://free-proxy-list.net/anonymous-proxy.html"]
for url in urls:
    r = requests.get(url)
    data = r.text
    soup = BeautifulSoup(data, "html.parser")
    tr = soup.find_all("tr")
    # Keep this loop inside the url loop so every page is processed
    for t in tr:
        td = t.find_all("td")
        print(len(td))  # header rows (th cells only) print 0
        # Only index td[6] when the row actually has a 7th cell
        if len(td) > 6:
            if (td[6].text=="yes"): # "yes" selects https proxies, "no" selects http-only ones
                out+=(td[0].text+":"+td[1].text+"\n")
# The with block closes the file even if the write fails
with open("proxy.txt", "w") as f:
    f.write(out)

Here is an example of what is happening: https://onlinegdb.com/HJIXlL-xH

Hope this helps you understand it better.
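To make the failure mode concrete without touching the network, here is a minimal sketch that works on plain lists of cell texts instead of live pages. The helper name extract_proxies and the sample rows are made up for illustration; the only thing taken from the question is the assumption that column 6 holds the https flag.

```python
def extract_proxies(rows, https_flag="no"):
    """Return "ip:port" strings for rows whose 7th cell equals https_flag.

    rows is a list of lists of cell texts, a hypothetical stand-in for
    [c.text for c in t.find_all("td")]. Rows with fewer than 7 cells are
    skipped instead of raising IndexError.
    """
    found = []
    for cells in rows:
        if len(cells) <= 6:  # header rows, footers, rows from other tables
            continue
        if cells[6] == https_flag:
            found.append(cells[0] + ":" + cells[1])
    return found


# Made-up sample data: an empty row, a short row, and two full rows.
sample_rows = [
    [],
    ["note", "only two cells"],
    ["1.2.3.4", "8080", "US", "United States", "anonymous", "no", "no", "1 minute ago"],
    ["5.6.7.8", "3128", "GB", "United Kingdom", "elite proxy", "no", "yes", "2 minutes ago"],
]

print(extract_proxies(sample_rows))         # http-only proxies
print(extract_proxies(sample_rows, "yes"))  # https proxies
```

The length check replaces the original if (td) test, which passes for any non-empty list and therefore lets short rows through to td[6].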

Answer 1: (score: 1)

These URLs share a similar markup structure:

urls = ["http://www.us-proxy.org/","http://free-proxy-list.net/uk-proxy.html","http://free-proxy-list.net/anonymous-proxy.html"]

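Each page has a table with the ID proxylisttable that contains the proxy list, including a header row and a footer row.

I suggest restricting the tr selection to this table, e.g.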

trs = soup.select("table#proxylisttable tr")  # soup is the BeautifulSoup object from the question
proxies = trs[1:-1]  # exclude the header and footer rows
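As a rough sketch of how that selection could fit into the rest of the script, the example below runs it against a small inline HTML sample rather than the live sites. The sample markup is made up to mimic the structure described above (the real pages may differ or may have changed), and the extra length check is an added precaution, not part of the original suggestion.

```python
from bs4 import BeautifulSoup

# Made-up HTML mimicking the described layout: one unrelated table, then
# the proxy table with a header row, two data rows and a footer row.
SAMPLE_HTML = """
<table id="other"><tr><td>unrelated</td></tr></table>
<table id="proxylisttable">
  <tr><th>IP Address</th><th>Port</th><th>Code</th><th>Country</th>
      <th>Anonymity</th><th>Google</th><th>Https</th><th>Last Checked</th></tr>
  <tr><td>1.2.3.4</td><td>8080</td><td>US</td><td>United States</td>
      <td>anonymous</td><td>no</td><td>no</td><td>1 minute ago</td></tr>
  <tr><td>5.6.7.8</td><td>3128</td><td>GB</td><td>United Kingdom</td>
      <td>elite proxy</td><td>no</td><td>yes</td><td>2 minutes ago</td></tr>
  <tr><th colspan="8">footer</th></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Only rows inside the proxy table are selected; the unrelated table is ignored.
trs = soup.select("table#proxylisttable tr")
proxies = trs[1:-1]  # drop the header and footer rows

lines = []
for row in proxies:
    td = row.find_all("td")
    # Defensive check in case a data row is shorter than expected
    if len(td) > 6 and td[6].text == "no":
        lines.append(td[0].text + ":" + td[1].text)

print("\n".join(lines))
```

Scoping the selector to one table means rows from other tables on the page never reach the td[6] lookup, which is what triggered the IndexError in the question.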