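I have a problem: I split some text into a list using a comma as the separator.
The list has 4 to 5 items, but when I try to access the 4th or 5th item, I get an "index out of range" error. I don't know why this happens.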
import re
import requests
from bs4 import BeautifulSoup

r = requests.get('https://halalhmc.org/outlets-by-name/')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.find_all('div', attrs={'class':'outlet-content'})
records = []
for result in results:
    name = result.find('h3').text
    fullAddress = result.find('p', attrs={'class':'outlet-address'}).text
    split_address = fullAddress.split(',')
    address1 = split_address[0]
    city = split_address[1]
    city1 = split_address[2]
    validPcode = re.match("[A-Z]{2}[0-9] [0-9][A-Z]{2}", split_address[3])
    if validPcode:
        postCode = split_address[3]
    else:
        county = split_address[3]
        postCode = split_address[4]
    records.append((name, address1[10:], city, city1, postCode))
print(records[2])
When I print len(split_address) and remove the print records[1] line, I get a length of 5.
Why is this happening?
Answer 0 (score: 0)
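First, try debugging the code yourself. Anyway, as for your question of why this happens:
The error is raised at postCode = split_address[4], because for some addresses the list has only 4 elements, at indices 0, 1, 2 and 3, and you are accessing index 4 (a fifth element), which does not exist.
That is why you get the "index out of range" error.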
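To make the point above concrete, here is a minimal sketch. The address string is made up for illustration and is not taken from the site in the question.

```python
# Hypothetical address with four comma-separated fields.
full_address = "1 High Street, Springfield, Shire, AB1 2CD"
parts = full_address.split(',')

# Four elements means the valid indices are 0, 1, 2 and 3.
print(len(parts))

try:
    parts[4]  # there is no fifth element
except IndexError as exc:
    error_message = str(exc)
    print(error_message)
```

Note that printing len(split_address) after the loop only reports the length for the last address processed, so a 5 there does not mean every address has 5 fields.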
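Update:
Inside your for loop, the length of split_address is sometimes 4 and sometimes 5. When the length is 4 and you try to read index 4, it will definitely raise an "index out of range" error. To fix this, you have to add a check.
Try this: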
import re
import requests
from bs4 import BeautifulSoup

r = requests.get('https://halalhmc.org/outlets-by-name/')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.find_all('div', attrs={'class':'outlet-content'})
records = []
for result in results:
    name = result.find('h3').text
    fullAddress = result.find('p', attrs={'class':'outlet-address'}).text
    split_address = fullAddress.split(',')
    address1 = split_address[0]
    city = split_address[1]
    city1 = split_address[2]
    validPcode = re.match("[A-Z]{2}[0-9] [0-9][A-Z]{2}", split_address[3])
    if validPcode:
        postCode = split_address[3]
    else:
        county = split_address[3]
        # Only read index 4 when the list really has 5 elements.
        if len(split_address) == 4:
            postCode = split_address[3]
        elif len(split_address) == 5:
            postCode = split_address[4]
    records.append((name, address1[10:], city, city1, postCode))

# only for checking
print(records[1])
print(records[2])
print(records[3])
print(records[4])
print(records[5])
I modified your code slightly, adding only a check on the length of split_address.
Hope this makes sense and helps! :)
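As an alternative to checking for lengths 4 and 5 explicitly, a more defensive sketch is to strip each field and treat the last one as the postcode only if it matches the pattern. The function name parse_address and the sample addresses are hypothetical. The regular expression is the one from the question, so it covers only that one postcode shape.

```python
import re

# Pattern from the question: two letters, a digit, a space, a digit and
# two letters (e.g. "AB1 2CD"). The trailing $ rejects extra characters.
POSTCODE_RE = re.compile(r"[A-Z]{2}[0-9] [0-9][A-Z]{2}$")


def parse_address(full_address):
    """Return (fields, post_code) for a comma-separated address.

    post_code is None when the last field does not look like a postcode.
    """
    # Strip whitespace: after split(','), fields usually start with a space,
    # which would make re.match fail on an otherwise valid postcode.
    fields = [p.strip() for p in full_address.split(',') if p.strip()]
    post_code = None
    if fields and POSTCODE_RE.match(fields[-1]):
        post_code = fields.pop()
    return fields, post_code


four = parse_address("1 High Street, Springfield, Shire, AB1 2CD")
five = parse_address("1 High Street, Springfield, Shire, Countyshire, AB1 2CD")
print(four)
print(five)
```

Because the postcode is read from the end of the list, the same code handles addresses with any number of fields and never indexes past the last element.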