Index out of range, but it is actually in range

Time: 2018-03-21 13:31:09

Tags: python list web-scraping beautifulsoup range

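I have run into a problem with a list that I built by splitting some text on commas.

The list has 4 to 5 items, but when I try to read the 4th or 5th item, Python says the index is out of range. I don't know why this happens.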

import re
import requests
r = requests.get('https://halalhmc.org/outlets-by-name/')
from bs4 import BeautifulSoup

soup = BeautifulSoup(r.text, 'html.parser')

results = soup.find_all('div', attrs={'class':'outlet-content'})

records = []
for result in results:

    name = result.find('h3').text
    fullAddress = result.find('p', attrs={'class':'outlet-address'}).text
    split_address = fullAddress.split(',')
    address1 = split_address[0]
    city = split_address[1]
    city1 = split_address[2]
    validPcode = re.match("[A-Z]{2}[0-9] [0-9][A-Z]{2}",split_address[3])
    if validPcode:
        postCode = split_address[3]
    else:
        county = split_address[3]
        postCode = split_address[4]
    records.append((name,address1[10:],city,city1,postCode))

print records[2]

When I print len(split_address) and remove print records[1], I get a length of 5.

Why is this happening?

1 Answer:

Answer 0: (score: 0)

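Try debugging the code yourself first. Anyway, to answer your question of why this happens:

The error is raised at postCode = split_address[4], because for that record the list has only 4 elements (indexes 0, 1, 2, 3) and you are accessing a fifth element that does not exist.

There is no index [4] in that list, which is why you get the "index out of range" error!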
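A minimal illustration of that point, using a made-up address string (the sample text is an assumption, not taken from the scraped page). Splitting it on commas gives four items, so the valid indexes stop at 3:

```python
# Hypothetical sample address; the real page content may differ.
full_address = "12 High Street, Anytown, Westshire, AB1 2CD"
split_address = full_address.split(',')

print(len(split_address))        # number of items in the list
print(split_address[3].strip())  # the last valid index is len - 1

try:
    split_address[4]             # one past the end of a 4-item list
except IndexError as exc:
    error_message = str(exc)
    print(error_message)
```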

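Update:

Inside your for loop, the length of split_address varies between 4 and 5. Whenever the length is 4 and you try to read index [4], you will get the "index out of range" error. To fix this, you have to add a check on the length.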
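One way to confirm that the lengths really vary is to count them before indexing. This is only a sketch: the sample_addresses list is invented for illustration and stands in for the text scraped from the page.

```python
from collections import Counter

# Hypothetical sample data standing in for the scraped addresses.
sample_addresses = [
    "12 High Street, Anytown, Westshire, AB1 2CD",
    "3 Mill Lane, Oldtown, Greater Oldtown, Eastshire, CD3 4EF",
]

# Count how many addresses produce each list length after splitting.
length_counts = Counter(len(a.split(',')) for a in sample_addresses)
print(length_counts)
```

If more than one length shows up, any fixed index such as [4] is only safe for the longer records.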

Try this:

import re
import requests
r = requests.get('https://halalhmc.org/outlets-by-name/')
from bs4 import BeautifulSoup

soup = BeautifulSoup(r.text, 'html.parser')

results = soup.find_all('div', attrs={'class':'outlet-content'})

records = []
for result in results:

    name = result.find('h3').text
    fullAddress = (result.find('p', attrs={'class':'outlet-address'})).text
    split_address = fullAddress.split(',')
    address1 = split_address[0]
    city = split_address[1]
    city1 = split_address[2]
    validPcode = re.match("[A-Z]{2}[0-9] [0-9][A-Z]{2}",split_address[3])
    if validPcode:
        postCode = split_address[3]
    else:
        county = split_address[3]
        # Use == (not `is`) to compare lengths, and only read index 4 when it exists.
        if len(split_address) == 4:
            postCode = split_address[3]
        elif len(split_address) >= 5:
            postCode = split_address[4]
    records.append((name,address1[10:],city,city1,postCode))

# only for checking
print records[1]
print records[2]
print records[3]
print records[4]
print records[5]

I have modified your code only slightly, adding a check on len(split_address).

Hope this makes sense and helps! :)
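If you prefer not to branch on specific lengths, a rough alternative is to take the post code from the end of the list. The helper name parse_address is hypothetical, and the idea that the post code is always the last item is an assumption about the data, not something the page guarantees:

```python
def parse_address(full_address):
    """Split a comma-separated address and pick out fields by position.

    Assumes the post code is always the last item (a guess about the data).
    """
    # Strip the whitespace that split(',') leaves around each item.
    parts = [part.strip() for part in full_address.split(',')]
    if len(parts) < 4:
        return None  # too short to hold address, city, city1 and post code
    address1, city, city1 = parts[0], parts[1], parts[2]
    post_code = parts[-1]  # last item, whether the list has 4 or 5 entries
    return (address1, city, city1, post_code)


print(parse_address("12 High Street, Anytown, Westshire, AB1 2CD"))
```

Returning None for short records lets the calling loop skip them instead of failing with an IndexError.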