Index out of range error during Python web scraping (BeautifulSoup)

Date: 2014-12-11 13:02:58

Tags: python web-scraping beautifulsoup

I am using a Python program to scrape a particular page. This is the code I am using.

    # Area

    try:
        area= soup.find('div', 'location')
        result= str(area.get_text().strip().encode("utf-8"))
        # print([area_result])
        area_result=cleanup(result).split('>')[2].split(";")[0]
        nearby_result=cleanup(result).split('>')[2].split(";")[1]
        # nearby_result=cleanup(area_result).split('>')
        print "Area : ",area_result
        print "Nearby: ",nearby_result

        # print "Nearby : ",nearby_result

    except StandardError as e:
        area_result="Error was {0}".format(e)
        print area_result

    import string

    def cleanup(s, remove=('\n', '\t')):
        newString = ''
        for c in s:
            # Remove special characters defined above.
            # Then we remove anything that is not printable (for instance \xe2)
            # Finally we remove duplicates within the string matching certain characters.
            if c in remove: continue
            elif c not in string.printable: continue
            elif len(newString) > 0 and c == newString[-1] and c in ('\n', ' ', ',', '.'): continue
            newString += c
        return newString

The site I am trying to scrape is this. The location information is in the right-hand sidebar, for example: UAE > Dubai > Jumeirah Village > Jumeirah Village Circle ; 3.2 km from Dubai Autodrome

The error I get is:

- Error was Index out of range 

Can anyone tell me how to fix this error? Please take a look at my code.

Note that this error does not occur on every similar page.

Update: I tried mu's solution and now get this error:

Error was 'list' object has no attribute 'split'

1 Answer:

Answer 0: (score: 1)

The problem is in these two lines: you access the third element (index [2]) whether or not it exists:

    area_result=cleanup(result).split('>')[2].split(";")[0]
    nearby_result=cleanup(result).split('>')[2].split(";")[1]
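
To illustrate why this fails only on some pages, here is a minimal reproduction built on the example location string from the question. The `location` value is a hypothetical stand-in for what `cleanup(result)` might return; the real output for a given page may differ.

```python
# Hypothetical cleaned-up location text, based on the example in the question.
location = "UAE > Dubai > Jumeirah Village > Jumeirah Village Circle ; 3.2 km from Dubai Autodrome"

parts = location.split('>')   # four breadcrumb elements on this page
third = parts[2].split(";")   # ' Jumeirah Village ' contains no ';'

try:
    nearby = third[1]         # only one element exists, so index 1 fails
except IndexError as e:
    error_message = str(e)
```

With four breadcrumb levels, the `;` ends up in the fourth element rather than the third, so `third` holds a single item and index `[1]` raises `IndexError`. A page with fewer than three levels would fail one step earlier, at `parts[2]`.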

Instead, you can do the following:

    cleanedup = cleanup(result).split('>')
    if len(cleanedup) >= 3:
        results = cleanedup[2].split(";")
        if len(results) >= 2:
            area_result, nearby_result = results[0], results[1]
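
Note that with these checks, `area_result` and `nearby_result` are never assigned when either condition fails. One possible variation is sketched below. The helper name `parse_location` is hypothetical, and treating the last breadcrumb element as the area is an assumption based on the example string in the question, not something the question confirms.

```python
def parse_location(text):
    """Split 'A > B > C ; nearby' into (area, nearby). Hypothetical helper.

    Assumes the area is the last '>'-separated element before the first ';'
    and that the nearby text, if present, follows that ';'.
    """
    # partition() always returns three strings, even when ';' is absent.
    breadcrumb, _, nearby = text.partition(';')
    # Drop empty pieces so stray separators do not produce blank levels.
    levels = [p.strip() for p in breadcrumb.split('>') if p.strip()]
    area = levels[-1] if levels else None
    nearby = nearby.strip() or None
    return area, nearby
```

Called as `area_result, nearby_result = parse_location(cleanup(result))`, both names are always bound, and `None` marks a part that is missing on a given page.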