I am using a Python program to scrape a specific page. This is the code I am using:
#Area
try:
    area= soup.find('div', 'location')
    result= str(area.get_text().strip().encode("utf-8"))
    # print([area_result])
    area_result=cleanup(result).split('>')[2].split(";")[0]
    nearby_result=cleanup(result).split('>')[2].split(";")[1]
    # nearby_result=cleanup(area_result).split('>')
    print "Area : ",area_result
    print "Nearby: ",nearby_result
    # print "Nearby : ",nearby_result
except StandardError as e:
    area_result="Error was {0}".format(e)
    print area_result
import string  # provides string.printable, used below

def cleanup(s, remove=('\n', '\t')):
    newString = ''
    for c in s:
        # Remove special characters defined above.
        # Then we remove anything that is not printable (for instance \xe2)
        # Finally we remove duplicates within the string matching certain characters.
        if c in remove: continue
        elif not c in string.printable: continue
        elif len(newString) > 0 and c == newString[-1] and c in ('\n', ' ', ',', '.'): continue
        newString += c
    return newString
The site I am trying to scrape is this. The location information is in the right sidebar, for example: UAE > Dubai > Jumeirah Village > Jumeirah Village Circle ; 3.2 km from Dubai Autodrome
The error I get is:
- Error was Index out of range
Can anyone look at my code and tell me how to fix this error?
Note that the error does not occur on every page of this kind.
Update: I tried mu's solution and immediately got this error:
Error was 'list' object has no attribute 'split'
Answer 0 (score: 1)
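The problem is in these two lines, where you take the third element (index [2]) whether or not it exists, and then the second element (index [1]) of the ";" split in the same way:
area_result=cleanup(result).split('>')[2].split(";")[0]
nearby_result=cleanup(result).split('>')[2].split(";")[1]
Instead, you can check the lengths first, like this: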
cleanedup = cleanup(result).split('>')
if len(cleanedup) >= 3:
    results = cleanedup[2].split(";")
    if len(results) >= 2:
        area_result, nearby_result = results[0], results[1]
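The length checks above avoid the exception, but they leave area_result and nearby_result unset whenever a page has a different number of ">" levels. As an alternative, here is a minimal sketch that splits on ";" first and then takes the last ">" segment. It assumes the text always has the shape "A > B > ... ; nearby" and that the area you want is the last breadcrumb entry; neither is confirmed by the question. parse_location is a hypothetical helper name, not part of the original code.

```python
def parse_location(text):
    """Split 'A > B > C ; nearby' into (area, nearby).

    Hypothetical helper (not in the original code). Assumes the area is the
    last '>' segment. Either value is None when that part is missing.
    """
    # Separate the breadcrumb from the optional "nearby" part first.
    breadcrumb, _, nearby = text.partition(';')
    # Drop empty pieces so a stray '>' does not yield a blank area.
    parts = [p.strip() for p in breadcrumb.split('>') if p.strip()]
    area = parts[-1] if parts else None
    nearby = nearby.strip() or None
    return area, nearby


sample = "UAE > Dubai > Jumeirah Village > Jumeirah Village Circle ; 3.2 km from Dubai Autodrome"
area_result, nearby_result = parse_location(sample)
print("Area : %s" % area_result)      # Area : Jumeirah Village Circle
print("Nearby: %s" % nearby_result)   # Nearby: 3.2 km from Dubai Autodrome
```

With the sample string from the question, index [2] of the ">" split is " Jumeirah Village ", which contains no ";", so the original [1] lookup fails on it. That would explain why only some pages raise the error. The 'list' object has no attribute 'split' message from the update usually means .split was called on the list returned by split('>') itself rather than on one of its string elements.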