I am trying to scrape data from the following website: https://knowyourcity.info/explore-our-data/
I have put the URLs of all the data pages into an object called urllist and written this loop:
name = []
year = []
country = []
population = []
taps = []
toiletsToPerson = []
from requests import get
from bs4 import BeautifulSoup
for u in urllist:
    response = get(u)
    html_soup = BeautifulSoup(response.text, "html.parser")
    headers_containers = html_soup.find('div', class_ = 'settlement-base-status section text-center')
    names = headers_containers.h2.text
    name.append(names)
    year_established = headers_containers.h3.text
    year.append(year_established)
    headers1_containers = html_soup.find('div', class_ = 'col-xs-12 text-center')
    countries = headers1_containers.h4.a.text
    country.append(countries)
    headers2_containers = html_soup.find('div', class_ = 'bold-it', id = "population")
    populations = headers2_containers.text
    population.append(populations)
    headers3_containers = html_soup.find('div', class_ ='bold-it', id='sharedTaps')
    tap = headers3_containers.text
    taps.append(tap)
    headers4_containers = html_soup.find_all('div', class_ = 'bold-it')
    toiletSeat_toPerson = headers4_containers[7].text
    toiletsToPerson.append(toiletSeat_toPerson)
These commands work when I run them on a single URL, but when I run the loop I get this error message:
File "<ipython-input-472-0f7d711bfd3f>", line 5, in <module>
names = headers_containers.h2.text
AttributeError: 'NoneType' object has no attribute 'h2'
Why does this happen?
Answer 0 (score: 0)
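Your URL list (urllist) is not defined in the code you posted; are you sure every URL in it is correct? html_soup.find() returns None when no element matches, so at least one page in the list does not contain that div (for example, a wrong URL or an error page). You can also use try/except to skip pages that fail to parse: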
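try:
    headers_containers = html_soup.find('div', class_ = 'settlement-base-status section text-center')
    names = headers_containers.h2.text
    name.append(names)
except AttributeError:
    # find() returned None for this page; skip it (this block belongs inside the for loop)
    continue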
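As an alternative to catching the exception, the None check can be made explicit, which also lets you record which URLs failed. The following is only a minimal sketch: the PAGES dictionary, the example.org URLs, the sample HTML, and the extract_header helper are all hypothetical stand-ins so the snippet runs without network access; in the real loop you would pass response.text instead.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for downloaded pages; the second one lacks the expected div.
PAGES = {
    "https://example.org/settlement-a": (
        '<div class="settlement-base-status section text-center">'
        "<h2>Settlement A</h2><h3>1995</h3></div>"
    ),
    "https://example.org/settlement-b": "<p>Page not found</p>",
}


def extract_header(html):
    """Return (name, year) for one page, or None if the header div is missing."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", class_="settlement-base-status section text-center")
    # find() returns None when nothing matches, so check before touching .h2 / .h3
    if container is None or container.h2 is None or container.h3 is None:
        return None
    return container.h2.text.strip(), container.h3.text.strip()


name, year, failed = [], [], []
for url, html in PAGES.items():
    result = extract_header(html)
    if result is None:
        failed.append(url)  # remember which page broke instead of crashing
        continue
    name.append(result[0])
    year.append(result[1])

print(name, year, failed)
```

Inspecting failed afterwards should show which entries of urllist point at pages with a different layout, which is usually more informative than silently skipping them.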