I have the following code that scrapes data from the CIA website:
import json
from bs4 import BeautifulSoup
import html
from urllib.request import urlopen
from functools import reduce
import pandas as pd
import requests

countries = ['af', 'ax']

def get_data(a):
    url = 'https://www.cia.gov/library/publications/the-world-factbook/geos/'+a+'.html'
    page = urlopen(url)
    soup = BeautifulSoup(page, 'html.parser')
    # geography
    try:
        country = soup.find('span', {'class' : 'region'}).text
        map_reference = soup.find('div', {'id' : 'field-map-references'}).get_text(strip=True)
        return country, map_reference
    except AttributeError as e:
        print(e)

results = pd.DataFrame([get_data(p) for p in countries])
results
Which produces:
             0            1
0  Afghanistan         Asia
1     Akrotiri  Middle East
But now, when I try to add another value, mean_elevation, to the same code:
countries = ['af', 'ax']

def get_data(a):
    url = 'https://www.cia.gov/library/publications/the-world-factbook/geos/'+a+'.html'
    page = urlopen(url)
    soup = BeautifulSoup(page, 'html.parser')
    # geography
    try:
        country = soup.find('span', {'class' : 'region'}).text
        map_reference = soup.find('div', {'id' : 'field-map-references'}).get_text(strip=True)
        mean_elevation = soup.find('div', {'id' : 'field-elevation'}).find_next('span').find_next('span').get_text(strip=True)
        return country, map_reference, mean_elevation
    except AttributeError as e:
        print(e)

results = pd.DataFrame([get_data(p) for p in countries])
results
I get:
             0            1        2
0  Afghanistan         Asia  1,884 m
1         None         None     None
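I know this happens because the second country, 'ax', doesn't have that field, but why does the entire row become None? How can I fix it so that the available data is shown and the unavailable data is left blank?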
Desired result:
             0            1        2
0  Afghanistan         Asia  1,884 m
1     Akrotiri  Middle East     None
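The cause can be reproduced without network access or BeautifulSoup. In this sketch, lookup and the dict-based page are hypothetical stand-ins for get_data and the parsed page: page.get("elevation") returns None for a missing key, much as soup.find(...) returns None for a missing element, and calling .strip() on that None raises AttributeError. The except branch only prints the error, so the function falls off the end and Python returns None for the whole call, not a partial tuple.

```python
def lookup(page):
    """Mimics get_data: one missing field aborts the whole tuple."""
    try:
        country = page["region"]
        # A missing key gives None, like soup.find(...) finding nothing,
        # so the chained .strip() raises AttributeError.
        elevation = page.get("elevation").strip()
        return country, elevation
    except AttributeError as e:
        print(e)
    # No return statement is reached on this path, so the result is None.


print(lookup({"region": "Afghanistan", "elevation": " 1,884 m "}))  # prints ('Afghanistan', '1,884 m')
print(lookup({"region": "Akrotiri"}))  # prints the AttributeError message, then None
```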
Answer (score: 1)
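Try this change. The whole row becomes None because the AttributeError raised for the missing field is caught inside get_data; the except branch only prints it, so the function finishes without reaching return and yields None for that country. Check for the element before chaining calls on it:

# fix for: 'NoneType' object has no attribute 'find_next'
# (replaces the mean_elevation line inside the try block of get_data)
elevation = soup.find('div', {'id': 'field-elevation'})
if elevation is not None:
    mean_elevation = elevation.find_next('span').find_next('span').get_text(strip=True)
else:
    mean_elevation = ""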
Result:

             0            1        2
0  Afghanistan         Asia  1,884 m
1     Akrotiri  Middle East
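The same guard can be factored into a small helper so that every field fails independently, not just mean_elevation. This is a sketch, not part of the original answer: safe is a hypothetical name, and it simply wraps a lookup chain in try/except AttributeError, which is the error raised when soup.find(...) returns None and the chain continues.

```python
def safe(getter, default=""):
    """Return getter(), or default if the lookup chain hits a missing element.

    soup.find(...) returns None when nothing matches, and calling .text,
    .get_text() or .find_next() on None raises AttributeError.
    """
    try:
        return getter()
    except AttributeError:
        return default


missing = None  # stands in for soup.find(...) finding nothing
print(repr(safe(lambda: missing.find_next("span").get_text(strip=True))))  # prints ''
print(safe(lambda: " Asia ".strip()))  # prints Asia
```

Inside get_data each field would then be fetched on its own, for example mean_elevation = safe(lambda: soup.find('div', {'id': 'field-elevation'}).find_next('span').find_next('span').get_text(strip=True)), so the function always returns a full tuple. Catching AttributeError this broadly can also mask an AttributeError caused by a real bug elsewhere in the chain, so the explicit check in the answer above is the more targeted option.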