BeautifulSoup does not show all the data when scraping a website

Time: 2020-07-16 05:27:44

Tags: python beautifulsoup

I have the following code to scrape data from the CIA website:

import json
from bs4 import BeautifulSoup
import html
from urllib.request import urlopen
from functools import reduce
import pandas as pd
import requests

countries = ['af', 'ax']

def get_data(a):
    url = 'https://www.cia.gov/library/publications/the-world-factbook/geos/'+a+'.html'
    page = urlopen(url)
    soup = BeautifulSoup(page,'html.parser')
    # geography
    try:
        country = soup.find('span', {'class' : 'region'}).text
        map_reference = soup.find('div', {'id' : 'field-map-references'}).get_text(strip=True)
        return country, map_reference
    except(AttributeError) as e:
        print(e)

results = pd.DataFrame([get_data(p) for p in countries])
results

which produces:


             0            1
0  Afghanistan         Asia
1     Akrotiri  Middle East

But now, when I try to add another value, mean_elevation, to the same code:


countries = ['af', 'ax']

def get_data(a):
    url = 'https://www.cia.gov/library/publications/the-world-factbook/geos/'+a+'.html'
    page = urlopen(url)
    soup = BeautifulSoup(page,'html.parser')
    # geography
    try:
        country = soup.find('span', {'class' : 'region'}).text
        map_reference = soup.find('div', {'id' : 'field-map-references'}).get_text(strip=True)
        mean_elevation = soup.find('div', {'id' : 'field-elevation'}).find_next('span').find_next('span').get_text(strip=True)
        return country, map_reference, mean_elevation
    except(AttributeError) as e:
        print(e)

results = pd.DataFrame([get_data(p) for p in countries])
results

I get:

             0     1        2
0  Afghanistan  Asia  1,884 m
1         None  None     None

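I know this happens because the second country, 'ax', does not have that field, but why does the whole row become None? How can I fix it so that the available data is shown and the unavailable data is left blank?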

Desired result:

             0            1        2
0  Afghanistan         Asia  1,884 m
1     Akrotiri  Middle East     None

1 answer:

Answer 0 (score: 1)

Try this change, which checks that the elevation div exists before chaining find_next on it:

# fix : 'NoneType' object has no attribute 'find_next'

elevation = soup.find('div', {'id': 'field-elevation'})
if elevation:
    mean_elevation = elevation.find_next('span').find_next('span').get_text(strip=True)
else:
    mean_elevation = ""

             0            1        2
0  Afghanistan         Asia  1,884 m
1     Akrotiri  Middle East
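
As for why the whole row became None: when any lookup inside the try block raises AttributeError, the except branch only prints the error, so the function ends without reaching return and Python returns None for that country. The sketch below reproduces this without any network access. It is only an illustration under stated assumptions: PAGES is a hypothetical stand-in for the parsed pages, and dict.get() plays the role of soup.find(), since both return None when nothing matches. The helper names are made up for this example.

```python
# Hypothetical stand-in for the parsed pages: dict.get() returns None for a
# missing key, just as soup.find() returns None when no tag matches.
PAGES = {
    "af": {"region": "Afghanistan", "field-map-references": "Asia",
           "field-elevation": "1,884 m"},
    "ax": {"region": "Akrotiri", "field-map-references": "Middle East"},
}


def get_data_all_or_nothing(code):
    """Mirrors the question: one try block around every field."""
    page = PAGES[code]
    try:
        country = page.get("region").strip()
        map_reference = page.get("field-map-references").strip()
        # For "ax" this is None.strip(), which raises AttributeError.
        mean_elevation = page.get("field-elevation").strip()
        return country, map_reference, mean_elevation
    except AttributeError as e:
        print(e)
        # No return here, so the function implicitly returns None.


def text_or_none(value):
    """Return the stripped text, or None when the field is missing."""
    return value.strip() if value is not None else None


def get_data_per_field(code):
    """Handle each field on its own so one missing value cannot drop the rest."""
    page = PAGES[code]
    return (
        text_or_none(page.get("region")),
        text_or_none(page.get("field-map-references")),
        text_or_none(page.get("field-elevation")),
    )


rows = [get_data_per_field(code) for code in ("af", "ax")]
print(rows)
```

With the real pages, the same idea would be a small helper that takes the result of soup.find(...) and calls get_text(strip=True) only when that result is not None. Returning None keeps the missing value visible in the DataFrame; returning "" instead, as in the change above, leaves the cell blank.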