I'm trying to query income data with the BEA API. API documentation: https://apps.bea.gov/api/_pdf/bea_web_service_api_user_guide.pdf
My goal is to parse the returned XML and turn it into a data frame with a separate column for each year.
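The problem is that the way I parse the data leaves it in a "melted" (long) format. Instead, I want one column per year, with each column holding that year's income.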
How can I do this? Below is the code I'm using. It needs an API key, which you register for by email and then paste after "UserID=" in the URL below.
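The query string in the code below is assembled by concatenating string literals. As a hedged alternative, the same parameters can be kept in a dict and encoded with the standard library. build_bea_url is a hypothetical helper, and the parameter names are copied from the question's URL rather than checked against the API guide.

```python
from urllib.parse import urlencode

# Endpoint and parameter names copied from the question's URL.
BEA_ENDPOINT = "https://apps.bea.gov/api/data/"


def build_bea_url(api_key, years, line_code=2):
    """Build the GetData query string from a dict (hypothetical helper)."""
    params = {
        "UserID": api_key,
        "method": "GetData",
        "datasetname": "RegionalIncome",
        "TableName": "RPI2",
        "LineCode": line_code,
        "Year": ",".join(str(y) for y in years),
        "GeoFips": "MSA",
        "ResultFormat": "xml",
    }
    # safe="," leaves the commas in Year unescaped, as in the original URL.
    return BEA_ENDPOINT + "?" + urlencode(params, safe=",")


url = build_bea_url("ENTERYOURAPIKEY", [2014, 2015, 2016])
print(url)
```

The resulting string matches the one the question builds by hand, but changing the years or the line code no longer means editing a long literal. With requests, the dict could instead be passed through the params argument of requests.get, which builds the query string for you.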
import requests
import pandas as pd
from bs4 import BeautifulSoup

bea_income = 'https://apps.bea.gov/api/data/?UserID=ENTERYOURAPIKEY&method=GetData&'\
    'datasetname=RegionalIncome&TableName=RPI2&LineCode=2&Year=2014,2015,2016&GeoFips=MSA&ResultFormat=xml'
bea_inc_request = requests.get(bea_income, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8'})
bea_inc_html = bea_inc_request.content
# The 'xml' parser requires the lxml package to be installed.
bea_inc_soup = BeautifulSoup(bea_inc_html, 'xml')
MSA = []
TimePeriod = []
Income = []
GeoFips = []
# Each <Data> element under <Results> is one MSA/year observation;
# iterate over the elements directly instead of calling find_all() on every pass.
for data in bea_inc_soup.Results.find_all('Data'):
    MSA.append(data['GeoName'])
    GeoFips.append(data['GeoFips'])
    Income.append(data['DataValue'])
    TimePeriod.append(data['TimePeriod'])
income_data = pd.DataFrame({'MSA': MSA, 'FIPS': GeoFips, 'Year': TimePeriod, 'Income': Income})
MSA FIPS Year Income
0 Abilene, TX (Metropolitan Statistical Area) 10180 2014 41818
1 Abilene, TX (Metropolitan Statistical Area) 10180 2015 41651
2 Abilene, TX (Metropolitan Statistical Area) 10180 2016 40409
3 Akron, OH (Metropolitan Statistical Area) 10420 2016 45448
4 Akron, OH (Metropolitan Statistical Area) 10420 2015 45298
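The parsing loop only needs four attributes from each Data element. A minimal stdlib sketch of the same extraction with xml.etree.ElementTree, reading each element once and returning one dict per observation, is below. The sample XML is hypothetical and trimmed down: it assumes only the Results/Data nesting and the four attributes the question reads, with values copied from the outputs shown here. It also assumes the response declares no default XML namespace (if it does, iter("Data") will not match the tags) and that every DataValue is numeric.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down response: only the Results/Data structure and the
# four attributes the question reads are assumed. A real response has more.
SAMPLE_XML = """<BEAAPI>
  <Results>
    <Data GeoFips="10180" GeoName="Abilene, TX (Metropolitan Statistical Area)" TimePeriod="2014" DataValue="41,818"/>
    <Data GeoFips="10180" GeoName="Abilene, TX (Metropolitan Statistical Area)" TimePeriod="2015" DataValue="41,651"/>
  </Results>
</BEAAPI>"""


def parse_bea_rows(xml_text):
    """Return one dict per <Data> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    rows = []
    for data in root.iter("Data"):
        rows.append({
            "MSA": data.get("GeoName"),
            "FIPS": data.get("GeoFips"),
            "Year": data.get("TimePeriod"),
            # Assumes DataValue is numeric, possibly with thousands separators.
            "Income": int(data.get("DataValue").replace(",", "")),
        })
    return rows


rows = parse_bea_rows(SAMPLE_XML)
for row in rows:
    print(row)
```

A list of dicts like this can be handed straight to pd.DataFrame(rows), which replaces the four parallel lists in the question's code.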
Answer
To get the data out of the "melted" format, I pivoted on the Year and Income columns.
income_pivot = income_data[['Year','Income']].pivot(columns='Year')['Income']
Year 2014 2015 2016
0 41,818 NaN NaN
1 NaN 41,651 NaN
2 NaN NaN 40,409
3 44,097 NaN NaN
4 NaN 45,298 NaN
5 NaN NaN 45,448
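The NaN "staircase" above appears because pivot is called without an index argument, so pandas keeps the original row labels 0, 1, 2, ... and each row still carries only the one year it came from. A small sketch reproducing this on rows copied from the question's output (assuming pandas is installed):

```python
import pandas as pd

ABILENE = "Abilene, TX (Metropolitan Statistical Area)"
AKRON = "Akron, OH (Metropolitan Statistical Area)"

# Long ("melted") rows copied from the question's output.
income_data = pd.DataFrame({
    "MSA": [ABILENE, ABILENE, ABILENE, AKRON, AKRON],
    "FIPS": ["10180", "10180", "10180", "10420", "10420"],
    "Year": ["2014", "2015", "2016", "2016", "2015"],
    "Income": [41818, 41651, 40409, 45448, 45298],
})

# No index= given: the result keeps income_data's row labels, so every row
# has exactly one non-null year and NaN in the other year columns.
staircase = income_data[["Year", "Income"]].pivot(columns="Year")["Income"]
print(staircase)
print(staircase.notna().sum(axis=1).tolist())
```

Every row has exactly one non-null value, which is why the answer then has to drop NaNs column by column.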
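Then I manually dropped the NaNs created by the pivot, one year column at a time, so that each year's column holds only the MSAs' income values for that year.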
income_pivot_2014 = income_pivot.iloc[:,0].dropna().values
income_pivot_2015 = income_pivot.iloc[:,1].dropna().values
income_pivot_2016 = income_pivot.iloc[:,2].dropna().values
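Dropping NaNs per column and then lining the resulting arrays up side by side only works if every MSA has exactly one value for every requested year; otherwise the arrays have different lengths or values shift onto the wrong MSA. A minimal stdlib check of that assumption, using a hypothetical helper check_complete and a toy list with abbreviated MSA names that deliberately omits Akron's 2014 row:

```python
from collections import Counter


def check_complete(records, years):
    """Return MSAs that do not have exactly one value per year.

    records: iterable of (msa, year) pairs. Hypothetical helper.
    """
    counts = Counter(records)
    msas = {msa for msa, _ in counts}
    bad = []
    for msa in sorted(msas):
        # counts[...] is 0 for a missing (msa, year) pair.
        if any(counts[(msa, year)] != 1 for year in years):
            bad.append(msa)
    return bad


years = ["2014", "2015", "2016"]
records = [
    ("Abilene", "2014"), ("Abilene", "2015"), ("Abilene", "2016"),
    ("Akron", "2016"), ("Akron", "2015"),  # Akron 2014 deliberately omitted
]
print(check_complete(records, years))  # ['Akron']
```

If the list comes back empty, the column-by-column dropna approach lines up; if not, a pivot keyed on the MSA is safer.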
Then I took the MSA names:
income_pivot_msa = income_data['MSA'].unique()
and combined everything into a single data frame:
income_data_form = pd.DataFrame({'MSA':income_pivot_msa,
'2014_inc':income_pivot_2014,
'2015_inc':income_pivot_2015,
'2016_inc':income_pivot_2016,
'FIPS':income_data['FIPS'].unique()})
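A hedged alternative that avoids the manual NaN dropping and re-alignment is to pass MSA and FIPS as the pivot index, so pandas builds one row per MSA itself. The sketch below uses rows copied from the question's output, with Income as comma-formatted strings as the answer's output suggests; Akron's 2014 row is not among the rows shown in the question, so here it comes out as NaN instead of shifting another MSA's value. Passing a list to index requires a reasonably recent pandas (1.1 or later).

```python
import pandas as pd

ABILENE = "Abilene, TX (Metropolitan Statistical Area)"
AKRON = "Akron, OH (Metropolitan Statistical Area)"

# Long-format rows copied from the question's output.
income_data = pd.DataFrame({
    "MSA": [ABILENE, ABILENE, ABILENE, AKRON, AKRON],
    "FIPS": ["10180", "10180", "10180", "10420", "10420"],
    "Year": ["2014", "2015", "2016", "2016", "2015"],
    "Income": ["41,818", "41,651", "40,409", "45,448", "45,298"],
})

# Strip thousands separators and convert the income strings to numbers.
income_data["Income"] = pd.to_numeric(
    income_data["Income"].str.replace(",", "", regex=False)
)

# One row per (MSA, FIPS), one column per year.
wide = income_data.pivot(index=["MSA", "FIPS"], columns="Year", values="Income")

# Name the year columns like the answer ("2014_inc", ...) and flatten the index.
wide.columns = [f"{year}_inc" for year in wide.columns]
wide = wide.reset_index()
print(wide)
```

This does not depend on row order, and a missing year shows up as NaN in that cell. pivot raises a ValueError if the same (MSA, FIPS, Year) combination occurs twice; pivot_table with an aggregation function is the usual choice in that case.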