I want to arrange BeautifulSoup output into a pandas DataFrame.
import pandas as pd
import requests
import bs4
import urllib, json
Cik = '824142'
url = 'https://api.ustals.com/v1/indicators/xbrl?indicators=EarningsPerShareDiluted,NetIncomeLoss'\
',Revenues,ProfitLoss,DividendsCommonStockCash,Assets,Liabilities'\
'&frequency=q&period_type=end_date&companies={s}&token=KUNwBJE78kDQMUfoC3g'
response = requests.get(url.format(s=Cik))
page_data = bs4.BeautifulSoup(response.text, "html.parser")
print(page_data)
Output of page_data:
company_id,indicator_id,2011-07-30,2011-10-29,2012-04-28,2012-07-28,2012-10-27,2013-05-04,2013-08-03,2013-11-02,2014-02-01,2014-05-03,2014-11-01,2015-05-02,2015-08-01,2015-10-31,2016-01-30,2016-04-30,2016-07-30,2016-10-29,2017-01-28,2017-04-29,2017-07-29,2017-10-28
1318008,Assets,343367000,357805000,378926000,418145000,438136000,416984000,450963000,465777000,443403000,454455000,499572000,505547000,457355000,441070000,414695000,422148000,432561000,453028000,426683000,447436000,468867000,496269000
1318008,EarningsPerShareDiluted,0.08,0.45,0.14,0.07,0.4,0.08,0.16,0.39,0.89,0.09,0.54,0.09,0.11,0.36,0.48,-0.08,-0.03,0.43,0.72,-0.18,-0.02,0.48
1318008,Liabilities,106880000,106092000,98507000,135708000,137777000,115743000,141548000,140583000,107749000,130316000,155372000,141121000,152237000,141540000,117738000,132848000,152314000,163597000,119632000,141867000,154362000,169686000
1318008,NetIncomeLoss,2591000,14137000,4527000,2086000,12667000,2498000,4739000,11860000,26851000,2496000,15727000,2770000,3213000,9653000,13149000,-2137000,-838000,10695000,18184000,-4448000,-608000,11922000
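How can I arrange this into a tidy pandas DataFrame? The dates as one DataFrame, assets as one DataFrame, liabilities as one DataFrame, and so on.
Answer 0 (score: 1)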
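I think you need a solution like the one @MaxU mentioned in the comments, but with the first and second columns set as a MultiIndex. A small data cleanup is also possible afterwards: create the index from the second column, drop duplicates first, and transpose. The MultiIndex version: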
df = pd.read_csv(url.format(s=Cik), index_col=[0,1])
print (df)
2011-06-30 2011-09-30 2012-03-31 \
company_id indicator_id
824142 Assets 1.863600e+08 1.822540e+08 1.847650e+08
DividendsCommonStockCash NaN NaN NaN
EarningsPerShareDiluted 1.500000e-01 2.300000e-01 1.800000e-01
NetIncomeLoss 3.839000e+06 5.626000e+06 4.567000e+06
2012-06-30 2012-09-30 2012-12-31 \
company_id indicator_id
824142 Assets 2.035540e+08 1.962540e+08 1.934930e+08
DividendsCommonStockCash NaN NaN NaN
EarningsPerShareDiluted 3.800000e-01 2.400000e-01 3.100000e-01
NetIncomeLoss 9.297000e+06 6.007000e+06 7.578000e+06
2013-03-31 2013-06-30 2013-09-30 \
company_id indicator_id
824142 Assets 1.944730e+08 2.212140e+08 2.201380e+08
DividendsCommonStockCash NaN NaN NaN
EarningsPerShareDiluted 2.900000e-01 3.300000e-01 2.800000e-01
NetIncomeLoss 7.140000e+06 1.211900e+07 1.052200e+07
2013-12-31 ... 2015-06-30 \
company_id indicator_id ...
824142 Assets 2.154440e+08 ... 250012000.0
DividendsCommonStockCash NaN ... NaN
EarningsPerShareDiluted 1.100000e-01 ... 0.2
NetIncomeLoss 7.766000e+06 ... 11130000.0
2015-09-30 2015-12-31 2016-03-31 \
company_id indicator_id
824142 Assets 2.550980e+08 2.328540e+08 236669000.0
DividendsCommonStockCash NaN NaN 0.0
EarningsPerShareDiluted 2.400000e-01 2.500000e-01 0.2
NetIncomeLoss 1.325100e+07 1.294800e+07 10806000.0
2016-06-30 2016-09-30 2016-12-31 \
company_id indicator_id
824142 Assets 2.575270e+08 2.572770e+08 2.565300e+08
DividendsCommonStockCash NaN NaN NaN
EarningsPerShareDiluted 2.700000e-01 2.900000e-01 2.400000e-01
NetIncomeLoss 1.434100e+07 1.568200e+07 1.254700e+07
2017-03-31 2017-06-30 2017-09-30
company_id indicator_id
824142 Assets 2.652830e+08 2.850110e+08 3.031380e+08
DividendsCommonStockCash NaN NaN NaN
EarningsPerShareDiluted 1.900000e-01 2.600000e-01 2.800000e-01
NetIncomeLoss 1.021700e+07 1.379400e+07 1.471700e+07
[4 rows x 25 columns]
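The cleanup step (index from the second column, drop duplicates, transpose) could look roughly like the sketch below. It is only an illustration under some assumptions: the inline `csv_text` is a shortened stand-in for the API response, using values copied from the question's output, and the names `tidy` and `frames` are made up for this example. Duplicates are assumed to mean repeated `indicator_id` rows.

```python
import io

import pandas as pd

# Shortened stand-in for the API response (values taken from the question).
# With the live API, pd.read_csv(url.format(s=Cik)) would replace this.
csv_text = """company_id,indicator_id,2011-07-30,2011-10-29,2012-04-28
1318008,Assets,343367000,357805000,378926000
1318008,EarningsPerShareDiluted,0.08,0.45,0.14
1318008,Liabilities,106880000,106092000,98507000
1318008,NetIncomeLoss,2591000,14137000,4527000
"""

raw = pd.read_csv(io.StringIO(csv_text))

tidy = (
    raw.drop_duplicates(subset="indicator_id")  # drop repeated indicators first
       .set_index("indicator_id")               # index from the second column
       .drop(columns="company_id")              # single company, so not needed
       .T                                       # dates become rows
)
tidy.index = pd.to_datetime(tidy.index)
tidy.index.name = "date"

# One single-column DataFrame per indicator, keyed by indicator name.
frames = {name: tidy[[name]] for name in tidy.columns}

print(tidy)
print(frames["Assets"])
```

After this, `frames["Assets"]`, `frames["Liabilities"]` and so on are separate DataFrames indexed by date, which is close to what the question asks for. If several companies are requested at once, keeping `company_id` in the index (as in the MultiIndex version) is probably the safer choice.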