Python: read data from BeautifulSoup and arrange it in a pandas DataFrame

Time: 2018-02-15 13:29:38

Tags: python pandas dataframe

I want to arrange the output of BeautifulSoup into a pandas DataFrame.

import pandas as pd
import requests
import bs4
import urllib, json

Cik = '824142'
url = 'https://api.ustals.com/v1/indicators/xbrl?indicators=EarningsPerShareDiluted,NetIncomeLoss'\
    ',Revenues,ProfitLoss,DividendsCommonStockCash,Assets,Liabilities'\
    '&frequency=q&period_type=end_date&companies={s}&token=KUNwBJE78kDQMUfoC3g'
response = requests.get(url.format(s=Cik))
page_data = bs4.BeautifulSoup(response.text, "html.parser")
print(page_data)

Output of page_data:

    company_id,indicator_id,2011-07-30,2011-10-29,2012-04-28,2012-07-28,2012-10-27,2013-05-04,2013-08-03,2013-11-02,2014-02-01,2014-05-03,2014-11-01,2015-05-02,2015-08-01,2015-10-31,2016-01-30,2016-04-30,2016-07-30,2016-10-29,2017-01-28,2017-04-29,2017-07-29,2017-10-28
    1318008,Assets,343367000,357805000,378926000,418145000,438136000,416984000,450963000,465777000,443403000,454455000,499572000,505547000,457355000,441070000,414695000,422148000,432561000,453028000,426683000,447436000,468867000,496269000
    1318008,EarningsPerShareDiluted,0.08,0.45,0.14,0.07,0.4,0.08,0.16,0.39,0.89,0.09,0.54,0.09,0.11,0.36,0.48,-0.08,-0.03,0.43,0.72,-0.18,-0.02,0.48
    1318008,Liabilities,106880000,106092000,98507000,135708000,137777000,115743000,141548000,140583000,107749000,130316000,155372000,141121000,152237000,141540000,117738000,132848000,152314000,163597000,119632000,141867000,154362000,169686000
    1318008,NetIncomeLoss,2591000,14137000,4527000,2086000,12667000,2498000,4739000,11860000,26851000,2496000,15727000,2770000,3213000,9653000,13149000,-2137000,-838000,10695000,18184000,-4448000,-608000,11922000

How can I arrange this into a tidy pandas DataFrame? Dates as one DataFrame, Assets as one DataFrame, Liabilities as one DataFrame, and so on.

1 answer:

Answer 0 (score: 1)

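I think you need a solution like the one @MaxU mentioned in the comments, but with the first and second columns set as a MultiIndex, as in the code below.

Some small data cleaning is also possible: create the index from the second column, drop duplicates first, and transpose.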

df = pd.read_csv(url.format(s=Cik), index_col=[0,1])
print(df)

                                       2011-06-30    2011-09-30    2012-03-31  \
company_id indicator_id                                                         
824142     Assets                    1.863600e+08  1.822540e+08  1.847650e+08   
           DividendsCommonStockCash           NaN           NaN           NaN   
           EarningsPerShareDiluted   1.500000e-01  2.300000e-01  1.800000e-01   
           NetIncomeLoss             3.839000e+06  5.626000e+06  4.567000e+06   

                                       2012-06-30    2012-09-30    2012-12-31  \
company_id indicator_id                                                         
824142     Assets                    2.035540e+08  1.962540e+08  1.934930e+08   
           DividendsCommonStockCash           NaN           NaN           NaN   
           EarningsPerShareDiluted   3.800000e-01  2.400000e-01  3.100000e-01   
           NetIncomeLoss             9.297000e+06  6.007000e+06  7.578000e+06   

                                       2013-03-31    2013-06-30    2013-09-30  \
company_id indicator_id                                                         
824142     Assets                    1.944730e+08  2.212140e+08  2.201380e+08   
           DividendsCommonStockCash           NaN           NaN           NaN   
           EarningsPerShareDiluted   2.900000e-01  3.300000e-01  2.800000e-01   
           NetIncomeLoss             7.140000e+06  1.211900e+07  1.052200e+07   

                                       2013-12-31      ...        2015-06-30  \
company_id indicator_id                                ...                     
824142     Assets                    2.154440e+08      ...       250012000.0   
           DividendsCommonStockCash           NaN      ...               NaN   
           EarningsPerShareDiluted   1.100000e-01      ...               0.2   
           NetIncomeLoss             7.766000e+06      ...        11130000.0   

                                       2015-09-30    2015-12-31   2016-03-31  \
company_id indicator_id                                                        
824142     Assets                    2.550980e+08  2.328540e+08  236669000.0   
           DividendsCommonStockCash           NaN           NaN          0.0   
           EarningsPerShareDiluted   2.400000e-01  2.500000e-01          0.2   
           NetIncomeLoss             1.325100e+07  1.294800e+07   10806000.0   

                                       2016-06-30    2016-09-30    2016-12-31  \
company_id indicator_id                                                         
824142     Assets                    2.575270e+08  2.572770e+08  2.565300e+08   
           DividendsCommonStockCash           NaN           NaN           NaN   
           EarningsPerShareDiluted   2.700000e-01  2.900000e-01  2.400000e-01   
           NetIncomeLoss             1.434100e+07  1.568200e+07  1.254700e+07   

                                       2017-03-31    2017-06-30    2017-09-30  
company_id indicator_id                                                        
824142     Assets                    2.652830e+08  2.850110e+08  3.031380e+08  
           DividendsCommonStockCash           NaN           NaN           NaN  
           EarningsPerShareDiluted   1.900000e-01  2.600000e-01  2.800000e-01  
           NetIncomeLoss             1.021700e+07  1.379400e+07  1.471700e+07  

[4 rows x 25 columns]
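
The data cleaning mentioned above could look roughly like the sketch below. This is one reading of that description, not the exact code from the answer. The `csv_text` sample is hypothetical: it holds the first three date columns from the output in the question, plus one repeated Assets row added only to exercise the duplicate handling. The names `raw`, `tidy`, `assets` and `liabilities` are illustrative as well.

```python
import io

import pandas as pd

# Hypothetical sample: first three date columns of the CSV from the question,
# with a repeated Assets row appended for illustration.
csv_text = """company_id,indicator_id,2011-07-30,2011-10-29,2012-04-28
1318008,Assets,343367000,357805000,378926000
1318008,EarningsPerShareDiluted,0.08,0.45,0.14
1318008,Liabilities,106880000,106092000,98507000
1318008,NetIncomeLoss,2591000,14137000,4527000
1318008,Assets,343367000,357805000,378926000
"""

# Use the second column (indicator_id) as the index.
raw = pd.read_csv(io.StringIO(csv_text), index_col=1)

tidy = (
    raw.drop(columns="company_id")  # constant column, not needed after indexing
       .loc[lambda d: ~d.index.duplicated(keep="first")]  # keep first row per indicator
       .T  # transpose: dates become rows, indicators become columns
)
tidy.index = pd.to_datetime(tidy.index)  # date strings -> DatetimeIndex
tidy.index.name = "date"

print(tidy)

# Each indicator is now a column; double brackets return a one-column DataFrame.
assets = tidy[["Assets"]]
liabilities = tidy[["Liabilities"]]
```

With the real data, `io.StringIO(csv_text)` would be replaced by `url.format(s=Cik)` as in the `read_csv` call above. Keeping everything in one date-indexed frame and selecting columns on demand is usually easier to work with than holding a separate DataFrame per indicator.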