Parsing an xlsx file (BP Statistical Review of World Energy)

Time: 2016-12-27 14:00:31

Tags: python parsing pandas dataframe

Hello, I want to parse an xlsx file. This is what I have so far:

import pandas as pd

path = 'bp-statistical-review-of-world-energy-2015-workbook.xlsx'
xls = pd.ExcelFile(path)
# Read the sheet, using its first column as the index
df = pd.read_excel(xls, 'Oil Production – Tonnes', index_col=0, na_values=['NA'])

df.index.name = None
#df.drop([0], axis=0, inplace=True)
#df.drop((['Change']), axis=1, inplace=True)
# Drop the last two columns and selected rows by position
df.drop(df.columns[[50, 51]], axis=1, inplace=True)
df.drop(df.index[[0, 77, 78, 79, 80, 81, 82, 83]], axis=0, inplace=True)

My result

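I ran into the following problems: the dates (years) are in the second row, but not all of them are displayed correctly; some show up as 2015.00000. Also, I can't move that row of dates up so that it becomes the header. Please help me.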

Data

1 answer:

Answer 0: (score: 0)

Is this perhaps what you want?

>>> import pandas as pd
>>> path = 'bp-statistical-review-of-world-energy-2015-workbook.xlsx'
>>> df = pd.read_excel(path, 'Oil Production – Tonnes', skiprows=2)
>>> df.head()
        Million tonnes     1965     1966     1967     1968     1969     1970  \
0                  NaN      NaN      NaN      NaN      NaN      NaN      NaN   
1                   US  427.694  454.539  484.222   502.88  511.352   533.49   
2               Canada  43.8742  48.2122  52.7011  57.1193   62.218  70.0679   
3               Mexico  18.0539  18.4895  20.4638  21.9007   22.965   24.179   
4  Total North America  489.623  521.241  557.387    581.9  596.535  627.737   

      1971     1972     1973    ...           2007        2008        2009  \
0      NaN      NaN      NaN    ...            NaN         NaN         NaN   
1  525.888  527.888  514.652    ...     305.153524  302.254906  322.267908   
2  75.1638  86.7131  100.315    ...     155.286457  152.875890  152.805930   
3  24.1073  25.0976  25.8594    ...     172.231281  156.896182  146.664163   
4  625.159  639.699  640.826    ...     632.671262  612.026978  621.738001   

         2010        2011        2012        2013        2014     2013.1  \
0         NaN         NaN         NaN         NaN         NaN        NaN   
1  333.128080  345.352788  394.732788  448.494835  519.944404    0.15931   
2  160.293484  169.801471  182.586206  194.379612  209.800775  0.0793353   
3  145.600519  144.518511  143.857291  141.845640  137.097698 -0.0334726   
4  639.022083  659.672770  721.176285  784.720088  866.842877   0.104652   

   of total  
0       NaN  
1  0.123193  
2  0.049709  
3  0.032483  
4  0.205386  

[5 rows x 53 columns]
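
With `skiprows=2` the years become the column labels, so they should no longer end up in a float data row (which is the likely reason they displayed as 2015.00000). The frame still has an all-NaN first row and two trailing non-year columns (`2013.1` and `of total`). A minimal sketch of one way to tidy that up follows. It runs on a small stand-in frame built from the `head()` output above rather than on the real workbook. The helper name `tidy` is hypothetical, and treating every unparseable cell as missing is an assumption about what you want.

```python
import numpy as np
import pandas as pd

# Stand-in for the frame returned by read_excel(..., skiprows=2).
# Values are copied from the head() output above; most year columns are omitted.
df = pd.DataFrame(
    {
        "Million tonnes": [np.nan, "US", "Canada", "Mexico"],
        2013: [np.nan, 448.494835, 194.379612, 141.845640],
        2014: [np.nan, 519.944404, 209.800775, 137.097698],
        "2013.1": [np.nan, 0.15931, 0.0793353, -0.0334726],
        "of total": [np.nan, 0.123193, 0.049709, 0.032483],
    }
)


def tidy(frame):
    """Hypothetical helper: keep only year columns, indexed by country."""
    # Remove rows that are empty in every column (row 0 in the output above).
    out = frame.dropna(how="all")

    # Use the country names as the index and drop the index label.
    out = out.set_index("Million tonnes")
    out.index.name = None

    # Keep only columns whose label is a whole number (a year).
    # "2013.1" parses to a non-integer and "of total" to NaN, so both are dropped.
    labels = pd.to_numeric(pd.Series(out.columns), errors="coerce")
    is_year = labels.notna() & (labels % 1 == 0)
    out = out.loc[:, is_year.to_numpy()]
    out.columns = labels[is_year].astype(int).to_numpy()

    # Assumption: cells that cannot be parsed as numbers become NaN.
    return out.apply(pd.to_numeric, errors="coerce")


result = tidy(df)
print(result)
```

On the real workbook you would pass the frame from `pd.read_excel(path, 'Oil Production – Tonnes', skiprows=2)` to the helper instead of the stand-in. Footnote rows at the bottom of the sheet are not handled here and may still need to be dropped separately.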