Hello, I want to parse an xlsx file. Here is what I have so far:
import pandas as pd
from pandas import DataFrame, read_csv
path = 'bp-statistical-review-of-world-energy-2015-workbook.xlsx'
xls = pd.ExcelFile(path)
df = pd.read_excel(xls, 'Oil Production – Tonnes', index_col=0, na_values=['NA'])
df.index.name = None
#df.drop([0], axis=0, inplace=True)
#df.drop((['Change']), axis=1, inplace=True)
df.drop(df.columns[[50, 51]], axis=1, inplace=True)
df.drop(df.index[[0, 77, 78, 79, 80, 81, 82, 83]], axis=0, inplace=True)
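I ran into the following problems: the dates (years) are in the second row, but not all of them are displayed correctly, and some show up as 2015.00000. Also, I cannot move the row of dates to the top as the header. Please help me.
Answer 0 (score: 0)
Could this be what you want?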
>>> import pandas as pd
>>> path = 'bp-statistical-review-of-world-energy-2015-workbook.xlsx'
>>> df = pd.read_excel(path, 'Oil Production – Tonnes', skiprows=2)
>>> df.head()
        Million tonnes     1965     1966     1967     1968     1969     1970  \
0                  NaN      NaN      NaN      NaN      NaN      NaN      NaN
1                   US  427.694  454.539  484.222   502.88  511.352   533.49
2               Canada  43.8742  48.2122  52.7011  57.1193   62.218  70.0679
3               Mexico  18.0539  18.4895  20.4638  21.9007   22.965   24.179
4  Total North America  489.623  521.241  557.387    581.9  596.535  627.737

      1971     1972     1973  ...        2007        2008        2009  \
0      NaN      NaN      NaN  ...         NaN         NaN         NaN
1  525.888  527.888  514.652  ...  305.153524  302.254906  322.267908
2  75.1638  86.7131  100.315  ...  155.286457  152.875890  152.805930
3  24.1073  25.0976  25.8594  ...  172.231281  156.896182  146.664163
4  625.159  639.699  640.826  ...  632.671262  612.026978  621.738001

         2010        2011        2012        2013        2014      2013.1  \
0         NaN         NaN         NaN         NaN         NaN         NaN
1  333.128080  345.352788  394.732788  448.494835  519.944404     0.15931
2  160.293484  169.801471  182.586206  194.379612  209.800775   0.0793353
3  145.600519  144.518511  143.857291  141.845640  137.097698  -0.0334726
4  639.022083  659.672770  721.176285  784.720088  866.842877    0.104652

   of total
0       NaN
1  0.123193
2  0.049709
3  0.032483
4  0.205386

[5 rows x 53 columns]
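If you also want the country names as the index, the empty first row gone, and the year labels shown as plain integers rather than something like 2015.00000, a possible follow-up step is sketched below. This is only a sketch under some assumptions: `tidy_year_table` is a hypothetical helper name, not part of pandas; it assumes the sheet was read with `skiprows=2` as above, and that the two trailing columns (`2013.1` and `of total`) are not wanted. The `sample` frame is a small stand-in built from the values in the output above, so the snippet runs without the workbook.

```python
import numbers

import pandas as pd


def tidy_year_table(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: tidy a sheet that was read with skiprows=2."""
    # Use the first column (the country names) as the row index.
    df = raw.set_index(raw.columns[0])
    df.index.name = None
    # Drop rows that are completely empty, such as the blank row under the header.
    df = df.dropna(how="all")
    # Keep only columns whose label is a whole-number year. String labels
    # such as "2013.1" or "of total" are skipped (assumed to be unwanted).
    year_cols = [
        c for c in df.columns
        if isinstance(c, numbers.Real) and float(c).is_integer()
    ]
    df = df[year_cols]
    # Turn labels such as 2015.0 into 2015.
    df.columns = [int(c) for c in year_cols]
    return df


# Small stand-in for the real sheet, using values from the output above.
sample = pd.DataFrame({
    "Million tonnes": [None, "US", "Canada"],
    1965.0: [None, 427.694, 43.8742],
    2014.0: [None, 519.944404, 209.800775],
    "2013.1": [None, 0.15931, 0.0793353],
    "of total": [None, 0.123193, 0.049709],
})
tidy = tidy_year_table(sample)
print(tidy)
```

With the real workbook you would pass the result of `pd.read_excel(path, 'Oil Production – Tonnes', skiprows=2)` instead of `sample`. Footnote rows at the bottom of the sheet are not handled here; they would still need to be dropped by position, as in the question.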