I have a CSV file that looks like this:
State,Ownership Sector,Coal,Gas,Diesel,Total Thermal,Nuclear,Hydro,RES,Grand Total
Delhi,State,135,1550.4,0,1685.4,0,0,0,1685.4
,Private,0,108,0,108,0,0,18.53,126.53
,Central,4355.41,207.61,0,4563.02,122.08,666.12,0,5351.22
,Sub-Total,4490.41,1866.01,0,6356.42,122.08,666.12,18.53,7163.15
Haryana,State,3160,25,3.92,3188.92,0,884.51,70.1,4143.53
,Private,1620,0,0,1620,0,0,53.1,1673.1
,Central,1174,535.29,0,1709.29,109.16,478.67,0,2297.12
,Sub-Total,5954,560.29,3.92,6518.21,109.16,1363.18,123.2,8113.75
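The first two columns should be read as a MultiIndex, so the first index level should be Delhi for the first four rows, and so on. However, when I read the file with read_csv, I get the following: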
In [33]: pd.read_csv("data/econ/electricity_2012-13.csv", index_col=[0,1], skipinitialspace=True).ix[:8]
Out[33]:
                             Coal      Gas  Diesel  Total Thermal  Nuclear  \
State   Ownership Sector
Delhi   State              135.00  1550.40    0.00        1685.40     0.00
NaN     Private              0.00   108.00    0.00         108.00     0.00
        Central           4355.41   207.61    0.00        4563.02   122.08
        Sub-Total         4490.41  1866.01    0.00        6356.42   122.08
Haryana State             3160.00    25.00    3.92        3188.92     0.00
NaN     Private           1620.00     0.00    0.00        1620.00     0.00
        Central           1174.00   535.29    0.00        1709.29   109.16
        Sub-Total         5954.00   560.29    3.92        6518.21   109.16

                            Hydro     RES  Grand Total
State   Ownership Sector
Delhi   State                0.00    0.00      1685.40
NaN     Private              0.00   18.53       126.53
        Central            666.12    0.00      5351.22
        Sub-Total          666.12   18.53      7163.15
Haryana State              884.51   70.10      4143.53
NaN     Private              0.00   53.10      1673.10
        Central            478.67    0.00      2297.12
        Sub-Total         1363.18  123.20      8113.75
I don't understand where the NaNs come from; they mess up my MultiIndex structure.
Any help in understanding and fixing this would be much appreciated =)
Answer 0 (score: 0)
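The NaNs come from the empty State fields in the CSV, which read_csv parses as missing values. To get rid of the NaNs in the index, you have two options:
So, for the second option, you can do: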
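import pandas as pd
df = pd.read_csv("data/econ/electricity_2012-13.csv", skipinitialspace=True)
df["State"] = df["State"].ffill()  # copy each state name down into the blank cells below it
df = df.set_index(["State", "Ownership Sector"])
fillna is the method that replaces NaN values in a DataFrame or Series. You can pass it a specific value, but in this case we want the "ffill" (forward fill) strategy, which carries the last valid value down into the gaps. In current pandas this is written as .ffill(); the older fillna(method='ffill') spelling is deprecated. Filling only the State column leaves any genuinely missing values in the data columns untouched.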
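To try the forward-fill approach without the original file, here is a minimal sketch that feeds the rows from the question through an in-memory buffer. The names CSV_TEXT, raw and n_missing are made up for this illustration, and io.StringIO is only a stand-in for the real path.

```python
import io

import pandas as pd

# Sample rows copied from the question; the in-memory buffer stands in for the file.
CSV_TEXT = """State,Ownership Sector,Coal,Gas,Diesel,Total Thermal,Nuclear,Hydro,RES,Grand Total
Delhi,State,135,1550.4,0,1685.4,0,0,0,1685.4
,Private,0,108,0,108,0,0,18.53,126.53
,Central,4355.41,207.61,0,4563.02,122.08,666.12,0,5351.22
,Sub-Total,4490.41,1866.01,0,6356.42,122.08,666.12,18.53,7163.15
Haryana,State,3160,25,3.92,3188.92,0,884.51,70.1,4143.53
,Private,1620,0,0,1620,0,0,53.1,1673.1
,Central,1174,535.29,0,1709.29,109.16,478.67,0,2297.12
,Sub-Total,5954,560.29,3.92,6518.21,109.16,1363.18,123.2,8113.75
"""

# Read with the default integer index: blank State fields are parsed as NaN.
raw = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True)
n_missing = int(raw["State"].isna().sum())

# Forward-fill only the State column, then build the two-level index.
raw["State"] = raw["State"].ffill()
df = raw.set_index(["State", "Ownership Sector"])

print("blank State cells before filling:", n_missing)
print(df.index.tolist())
```

After this, a lookup such as df.loc[("Delhi", "Central")] should return the Central row for Delhi instead of failing on a NaN key.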