Reading a CSV file with a MultiIndex

Time: 2014-07-01 21:10:05

Tags: csv pandas

I have a CSV file that looks like this:

State,Ownership Sector,Coal,Gas,Diesel,Total Thermal,Nuclear,Hydro,RES,Grand Total
Delhi,State,135,1550.4,0,1685.4,0,0,0,1685.4
,Private,0,108,0,108,0,0,18.53,126.53
,Central,4355.41,207.61,0,4563.02,122.08,666.12,0,5351.22
,Sub-Total,4490.41,1866.01,0,6356.42,122.08,666.12,18.53,7163.15
Haryana,State,3160,25,3.92,3188.92,0,884.51,70.1,4143.53
,Private,1620,0,0,1620,0,0,53.1,1673.1
,Central,1174,535.29,0,1709.29,109.16,478.67,0,2297.12
,Sub-Total,5954,560.29,3.92,6518.21,109.16,1363.18,123.2,8113.75

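The first two columns should be read as a MultiIndex, so the first index level for the first four rows should be Delhi, and so on. However, when I read the file with read_csv, I get the following: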

In [33]: pd.read_csv("data/econ/electricity_2012-13.csv", index_col=[0,1], skipinitialspace=True).ix[:8]
Out[33]: 
                             Coal      Gas  Diesel  Total Thermal  Nuclear  \
State   Ownership Sector                                                     
Delhi   State              135.00  1550.40    0.00        1685.40     0.00   
NaN     Private              0.00   108.00    0.00         108.00     0.00   
        Central           4355.41   207.61    0.00        4563.02   122.08   
        Sub-Total         4490.41  1866.01    0.00        6356.42   122.08   
Haryana State             3160.00    25.00    3.92        3188.92     0.00   
NaN     Private           1620.00     0.00    0.00        1620.00     0.00   
        Central           1174.00   535.29    0.00        1709.29   109.16   
        Sub-Total         5954.00   560.29    3.92        6518.21   109.16   

                            Hydro     RES  Grand Total  
State   Ownership Sector                                
Delhi   State                0.00    0.00      1685.40  
NaN     Private              0.00   18.53       126.53  
        Central            666.12    0.00      5351.22  
        Sub-Total          666.12   18.53      7163.15  
Haryana State              884.51   70.10      4143.53  
NaN     Private              0.00   53.10      1673.10  
        Central            478.67    0.00      2297.12  
        Sub-Total         1363.18  123.20      8113.75  

I don't understand where the NaN values come from; they break my MultiIndex structure.

Any help understanding and fixing this would be much appreciated =)

1 Answer:

Answer 0: (score: 0)

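The NaN values come from the empty State fields in the CSV, which read_csv parses as missing values. To get rid of the NaN entries in the index, you have two options:

  1. Edit the source file (may not be feasible)
  2. Use pandas to fill in the missing values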

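So, for the second option, you can:

    import pandas as pd
    # Forward-fill the blank State cells before building the MultiIndex.
    df = pd.read_csv("data/econ/electricity_2012-13.csv", skipinitialspace=True) \
           .ffill().set_index(['State', 'Ownership Sector'])


ffill replaces each NaN in a DataFrame or Series with the last valid value above it (a forward fill), so every blank State cell inherits the state named in the row above. It is equivalent to fillna(method='ffill'), the spelling used in older pandas versions and deprecated in recent releases. fillna can also take an explicit replacement value, but a forward fill is what this layout needs. Note that the file is read without index_col here; the index is set only after the gaps are filled.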
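
As a self-contained check of the approach above, here is a minimal sketch that reads the sample rows from the question out of an in-memory string instead of the real file. The `CSV_TEXT` name and the choice to forward-fill only the `State` column (rather than the whole frame) are assumptions made for illustration; filling a single column avoids touching any genuine gaps in the numeric columns.

```python
import io
import pandas as pd

# Inline copy of the rows shown in the question, so no file is needed.
CSV_TEXT = """State,Ownership Sector,Coal,Gas,Diesel,Total Thermal,Nuclear,Hydro,RES,Grand Total
Delhi,State,135,1550.4,0,1685.4,0,0,0,1685.4
,Private,0,108,0,108,0,0,18.53,126.53
,Central,4355.41,207.61,0,4563.02,122.08,666.12,0,5351.22
,Sub-Total,4490.41,1866.01,0,6356.42,122.08,666.12,18.53,7163.15
Haryana,State,3160,25,3.92,3188.92,0,884.51,70.1,4143.53
,Private,1620,0,0,1620,0,0,53.1,1673.1
,Central,1174,535.29,0,1709.29,109.16,478.67,0,2297.12
,Sub-Total,5954,560.29,3.92,6518.21,109.16,1363.18,123.2,8113.75
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True)

# Forward-fill only the State column; the numeric columns are left alone.
df["State"] = df["State"].ffill()

# Build the two-level index once the gaps are gone.
df = df.set_index(["State", "Ownership Sector"])

# Rows can now be addressed by (state, sector) pairs.
delhi_central_coal = df.loc[("Delhi", "Central"), "Coal"]
print(df.index.get_level_values("State").unique().tolist())
```

With the index built this way, selections such as `df.loc["Haryana"]` return all four sector rows for that state.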