Question

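I want to balance certain production flows for certain countries, stored in a pandas DataFrame with a MultiIndex.

A simplified example of my problem could look like this: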

dict_df1={2016: {('country A', 'peanuts', 'supply'): 3.0,
        ('country A', 'peanuts', 'demand'): 2.0,
        ('country A', 'olives', 'supply'): 1.0,
        ('country A', 'olives', 'demand'): 0.5,
        ('Country B', 'peanuts', 'supply'): 3.0,
        ('Country B', 'peanuts', 'demand'): 2.0,
        ('Country B', 'olives', 'supply'): 1.0,
        ('Country B', 'olives', 'demand'): 0.5},
 2017: {('country A', 'peanuts', 'supply'): 4,
       ('country A', 'peanuts', 'demand'): 3,
       ('country A', 'olives', 'supply'): 2,
       ('country A', 'olives', 'demand'): 2,
       ('Country B', 'peanuts', 'supply'): 4,
       ('Country B', 'peanuts', 'demand'): 3,
       ('Country B', 'olives', 'supply'): 2,
       ('Country B', 'olives', 'demand'): 2}}

import pandas as pd

df1 = pd.DataFrame(dict_df1)

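I would like to add rows at the third level that hold the difference between supply and demand. The result should look like this:

I tried: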

s=df1.loc[(slice(None),slice(None),'supply'),:]
s.index=s.index.droplevel('category')

d=df1.loc[(slice(None),slice(None),'demand'),:]
d.index=d.index.droplevel('category')

b=s-d

df1.loc[(slice(None),slice(None),'difference'),:]=b

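But I get a KeyError. I think I need to somehow declare the new entries in the MultiIndex before assigning any values, but I don't know how to do that. The real dataset has many countries, regions, and even more levels in the MultiIndex, so I'm looking for a generic solution.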

Answer 1

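Use groupby with diff to build the rows to add, then combine them with the original frame using concat (df here is the question's df1):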

conbinedf=df.groupby(level=[0,1]).diff().dropna().reset_index(level=2).assign(level_2='diff').set_index('level_2',append=True)
yourdf=pd.concat([df,conbinedf]).sort_index(level=[0,1])
yourdf
Out[287]: 
                          2016  2017
Country B olives  demand   0.5   2.0
                  diff     0.5   0.0
                  supply   1.0   2.0
          peanuts demand   2.0   3.0
                  diff     1.0   1.0
                  supply   3.0   4.0
country A olives  demand   0.5   2.0
                  diff     0.5   0.0
                  supply   1.0   2.0
          peanuts demand   2.0   3.0
                  diff     1.0   1.0
                  supply   3.0   4.0
