Pandas: sum and max across axis 1 in a new column

Time: 2017-02-08 14:12:01

Tags: python pandas

My df looks like this:

  Hour                                      1     2    
    CU0111-012379-H Output Energy, (Wh/h)   2.0  3.0    
                    Lights (Wh)             4.0  5.0  
                    Lights+Media (Wh)       0.0  0.0    
                    Total Usage (h)         0.0  2.0     
                    Lights (h)              0.0  1.0   
                    Light+Media (h)         0.0   0.0    
                    Battery Voltage, (V)   13.5  13.7     
                    Max Watt, W             7.5   4.5      

I added a Total column at the end:

col_list = list(df)
df['Total'] = df[col_list].sum(axis=1)

  Hour                                      1     2   Total 
    CU0111-012379-H Output Energy, (Wh/h)   2.0  3.0   5.0  
                    Lights (Wh)             4.0  5.0   9.0
                    Lights+Media (Wh)       0.0  0.0   0.0  
                    Total Usage (h)         0.0  2.0   2.0   
                    Lights (h)              0.0  1.0   1.0 
                    Light+Media (h)         0.0   0.0  0.0   
                    Battery Voltage, (V)   13.5  13.7  27.2   
                    Max Watt, W             7.5   4.5  12.0

However, for the following rows I would like the total column to hold the max across axis 1 instead of the sum:

 Battery Voltage, (V)      
 Max Watt, W

so that the df would be:

  Hour                                      1     2   Total/Max 
    CU0111-012379-H Output Energy, (Wh/h)   2.0  3.0   5.0  
                    Lights (Wh)             4.0  5.0   9.0
                    Lights+Media (Wh)       0.0  0.0   0.0  
                    Total Usage (h)         0.0  2.0   2.0   
                    Lights (h)              0.0  1.0   1.0 
                    Light+Media (h)         0.0   0.0  0.0   
                    Battery Voltage, (V)   13.5  13.7  13.7 <-max  
                    Max Watt, W             7.5   4.5  7.5  <-max    

A clumsy beginner's attempt might look something like this:

df3['Total/Max'] = d3[col_list].sum(axis=1).df3.groupby(level=1).df3['Battery Voltage, (v)'].transform(max)

1 answer:

Answer 0 (score: 3)

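You can use numpy.where together with isin and get_level_values to check whether the second index level contains the given values, and then take max or sum accordingly: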

import numpy as np

L = ['Battery Voltage, (V)','Max Watt, W']

print (df.index.get_level_values(1).isin(L))
[False False False False False False  True  True]

df['Total/Max'] = np.where(df.index.get_level_values(1).isin(L),
                           df.max(axis=1), 
                           df.sum(axis=1))

print (df)
                                          1     2  Total/Max
Hour                                                        
CU0111-012379-H Output Energy, (Wh/h)   2.0   3.0        5.0
                Lights (Wh)             4.0   5.0        9.0
                Lights+Media (Wh)       0.0   0.0        0.0
                Total Usage (h)         0.0   2.0        2.0
                Lights (h)              0.0   1.0        1.0
                Light+Media (h)         0.0   0.0        0.0
                Battery Voltage, (V)   13.5  13.7       13.7
                Max Watt, W             7.5   4.5        7.5
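
To try the numpy.where approach end to end, here is a minimal sketch. The way the frame is rebuilt is an assumption about how the question's df was constructed, and the names value_cols, max_rows and use_max are introduced here only for illustration. The sketch aggregates only the hourly columns, so it also behaves the same if df already carries a Total column.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame (assumed construction; mirrors the question's layout).
labels = [
    "Output Energy, (Wh/h)", "Lights (Wh)", "Lights+Media (Wh)",
    "Total Usage (h)", "Lights (h)", "Light+Media (h)",
    "Battery Voltage, (V)", "Max Watt, W",
]
index = pd.MultiIndex.from_product([["CU0111-012379-H"], labels])
df = pd.DataFrame(
    {1: [2.0, 4.0, 0.0, 0.0, 0.0, 0.0, 13.5, 7.5],
     2: [3.0, 5.0, 0.0, 2.0, 1.0, 0.0, 13.7, 4.5]},
    index=index,
)
df.columns.name = "Hour"

value_cols = [1, 2]  # hypothetical name: the hourly columns to aggregate
max_rows = ["Battery Voltage, (V)", "Max Watt, W"]  # rows that need max, not sum

# Boolean array: True where the second index level is one of max_rows.
use_max = df.index.get_level_values(1).isin(max_rows)

# Row-wise max where use_max is True, row-wise sum everywhere else.
df["Total/Max"] = np.where(use_max,
                           df[value_cols].max(axis=1),
                           df[value_cols].sum(axis=1))
print(df)
```

Both branches are computed for every row and np.where only picks between them, which is fine for a frame of this size.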

Another solution uses loc with a mask to select the rows and apply max or sum; ~ is also needed to invert the boolean array:

L = ['Battery Voltage, (V)','Max Watt, W']

mask = df.index.get_level_values(1).isin(L)

df.loc[mask, 'Total/Max'] = df[mask].max(axis=1)
df.loc[~mask, 'Total/Max'] = df[~mask].sum(axis=1)
print (df)
                                          1     2  Total/Max
Hour                                                        
CU0111-012379-H Output Energy, (Wh/h)   2.0   3.0        5.0
                Lights (Wh)             4.0   5.0        9.0
                Lights+Media (Wh)       0.0   0.0        0.0
                Total Usage (h)         0.0   2.0        2.0
                Lights (h)              0.0   1.0        1.0
                Light+Media (h)         0.0   0.0        0.0
                Battery Voltage, (V)   13.5  13.7       13.7
                Max Watt, W             7.5   4.5        7.5
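
One thing to watch with the loc variant: if the statements are re-run on a frame that already contains the Total/Max column, aggregating the whole frame would fold the previous result into the new sum. A possible way around that, sketched below on a reduced three-row frame, is to aggregate an explicit list of value columns. The name value_cols and the reduced frame are assumptions made for illustration.

```python
import pandas as pd

# Reduced example frame (assumed construction, three rows from the question).
index = pd.MultiIndex.from_tuples([
    ("CU0111-012379-H", "Lights (Wh)"),
    ("CU0111-012379-H", "Battery Voltage, (V)"),
    ("CU0111-012379-H", "Max Watt, W"),
])
df = pd.DataFrame({1: [4.0, 13.5, 7.5], 2: [5.0, 13.7, 4.5]}, index=index)

value_cols = [1, 2]  # hypothetical name: only these columns are aggregated
mask = df.index.get_level_values(1).isin(["Battery Voltage, (V)", "Max Watt, W"])

# Run the assignment twice on purpose: the second pass must give the same
# result because the existing "Total/Max" column is never part of the input.
for _ in range(2):
    df.loc[mask, "Total/Max"] = df.loc[mask, value_cols].max(axis=1)
    df.loc[~mask, "Total/Max"] = df.loc[~mask, value_cols].sum(axis=1)

print(df)
```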

Edit from the comments: for a third case you need a nested numpy.where with a second mask:

L = ['Battery Voltage, (V)','Max Watt, W']
mask1 = df.index.get_level_values(1).isin(L)
mask2 = df.index.get_level_values(1) == 'Lights (h)'

df['Total/Max/Min'] = np.where(mask1, df.max(axis=1),
                      np.where(mask2, df.min(axis=1), df.sum(axis=1)))

print (df)
                                          1     2  Total/Max/Min
Hour                                                            
CU0111-012379-H Output Energy, (Wh/h)   2.0   3.0            5.0
                Lights (Wh)             4.0   5.0            9.0
                Lights+Media (Wh)       0.0   0.0            0.0
                Total Usage (h)         0.0   2.0            2.0
                Lights (h)              0.0   1.0            0.0
                Light+Media (h)         0.0   0.0            0.0
                Battery Voltage, (V)   13.5  13.7           13.7
                Max Watt, W             7.5   4.5            7.5
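
If more cases get added, nesting numpy.where becomes hard to read. As an alternative to the nested call (not what the answer above uses), numpy.select takes a list of conditions and a matching list of choices. The sketch below runs on a reduced four-row frame; the frame and the names value_cols, conditions and choices are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Reduced example frame (assumed construction, four rows from the question).
labels = ["Lights (Wh)", "Lights (h)", "Battery Voltage, (V)", "Max Watt, W"]
index = pd.MultiIndex.from_product([["CU0111-012379-H"], labels])
df = pd.DataFrame({1: [4.0, 0.0, 13.5, 7.5], 2: [5.0, 1.0, 13.7, 4.5]}, index=index)

value_cols = [1, 2]  # hypothetical name: the hourly columns to aggregate
values = df[value_cols]
level = df.index.get_level_values(1)

# Conditions are checked in order; the first True one selects the choice.
conditions = [
    level.isin(["Battery Voltage, (V)", "Max Watt, W"]),  # rows that take max
    level == "Lights (h)",                                # row that takes min
]
choices = [
    values.max(axis=1).to_numpy(),
    values.min(axis=1).to_numpy(),
]

# Rows matching no condition fall back to the row-wise sum.
df["Total/Max/Min"] = np.select(conditions, choices,
                                default=values.sum(axis=1).to_numpy())
print(df)
```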