Python Pandas DataFrame: group rows and reduce groups with a custom function

Time: 2016-12-18 18:41:37

Tags: python pandas dataframe reduce fold

Joining two dataframes:

import pandas as pd

left_dict = {
    'id1': [1,2,3,4,5],
    'val1': [10,20,30,40,50],
    'lft': ['a','b','c','d','e']
}

right_dict = {
    'id1': [1,7,3,4,8,1,3],
    'val2': [100,700,300,400,800,110,330],
    'rgt': [1.1,2.2,3.3,4.4,5.5,6.6,7.7]
}

left = pd.DataFrame(left_dict)
right = pd.DataFrame(right_dict)

r = pd.merge(left, right, how='outer', on='id1', indicator=False)

I get the resulting dataframe:

   id1  lft  val1  rgt   val2                                                                  
0  1.0    a  10.0  1.1  100.0                                                                  
1  1.0    a  10.0  6.6  110.0                                                                  
2  2.0    b  20.0  NaN    NaN                                                                  
3  3.0    c  30.0  3.3  300.0                                                                  
4  3.0    c  30.0  7.7  330.0                                                                  
5  4.0    d  40.0  4.4  400.0                                                                  
6  5.0    e  50.0  NaN    NaN                                                                  
7  7.0  NaN   NaN  2.2  700.0                                                                  
8  8.0  NaN   NaN  5.5  800.0                                                                  

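Now I need to collapse the rows that have the same 'id1', 'lft' and 'rgt' (the result of the join on 'id1') into one row, and add a new column 'xxx' to this dataframe. The value in column 'xxx' is calculated with the function:
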
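import math

def f(val1, val2):
    # If val2 is missing, fall back to val1.
    if math.isnan(val2):
        r = val1
    # If only val1 is missing, fall back to val2.
    elif math.isnan(val1):
        r = val2
    # Both present: weighted combination.
    else:
        r = val1 * 2 + val2 * 3
    return r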

So the resulting dataframe should be:

   id1  lft  val1  rgt   val2 xxx                                                              
0  1.0    a  10.0  1.1  100.0 320.0                                                            
1  2.0    b  20.0  NaN    NaN 20.0                                                             
2  3.0    c  30.0  3.3  300.0 960.0                                                            
3  4.0    d  40.0  4.4  400.0 40.0                                                             
4  5.0    e  50.0  NaN    NaN 50.0                                                             
5  7.0  NaN   NaN  2.2  700.0 700.0                                                            
6  8.0  NaN   NaN  5.5  800.0 800.0                                                            

I tried using:

In [85]: r.groupby(['id1','val1', 'lft', 'rgt']).groups

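This returns a dictionary whose values are the row numbers in each group, which doesn't help at all. Any ideas on how to actually collapse and reduce the rows?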
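
For reference, a minimal sketch of the difference between `.groups` and an actual aggregation. The frame `df` and its columns `key` and `val` are made up for illustration and are not part of the question. It also shows that, by default, pandas leaves out rows whose group key is NaN when aggregating, which would affect a grouping on 'lft' or 'rgt' here.

```python
import numpy as np
import pandas as pd

# Made-up frame with a repeated key and one missing key value.
df = pd.DataFrame({
    'key': ['a', 'a', 'b', np.nan],
    'val': [1, 2, 3, 4],
})

grouped = df.groupby('key')

# .groups only describes the grouping: group key -> index labels of its rows.
groups = {k: list(v) for k, v in grouped.groups.items()}
print(groups)

# An aggregation such as first() reduces every group to a single row.
# The row whose key is NaN does not appear in the result.
collapsed = grouped.first().reset_index()
print(collapsed)
```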

1 Answer:

Answer 0: (score: 0)

r['xxx'] = [f(row['val1'], row['val2']) for _, row in r.iterrows()]

This may work, but keep in mind that for duplicated combinations you will get duplicate rows. Is that the logic you are looking for?
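
If the duplicates should be collapsed first, one possible sketch is below. It rests on an assumption the question does not confirm: rows count as "the same" when they share 'id1', and the first matching row is the one kept. The name `collapsed` is introduced here for illustration only.

```python
import math

import pandas as pd

left = pd.DataFrame({
    'id1': [1, 2, 3, 4, 5],
    'val1': [10, 20, 30, 40, 50],
    'lft': ['a', 'b', 'c', 'd', 'e'],
})
right = pd.DataFrame({
    'id1': [1, 7, 3, 4, 8, 1, 3],
    'val2': [100, 700, 300, 400, 800, 110, 330],
    'rgt': [1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7],
})
r = pd.merge(left, right, how='outer', on='id1')


def f(val1, val2):
    # Same rule as in the question: fall back to whichever value is present.
    if math.isnan(val2):
        return val1
    if math.isnan(val1):
        return val2
    return val1 * 2 + val2 * 3


# Assumption: one row per id1, keeping the first row of each duplicate set.
collapsed = (
    r.sort_values('id1', kind='mergesort')  # stable sort keeps the merge order within each id1
     .drop_duplicates(subset='id1', keep='first')
     .reset_index(drop=True)
)

# Apply f row by row on the two value columns.
collapsed['xxx'] = [f(v1, v2) for v1, v2 in zip(collapsed['val1'], collapsed['val2'])]
print(collapsed)
```

Under this sketch the row for id1 = 4 evaluates to 40 * 2 + 400 * 3 = 1280.0, because both values are present there. If a different row of each duplicate set should win, `keep='last'` or a `groupby('id1')` aggregation could be swapped in.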