Joining two DataFrames:
import pandas as pd

left_dict = {
    'id1': [1, 2, 3, 4, 5],
    'val1': [10, 20, 30, 40, 50],
    'lft': ['a', 'b', 'c', 'd', 'e']
}
right_dict = {
    'id1': [1, 7, 3, 4, 8, 1, 3],
    'val2': [100, 700, 300, 400, 800, 110, 330],
    'rgt': [1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7]
}
left = pd.DataFrame(left_dict)
right = pd.DataFrame(right_dict)
r = pd.merge(left, right, how='outer', on='id1', indicator=False)
I get the resulting DataFrame:
   id1  lft  val1  rgt   val2
0  1.0    a  10.0  1.1  100.0
1  1.0    a  10.0  6.6  110.0
2  2.0    b  20.0  NaN    NaN
3  3.0    c  30.0  3.3  300.0
4  3.0    c  30.0  7.7  330.0
5  4.0    d  40.0  4.4  400.0
6  5.0    e  50.0  NaN    NaN
7  7.0  NaN   NaN  2.2  700.0
8  8.0  NaN   NaN  5.5  800.0
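A minimal sketch, assuming pandas is installed, that rebuilds the same merge with indicator=True (the question passes False) so the '_merge' column shows which side each row came from. Keys 1 and 3 each appear twice in right, so the matching left row is repeated once per match, which is where the 9 rows come from. Depending on the pandas version, the column order and the dtype of id1 may differ from the output above.

```python
import pandas as pd

left = pd.DataFrame({
    'id1': [1, 2, 3, 4, 5],
    'val1': [10, 20, 30, 40, 50],
    'lft': ['a', 'b', 'c', 'd', 'e'],
})
right = pd.DataFrame({
    'id1': [1, 7, 3, 4, 8, 1, 3],
    'val2': [100, 700, 300, 400, 800, 110, 330],
    'rgt': [1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7],
})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'.
r = pd.merge(left, right, how='outer', on='id1', indicator=True)

# id1 = 1 and id1 = 3 each match two right-hand rows, so their left row is repeated.
print(r.sort_values(['id1', 'rgt'])[['id1', 'val1', 'rgt', 'val2', '_merge']])
print(r['_merge'].value_counts())
```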
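Now I need to collapse the rows that share the same 'id1', 'lft' and 'rgt' into a single row (using 'id1', 'lft', 'rgt' as the key) and add a new column 'xxx' to this DataFrame. The values in 'xxx' are computed by the function:

import math

def f(val1, val2):
    if math.isnan(val2):
        r = val1
    elif math.isnan(val1):
        r = val2
    else:
        r = val1 * 2 + val2 * 3
    return r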
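For reference, the same rule as f can be applied to whole columns at once instead of row by row. This is a sketch using numpy.where; f_vectorized is a hypothetical name, and it assumes val1 and val2 are float Series in which missing values are NaN. Like f, it returns val1 when val2 is missing (so NaN when both are missing).

```python
import numpy as np
import pandas as pd

def f_vectorized(val1: pd.Series, val2: pd.Series) -> pd.Series:
    """Hypothetical column-wise version of f, built from nested np.where calls."""
    combined = val1 * 2 + val2 * 3               # NaN wherever either input is NaN
    result = np.where(val2.isna(), val1,         # val2 missing -> val1
             np.where(val1.isna(), val2,         # val1 missing -> val2
                      combined))                 # both present -> 2*val1 + 3*val2
    return pd.Series(result, index=val1.index)

val1 = pd.Series([10.0, 20.0, np.nan, 40.0])
val2 = pd.Series([100.0, np.nan, 700.0, 400.0])
print(f_vectorized(val1, val2).tolist())  # [320.0, 20.0, 700.0, 1280.0]
```

Note that f as written gives 1280.0 for the pair (40, 400).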
So the resulting DataFrame should be:
   id1  lft  val1  rgt   val2    xxx
0  1.0    a  10.0  1.1  100.0  320.0
1  2.0    b  20.0  NaN    NaN   20.0
2  3.0    c  30.0  3.3  300.0  960.0
3  4.0    d  40.0  4.4  400.0   40.0
4  5.0    e  50.0  NaN    NaN   50.0
5  7.0  NaN   NaN  2.2  700.0  700.0
6  8.0  NaN   NaN  5.5  800.0  800.0
I tried:
In [85]: r.groupby(['id1','val1', 'lft', 'rgt']).groups
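This returns a dict whose values are the row labels in each group, which doesn't help at all. Any ideas how to actually collapse and reduce the rows?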
Answer 0 (score: 0)
r['xxx'] = [f(row['val1'], row['val2']) for _, row in r.iterrows()]
That might work, but keep in mind that you will get duplicate rows for repeated combinations. Is that the logic you are looking for?
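A minimal sketch of collapsing the merged rows, under one stated assumption: "collapse" means keeping the first merged row for each id1, which is what the expected output shows (rows with id1 = 1 differ in rgt, so grouping on 'rgt' would not merge them). It uses drop_duplicates rather than groupby, partly because groupby drops groups whose key contains NaN by default (dropna=True), which would lose the unmatched rows. The 'xxx' column follows f as written, so id1 = 4 gets 1280.0 rather than the 40.0 listed in the expected output.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'id1': [1, 2, 3, 4, 5],
                     'val1': [10, 20, 30, 40, 50],
                     'lft': ['a', 'b', 'c', 'd', 'e']})
right = pd.DataFrame({'id1': [1, 7, 3, 4, 8, 1, 3],
                      'val2': [100, 700, 300, 400, 800, 110, 330],
                      'rgt': [1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7]})
r = pd.merge(left, right, how='outer', on='id1')

# Assumption: keep the first merged row per id1. Within one key the merge keeps the
# right-hand rows in their original order, so id1 = 1 keeps rgt = 1.1, not 6.6.
collapsed = r.drop_duplicates(subset='id1', keep='first').reset_index(drop=True)

# Same rule as f, evaluated on whole columns instead of row by row.
collapsed['xxx'] = np.where(collapsed['val2'].isna(), collapsed['val1'],
                   np.where(collapsed['val1'].isna(), collapsed['val2'],
                            collapsed['val1'] * 2 + collapsed['val2'] * 3))
print(collapsed)
```

If a different row should win within a group (for example the smallest rgt), sort the merged frame by that column before calling drop_duplicates.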