I have a DataFrame like this:
import pandas as pd

df = pd.DataFrame({'condition' : ['a','b','b','b','a','a'],
                   'name' : ['one', 'one', 'two', 'three', 'three', 'three'],
                   'data1' : [7, 3, 48, 13, 27, 12]})
df
  condition  data1   name
0         a      7    one
1         b      3    one
2         b     48    two
3         b     13  three
4         a     27  three
5         a     12  three
For each name, I want to sum data1, using the rows with condition=a if that information exists, and falling back to condition=b otherwise. In the end I want a DataFrame like this:
df1
name total
0 one 7
1 two 48
2 three 39
Answer 0 (score: 4)
You can aggregate with groupby and sum, reshape with unstack, and finally use fillna to replace the NaN values where category a is missing:
df = df.groupby(['name','condition'], sort=False)['data1'].sum().unstack()
df['total'] = df['a'].fillna(df['b'])
print (df)
condition a b total
name
one 7.0 3.0 7.0
two NaN 48.0 48.0
three 39.0 13.0 39.0
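The fillna step encodes the rule "use a, else b". A minimal sketch of the same idea for an arbitrary preference order, assuming a hypothetical `priority` list that is not part of the original answer: after unstack, put the condition columns in priority order and back-fill along each row, so the first column ends up holding the first available sum.

```python
import pandas as pd

df = pd.DataFrame({'condition': ['a', 'b', 'b', 'b', 'a', 'a'],
                   'name': ['one', 'one', 'two', 'three', 'three', 'three'],
                   'data1': [7, 3, 48, 13, 27, 12]})

# Hypothetical preference order: earlier conditions win over later ones.
priority = ['a', 'b']

# One row per name, one column per condition (NaN where a name lacks that condition).
wide = df.groupby(['name', 'condition'])['data1'].sum().unstack()

# Order columns by priority, back-fill each row from right to left,
# then take the first column: the first non-missing sum in priority order.
total = wide.reindex(columns=priority).bfill(axis=1).iloc[:, 0].rename('total')
print(total)
```

Adding more labels to `priority` extends the fallback chain without nesting further fillna calls.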
For the new DataFrame:
df1 = df.reset_index().rename_axis(None, axis=1)[['name','total']]
print (df1)
name total
0 one 7.0
1 two 48.0
2 three 39.0
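Here rename_axis only clears the leftover `condition` label that unstack attached to the column index. A small sketch, assuming a recent pandas version, that uses the keyword form `rename_axis(columns=None)` so the axis is never passed positionally:

```python
import pandas as pd

df = pd.DataFrame({'condition': ['a', 'b', 'b', 'b', 'a', 'a'],
                   'name': ['one', 'one', 'two', 'three', 'three', 'three'],
                   'data1': [7, 3, 48, 13, 27, 12]})

wide = df.groupby(['name', 'condition'], sort=False)['data1'].sum().unstack()
wide['total'] = wide['a'].fillna(wide['b'])

flat = wide.reset_index()
# The column index still carries the 'condition' label from unstack.
print(flat.columns.name)

# Clear that label with the keyword form, then keep only the needed columns.
df1 = flat.rename_axis(columns=None)[['name', 'total']]
print(df1)
```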
Another solution using apply:
def f(x):
    if (x['condition'] == 'a').any():
        return x.loc[x['condition'] == 'a', 'data1'].sum()
    else:
        return x.loc[x['condition'] == 'b', 'data1'].sum()

df1 = df.groupby('name', sort=False).apply(f).reset_index(name='total')
print (df1)
name total
0 one 7
1 two 48
2 three 39
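The apply version runs a Python function per group. A vectorized sketch of the same preference rule, swapped in here rather than taken from the answer: sum per (name, condition), order the conditions with an ordered Categorical built from a hypothetical `priority` list, then keep the first row per name with drop_duplicates.

```python
import pandas as pd

df = pd.DataFrame({'condition': ['a', 'b', 'b', 'b', 'a', 'a'],
                   'name': ['one', 'one', 'two', 'three', 'three', 'three'],
                   'data1': [7, 3, 48, 13, 27, 12]})

priority = ['a', 'b']  # hypothetical: earlier entries take precedence

# One row per (name, condition) pair with the summed data1.
sums = df.groupby(['name', 'condition'], as_index=False)['data1'].sum()

# Keep only the conditions of interest and make them sort in priority order.
sums = sums[sums['condition'].isin(priority)].copy()
sums['condition'] = pd.Categorical(sums['condition'], categories=priority, ordered=True)

# After sorting, the first row of each name carries its highest-priority condition.
df1 = (sums.sort_values(['name', 'condition'])
           .drop_duplicates('name')
           .rename(columns={'data1': 'total'})[['name', 'total']]
           .reset_index(drop=True))
print(df1)
```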
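A better approach is to build one Series per condition by aggregating the filtered DataFrame, and then merge them with combine_first, but this solution drops every name group that has neither an a nor a b condition: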
a = df.loc[df['condition'] == 'a'].groupby('name', sort=False)['data1'].sum()
b = df.loc[df['condition'] == 'b'].groupby('name', sort=False)['data1'].sum()
df = a.combine_first(b).reset_index(name='total')
print (df)
name total
0 one 7.0
1 three 39.0
2 two 48.0
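To see that caveat, the sketch below adds a hypothetical name `four` that only has a condition `c` row (invented for illustration). It disappears from the combine_first result; reindexing on all names is one way to keep it, with NaN as its total.

```python
import pandas as pd

# Hypothetical extra row: 'four' only has a condition 'c' entry.
df = pd.DataFrame({'condition': ['a', 'b', 'b', 'b', 'a', 'a', 'c'],
                   'name': ['one', 'one', 'two', 'three', 'three', 'three', 'four'],
                   'data1': [7, 3, 48, 13, 27, 12, 5]})

a = df.loc[df['condition'] == 'a'].groupby('name', sort=False)['data1'].sum()
b = df.loc[df['condition'] == 'b'].groupby('name', sort=False)['data1'].sum()
total = a.combine_first(b)

# 'four' has neither an a nor a b row, so it is absent from the result.
print('four' in total.index)

# Reindexing on every name keeps such groups, with NaN as their total.
total_all = total.reindex(df['name'].unique())
print(total_all)
```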
Answer 2 (score: 0)
You can use pd.pivot_table with aggfunc='sum':
df = pd.DataFrame({'condition' : ['a','b','b','b','a','a'],
'name' : ['one', 'one', 'two', 'three', 'three', 'three'],
'data1' : [7, 3, 48, 13, 27, 12]})
res = df.pivot_table(index='name', columns='condition', values='data1', aggfunc='sum')
condition a b
name
one 7.0 3.0
three 39.0 13.0
two NaN 48.0
Then apply fillna and clean up:
res = res.assign(total=res['a'].fillna(res['b']).astype(int))\
         .reset_index().rename_axis('', axis=1)\
         .loc[:, ['name', 'total']]
print(res)
name total
0 one 7
1 three 39
2 two 48
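One caution about `.astype(int)`: it only works while every name has an a or b value. A sketch with the same hypothetical `four` row (condition `c` only, invented for illustration) shows the cast raising ValueError, while pandas' nullable `Int64` dtype keeps the gap as `<NA>`.

```python
import pandas as pd

# Hypothetical extra row: 'four' only has a condition 'c' entry.
df = pd.DataFrame({'condition': ['a', 'b', 'b', 'b', 'a', 'a', 'c'],
                   'name': ['one', 'one', 'two', 'three', 'three', 'three', 'four'],
                   'data1': [7, 3, 48, 13, 27, 12, 5]})

res = df.pivot_table(index='name', columns='condition', values='data1', aggfunc='sum')
total = res['a'].fillna(res['b'])  # still NaN for 'four'

try:
    total.astype(int)
    cast_ok = True
except ValueError as exc:  # NaN cannot be cast to a plain integer dtype
    cast_ok = False
    print('astype(int) failed:', exc)

# The nullable integer dtype converts 7.0 -> 7 and keeps the missing value as <NA>.
total_int = total.astype('Int64')
print(total_int)
```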