I have a dataframe with 3 columns: username, email, and membership. I want to group the memberships per user.
The input dataframe looks like this:
abc1 abc@company.com prod1
abc1 abc@company.com prod2
abc1 abc@company.com prod3
def1 def@company.com prod2
def1 def@company.com prod3
xyz1 xyz@company.com prod1
xyz1 xyz@company.com prod3
xyz1 xyz@company.com prod4
The output I want is:
abc1 abc@company.com prod1
                     prod2
                     prod3
def1 def@company.com prod2
                     prod3
xyz1 xyz@company.com prod1
                     prod3
                     prod4
I tried the approach below, but it does not seem possible without using an aggregate function.
Here is the code snippet:
df = pd.DataFrame(data['Members'])
dn_group = df.groupby(['username','email'])
new_df = dn_group['membership'].agg('value_counts')
print(new_df)
What I get back includes a last column with counts. Basically, I don't want that column.
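For reference, a minimal sketch of where the counts come from and one way to drop them. It assumes the data matches the input table above (the question's `data['Members']` is not shown, so the frame is rebuilt by hand) and calls `value_counts()` on the grouped column directly instead of going through `agg`. The names `counts` and `no_counts` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question's input table.
df = pd.DataFrame({
    "username": ["abc1", "abc1", "abc1", "def1", "def1", "xyz1", "xyz1", "xyz1"],
    "email": ["abc@company.com"] * 3 + ["def@company.com"] * 2 + ["xyz@company.com"] * 3,
    "membership": ["prod1", "prod2", "prod3", "prod2", "prod3", "prod1", "prod3", "prod4"],
})

# value_counts() is itself an aggregation: it returns one count per
# (username, email, membership) combination, which is the unwanted last column.
counts = df.groupby(["username", "email"])["membership"].value_counts()

# Move the index levels back into ordinary columns, then drop the counts.
# name="count" names the value column explicitly, so the result does not
# depend on the default name chosen by the installed pandas version.
no_counts = counts.reset_index(name="count").drop(columns="count")
print(no_counts)
```

The result is a plain three-column frame again, so this only removes the counts; it does not by itself hide the repeated username and email values.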
Answer 0 (score: 0)
How about this solution:
import pandas as pd

# Rebuild the input dataframe from the question.
df = pd.DataFrame({
    "username": ['abc1', 'abc1', 'abc1', 'def1', 'def1', 'xyz1', 'xyz1', 'xyz1'],
    "email": ['abc@company.com', 'abc@company.com', 'abc@company.com', 'def@company.com',
              'def@company.com', 'xyz@company.com', 'xyz@company.com', 'xyz@company.com'],
    "membership": ['prod1', 'prod2', 'prod3', 'prod2', 'prod3', 'prod1', 'prod3', 'prod4'],
})

# Collect each user's memberships into a set (one row per username/email pair).
df.groupby(['username', 'email'], as_index=False).agg(lambda x: set(x))
Result:
  username            email             membership
0     abc1  abc@company.com  {prod2, prod3, prod1}
1     def1  def@company.com         {prod2, prod3}
2     xyz1  xyz@company.com  {prod4, prod3, prod1}
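If the goal is the exact layout from the question (one row per membership, with username and email shown only once per user), a different route that needs no aggregation might be to blank the repeated values with `DataFrame.duplicated`. This is only a sketch under that assumption; `out` and `repeated` are illustrative names, and the blanking is meant for display, not for further processing.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "username": ["abc1", "abc1", "abc1", "def1", "def1", "xyz1", "xyz1", "xyz1"],
    "email": ["abc@company.com"] * 3 + ["def@company.com"] * 2 + ["xyz@company.com"] * 3,
    "membership": ["prod1", "prod2", "prod3", "prod2", "prod3", "prod1", "prod3", "prod4"],
})

# Sort so that all rows of the same user sit next to each other.
out = df.sort_values(["username", "email", "membership"]).reset_index(drop=True)

# True for every row after the first one within each (username, email) pair.
repeated = out.duplicated(subset=["username", "email"])

# Blank the repeated username/email values; membership is left untouched.
out.loc[repeated, ["username", "email"]] = ""

print(out.to_string(index=False))
```

Unlike the set-based result, this keeps the memberships in sorted order and produces no count column, but the blanked cells make the frame unsuitable for further grouping.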