Python Pandas: group by columns without aggregating

Time: 2020-05-05 22:16:16

Tags: python pandas

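I have a data frame with 3 columns: username, email, and membership. I want to group by member.

The input data frame looks like this:

abc1 abc@company.com prod1
abc1 abc@company.com prod2
abc1 abc@company.com prod3
def1 def@company.com prod2
def1 def@company.com prod3
xyz1 xyz@company.com prod1
xyz1 xyz@company.com prod3
xyz1 xyz@company.com prod4

My desired output is:

abc1  abc@company.com prod1
                      prod2
                      prod3
def1  def@company.com prod2
                      prod3
xyz1  xyz@company.com prod1
                      prod3
                      prod4

I tried the approach below, but I can't seem to solve it without using an aggregate function.

Here is the code snippet:

import pandas as pd

# `data` is my source object; data['Members'] holds the rows shown above
df = pd.DataFrame(data['Members'])
dn_group = df.groupby(['username', 'email'])
new_df = dn_group['membership'].agg('value_counts')
print(new_df)

What I get has an extra last column with the counts. Basically, I don't want that last column.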


1 answer:

Answer 0: (score: 0)

How about this solution:

import pandas as pd

# Sample data matching the input in the question
df = pd.DataFrame({
    "username": ['abc1','abc1','abc1','def1','def1','xyz1','xyz1','xyz1'],
    "email": ['abc@company.com','abc@company.com','abc@company.com','def@company.com','def@company.com','xyz@company.com','xyz@company.com','xyz@company.com'],
    "membership": ['prod1','prod2','prod3','prod2','prod3','prod1','prod3','prod4'],
})

# Collect the memberships of each (username, email) pair into a set
df.groupby(['username', 'email'], as_index=False).agg(lambda x: set(x))

Result:

  username            email             membership
0     abc1  abc@company.com  {prod2, prod3, prod1}
1     def1  def@company.com         {prod2, prod3}
2     xyz1  xyz@company.com  {prod4, prod3, prod1}
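
Note that a set does not preserve order and still collapses each group into a single row. If what you want is closer to the layout in the question (one row per membership, with the repeated username and email left blank), one possible sketch is to skip groupby entirely and blank out the repeated key values. This is only a display trick under the assumption that the sample data above is representative; the names `ordered`, `repeated`, and `display_df` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "username": ["abc1", "abc1", "abc1", "def1", "def1", "xyz1", "xyz1", "xyz1"],
    "email": ["abc@company.com"] * 3 + ["def@company.com"] * 2 + ["xyz@company.com"] * 3,
    "membership": ["prod1", "prod2", "prod3", "prod2", "prod3", "prod1", "prod3", "prod4"],
})

# Sort so that rows belonging to the same user are adjacent.
ordered = df.sort_values(["username", "email", "membership"]).reset_index(drop=True)

# True for every row that repeats an earlier (username, email) pair.
repeated = ordered.duplicated(subset=["username", "email"])

# Blank the key columns on the repeated rows; the underlying data in df is untouched.
display_df = ordered.copy()
display_df.loc[repeated, ["username", "email"]] = ""

print(display_df.to_string(index=False, header=False))
```

Because the blanked cells are empty strings, `display_df` is only suitable for printing or exporting; keep `df` for any further filtering or joins.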