pandas dataframe group by with permutations

Date: 2015-10-16 02:47:26

Tags: python pandas

I have this DataFrame:

import pandas as pd

df = pd.DataFrame({'alpha': ['ab', 'ab', 'ab', 'cd', 'cd', 'cd'],
                   'beta': ['12', '34', '56', '78', '90', '22']})
df

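For each group in the column named 'alpha', I want to generate a new column named 'gamma'. Together, the columns 'beta' and 'gamma' should list every permutation of two 'beta' values within that group, that is, every ordered pair of distinct rows. The expected result looks like this: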

df1 = pd.DataFrame({'alpha': ['ab', 'ab', 'ab', 'ab', 'ab', 'ab','cd','cd','cd','cd','cd','cd'],
   'beta': ['12', '34','56','12', '56','34' , '78','90','22','22','78','90' ],
   'gamma': ['34', '12','12','56', '34','56' , '90','78','78','90','22','22' ]})
df1
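
The rows of the expected frame are not in any particular order, so a plain equality test against a computed result can fail even when the content is right. Below is a minimal sketch of an order-independent check. The names `wanted` and `present` are introduced here for illustration only and do not come from the question.

```python
from itertools import permutations

import pandas as pd

df = pd.DataFrame({'alpha': ['ab', 'ab', 'ab', 'cd', 'cd', 'cd'],
                   'beta': ['12', '34', '56', '78', '90', '22']})
df1 = pd.DataFrame({
    'alpha': ['ab', 'ab', 'ab', 'ab', 'ab', 'ab', 'cd', 'cd', 'cd', 'cd', 'cd', 'cd'],
    'beta': ['12', '34', '56', '12', '56', '34', '78', '90', '22', '22', '78', '90'],
    'gamma': ['34', '12', '12', '56', '34', '56', '90', '78', '78', '90', '22', '22'],
})

# Ordered pairs the expected frame should contain, built group by group.
wanted = {(alpha, b, g)
          for alpha, group in df.groupby('alpha')
          for b, g in permutations(group['beta'], 2)}

# Ordered pairs actually present in df1; a set ignores row order.
present = set(df1[['alpha', 'beta', 'gamma']].itertuples(index=False, name=None))

print(present == wanted)        # same pairs?
print(len(df1) == len(wanted))  # no duplicated rows?
```

Comparing sets drops duplicates, which is why the sketch also compares the row count.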

I tried the following:

from itertools import permutations, product
df['gamma']= df['beta']
dfg = df.groupby('alpha')
perms = {}
for a, v in dfg:
    perms[a] =  list(permutations(v.values))

print(perms)
pd.DataFrame(perms)

2 answers:

Answer 0 (score: 1)

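Given your requirement, your code uses permutations incorrectly. You need to permute only the values of the beta column, and have itertools.permutations take 2 elements at a time. Example -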

from itertools import permutations
grouped = df.groupby('alpha')
resultlist = []

for key,group in grouped:
    for b,g in permutations(group['beta'].tolist(),2):
        resultlist.append([key,b,g])

result = pd.DataFrame(resultlist,columns=['alpha','beta','gamma'])

Demo -

In [29]: df
Out[29]:
  alpha beta
0    ab   12
1    ab   34
2    ab   56
3    cd   78
4    cd   90
5    cd   22

In [30]: grouped = df.groupby('alpha')

In [31]: resultlist = []

In [32]: for key,group in grouped:
   ....:     for b,g in permutations(group['beta'].tolist(),2):
   ....:         resultlist.append([key,b,g])
   ....:

In [33]: result = pd.DataFrame(resultlist,columns=['alpha','beta','gamma'])

In [34]: result
Out[34]:
   alpha beta gamma
0     ab   12    34
1     ab   12    56
2     ab   34    12
3     ab   34    56
4     ab   56    12
5     ab   56    34
6     cd   78    90
7     cd   78    22
8     cd   90    78
9     cd   90    22
10    cd   22    78
11    cd   22    90
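
To see why the second argument matters, here is a small standard-library sketch using the 'beta' values of group 'ab'. The variable names `beta`, `full` and `pairs` are chosen for this illustration. The question's attempt called permutations on v.values with no length, so it reordered whole rows instead of pairing up values.

```python
from itertools import permutations

beta = ['12', '34', '56']  # 'beta' values of group 'ab'

# Without a length, every tuple uses all three values.
full = list(permutations(beta))

# With r=2, every tuple is an ordered pair taken from two different positions.
pairs = list(permutations(beta, 2))

print(full[0])  # ('12', '34', '56')
print(pairs)
# [('12', '34'), ('12', '56'), ('34', '12'), ('34', '56'), ('56', '12'), ('56', '34')]
```

With three values both calls happen to return six tuples (3! = 6 and 3 * 2 = 6), so the row count alone does not reveal the mistake. The tuple length does.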

Answer 1 (score: 1)

You can use apply to avoid the explicit loop:

In [192]: (df.groupby('alpha')
             .apply(lambda x: pd.DataFrame(list(permutations(x['beta'], 2))))
             .reset_index())
Out[192]:
   alpha  level_1   0   1
0     ab        0  12  34
1     ab        1  12  56
2     ab        2  34  12
3     ab        3  34  56
4     ab        4  56  12
5     ab        5  56  34
6     cd        0  78  90
7     cd        1  78  22
8     cd        2  90  78
9     cd        3  90  22
10    cd        4  22  78
11    cd        5  22  90

In [193]: dff = (df
                .groupby('alpha')
                .apply(lambda x: pd.DataFrame(list(permutations(x['beta'], 2))))
                .reset_index())

In [194]: dff = dff[['alpha', 0, 1]]

In [195]: dff.columns = ['alpha', 'beta', 'gamma']

In [196]: dff
Out[196]:
   alpha beta gamma
0     ab   12    34
1     ab   12    56
2     ab   34    12
3     ab   34    56
4     ab   56    12
5     ab   56    34
6     cd   78    90
7     cd   78    22
8     cd   90    78
9     cd   90    22
10    cd   22    78
11    cd   22    90
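
As a possible alternative that neither answer uses, the same pairs can be sketched with a self-merge on 'alpha'. This is an assumption-laden illustration rather than the answerers' method: the helper column `pos` and the `_other` suffix are names invented here, and the row order of the result is not guaranteed to match the outputs above.

```python
import pandas as pd

df = pd.DataFrame({'alpha': ['ab', 'ab', 'ab', 'cd', 'cd', 'cd'],
                   'beta': ['12', '34', '56', '78', '90', '22']})

# Number the rows inside each group so that repeated 'beta' values
# are still treated as different rows (hypothetical helper column).
left = df.assign(pos=df.groupby('alpha').cumcount())

# Pair every row with every row of the same group ...
pairs = left.merge(left, on='alpha', suffixes=('', '_other'))

# ... then drop the pairs where a row was matched with itself.
pairs = pairs[pairs['pos'] != pairs['pos_other']]

result = (pairs[['alpha', 'beta', 'beta_other']]
          .rename(columns={'beta_other': 'gamma'})
          .reset_index(drop=True))
print(result)
```

The merge first builds n * n rows per group before filtering down to n * (n - 1), so for large groups the itertools-based versions above may use less memory.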