Question

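I want to randomly shuffle the values of a single column of a DataFrame based on a groupby. For example, I have two columns, A and B, and I want to shuffle column B at random within the groups defined by A.

Say A has three distinct values. For each distinct value of A, I want to shuffle the values in B, but only among the rows that share that value of A.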

Sample input:

A       B     
------------
1       1          
1       3    
2       4     
3       6   
1       2  
3       5

Sample output:

A       B        
------------
1       3          
1       2    
2       4     
3       6   
1       1  
3       5

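In this case, the values of B are shuffled for A=1. The same happens for A=2, but since it has only one row, the value stays where it is. For A=3, the values of B happen to stay in the same order by chance.

I want to implement this with pandas.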

Answer 1

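For this, you can combine np.random.permutation (which returns a shuffled copy of an array) with groupby and transform (which returns a result indexed like the original group). For example: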

>>> df
   col1  col2
0     1     1
1     1     3
2     2     4
3     3     6
4     1     2
5     3     5
>>> df["col3"] = df.groupby("col1")["col2"].transform(np.random.permutation)
>>> df
   col1  col2  col3
0     1     1     2
1     1     3     1
2     2     4     4
3     3     6     5
4     1     2     3
5     3     5     6

Note that the values are shuffled only within their col1 group.
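For reference, the same approach as a self-contained script might look like the sketch below. The seeded generator `rng` and the lambda wrapper are additions for reproducibility and are not part of the answer above; the seed value is arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 1, 2, 3, 1, 3],
                   "col2": [1, 3, 4, 6, 2, 5]})

# Assumption: a seeded generator is used so the shuffle can be reproduced.
rng = np.random.default_rng(0)

# transform calls the function once per col1 group and writes the result
# back at the original row positions of that group.
df["col3"] = df.groupby("col1")["col2"].transform(
    lambda s: rng.permutation(s.to_numpy())
)

print(df)
```

Because transform keeps the original index, the row order of df is unchanged and only the values inside each group move.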

Answer 2

You can also use groupby together with sample:

import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 2, 3, 1, 3], 
                   'col2': [1, 3, 4, 6, 2, 5]})

df_rand = df.groupby('col1').apply(lambda x: x.sample(frac=1)).reset_index(drop=True)

>>> df.sort_values('col1')
   col1  col2
0     1     1
1     1     3
4     1     2
2     2     4
3     3     6
5     3     5

>>> df_rand
   col1  col2
0     1     2
1     1     3
2     1     1
3     2     4
4     3     6
5     3     5
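Note that df_rand above comes back regrouped by col1, so the original row order is lost. If the rows need to stay in place, as in the sample output of the question, one possible variation is to call sample inside transform. This is only a sketch: the names `state` and `shuffled` are illustrative, and the fixed seed is an assumption added for reproducibility.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 1, 2, 3, 1, 3],
                   "col2": [1, 3, 4, 6, 2, 5]})

# Assumption: a shared RandomState so repeated runs give the same shuffle.
state = np.random.RandomState(0)

shuffled = df.copy()
# sample(frac=1) returns every value of the group in random order;
# to_numpy() drops the sampled index so the values are assigned by position
# instead of being re-aligned to their original rows.
shuffled["col2"] = df.groupby("col1")["col2"].transform(
    lambda s: s.sample(frac=1, random_state=state).to_numpy()
)

print(shuffled)
```

Dropping the index with to_numpy() matters here: if the sampled Series kept its index, the values could be aligned back to their original rows and the shuffle would have no visible effect.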
