Question

I have a pandas DataFrame df that looks like this:

Month   Day mnthShape
1      1    1.016754224
1      1    1.099451003
1      1    0.963911929
1      2    1.016754224
1      1    1.099451003
1      2    0.963911929
1      3    1.016754224
1      3    1.099451003
1      3    1.783775568

I want to get the following from df:

Month   Day mnthShape
1       1   1.016754224
1       2   1.016754224
1       3   1.099451003

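where each mnthShape value is picked at random from the rows that share the same (Month, Day) index. That is, if the query is df.loc[(1, 1)], it should find all values for (1, 1) and then randomly select one of them to display, as in the output above.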
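The lookup described above can be sketched as follows. This is only an illustration of the question's intent, not code from the thread: the helper name random_shape and its random_state parameter are hypothetical, and the frame is rebuilt from the sample data so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Month": [1] * 9,
    "Day": [1, 1, 1, 2, 1, 2, 3, 3, 3],
    "mnthShape": [1.016754224, 1.099451003, 0.963911929,
                  1.016754224, 1.099451003, 0.963911929,
                  1.016754224, 1.099451003, 1.783775568],
})

# Index by (Month, Day) so a (month, day) key selects every matching row.
indexed = df.set_index(["Month", "Day"]).sort_index()


def random_shape(frame, month, day, random_state=None):
    """Hypothetical helper: return one random mnthShape value for (month, day)."""
    # A list of tuples always yields a Series, even if only one row matches.
    candidates = frame.loc[[(month, day)], "mnthShape"]
    return candidates.sample(n=1, random_state=random_state).iloc[0]


value = random_shape(indexed, 1, 1, random_state=0)
print(value)
```

Passing random_state is optional; it only makes the draw repeatable between runs.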

Answer 1

One approach is to use Series.sample() to draw a random row from each group:

import numpy as np
import pandas as pd

# pd.np was removed in pandas 2.0, so seed NumPy directly.
np.random.seed(1)

res = df.groupby(['Month', 'Day'])['mnthShape'].apply(lambda x: x.sample()).reset_index(level=[0, 1])

res
   Month  Day  mnthShape
0      1    1   1.099451
1      1    2   1.016754
2      1    3   1.016754
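On newer pandas versions the same idea can probably be written without apply. The sketch below assumes pandas 1.1 or later, where DataFrameGroupBy.sample is available; it is an alternative to the answer's code rather than what the answer used, and the frame is rebuilt from the question's data so the snippet is self-contained.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Month": [1] * 9,
    "Day": [1, 1, 1, 2, 1, 2, 3, 3, 3],
    "mnthShape": [1.016754224, 1.099451003, 0.963911929,
                  1.016754224, 1.099451003, 0.963911929,
                  1.016754224, 1.099451003, 1.783775568],
})

# Draw one row per (Month, Day) group; random_state makes the draw repeatable
# without touching NumPy's global seed.
res = df.groupby(["Month", "Day"]).sample(n=1, random_state=1)
print(res)
```

The result keeps the original row labels, so each sampled row can be traced back to its position in df.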

Answer 2

Use groupby together with apply to randomly select one value from each group.

import numpy as np

np.random.seed(0)
df.groupby(['Month', 'Day'])['mnthShape'].apply(np.random.choice).reset_index()

   Month  Day  mnthShape
0      1    1   1.016754
1      1    2   0.963912
2      1    3   1.099451
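If seeding NumPy's global state is undesirable, a local generator can stand in for np.random.choice. This is a minimal sketch under that assumption, not part of the original answer; the names rng and picked are illustrative, and the frame is rebuilt from the question's data.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Month": [1] * 9,
    "Day": [1, 1, 1, 2, 1, 2, 3, 3, 3],
    "mnthShape": [1.016754224, 1.099451003, 0.963911929,
                  1.016754224, 1.099451003, 0.963911929,
                  1.016754224, 1.099451003, 1.783775568],
})

# A local generator keeps the randomness isolated from np.random's global seed.
rng = np.random.default_rng(0)

# Pick one value per (Month, Day) group, then turn the group keys into columns.
picked = (
    df.groupby(["Month", "Day"])["mnthShape"]
      .apply(lambda s: rng.choice(s.to_numpy()))
      .reset_index()
)
print(picked)
```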

If you want to know which index the sampled row came from, use pd.Series.sample with n=1:

np.random.seed(0)
(df.groupby(['Month', 'Day'])['mnthShape']
   .apply(pd.Series.sample, n=1)
   .reset_index(level=[0, 1]))

   Month  Day  mnthShape
2      1    1   0.963912
3      1    2   1.016754
6      1    3   1.016754
