Question

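I have a DataFrame (df) with 12 rows and 5 columns. I sample 1 row from each label and create a new DataFrame (df1) with 3 rows and 5 columns. The next time I sample more rows from df, I must not pick rows that already exist in df1. So how can I remove the already-sampled rows from df?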

import pandas as pd
import numpy as np

# 12x5
df = pd.DataFrame(np.random.rand(12, 5))
label=np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])
df['label'] = label


#3x5
df1 = pd.concat(g.sample(1) for idx, g in df.groupby('label'))


#My attempt. It should be a 9x5 dataframe
df2 = pd.concat(f.drop(idx) for idx, f in df1.groupby('label'))

Answer 1

Starting from this DataFrame:

df = pd.DataFrame(np.random.rand(12, 5))
label=np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])
df['label'] = label

Your first sample is:

df1 = pd.concat(g.sample(1) for idx, g in df.groupby('label'))

For the second sample, you can drop df1's index labels from df before sampling:

pd.concat(g.sample(1) for idx, g in df.drop(df1.index).groupby('label'))
Out: 
          0         1         2         3         4  label
2  0.188005  0.765640  0.549734  0.712261  0.334071      1
4  0.599812  0.713593  0.366226  0.374616  0.952237      2
8  0.631922  0.585104  0.184801  0.147213  0.804537      3
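
As a quick sanity check, the two samples can be compared by index. The following is a minimal sketch, assuming df has a unique index (true for the default RangeIndex used here); the seed and the names `second` and `overlap` are illustrative additions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same setup as above; the seed is an assumption added only for repeatability.
np.random.seed(0)
df = pd.DataFrame(np.random.rand(12, 5))
df['label'] = np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])

# First sample: one row per label.
df1 = pd.concat([g.sample(1) for _, g in df.groupby('label')])

# Second sample: drawn only from rows whose index label is not in df1.
second = pd.concat([g.sample(1) for _, g in df.drop(df1.index).groupby('label')])

# No index label appears in both samples.
overlap = df1.index.intersection(second.index)
print(len(overlap))  # 0
```

If the index contained duplicate labels, `drop` would remove every row carrying a dropped label, so this check is only meaningful with a unique index.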

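This is not an in-place operation: it does not modify the original DataFrame. `drop` just removes the rows and returns a copy, and the sample is drawn from that copy. If you want the removal to persist, you can do: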

df2 = df.drop(df1.index)

and then sample from df2.
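
If more than two rounds of sampling are needed, the same idea can be wrapped in a loop that keeps shrinking the pool. This is only a sketch under the same unique-index assumption; `sample_rounds`, `remaining`, and `picked` are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd

def sample_rounds(df, label_col, n_rounds):
    """Draw one row per label in each round, never reusing a row."""
    remaining = df  # drop() returns a new frame, so df itself is not modified
    samples = []
    for _ in range(n_rounds):
        # One random row from each label group still in the pool.
        picked = pd.concat([g.sample(1) for _, g in remaining.groupby(label_col)])
        samples.append(picked)
        # Remove the rows just picked so later rounds cannot select them.
        remaining = remaining.drop(picked.index)
    return samples, remaining

df = pd.DataFrame(np.random.rand(12, 5))
df['label'] = np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])

samples, remaining = sample_rounds(df, 'label', 3)
print(len(remaining))  # 3 (12 rows minus 3 rounds of 3 rows)
```

With 4 rows per label, at most 4 rounds are possible here. A fifth round would find no groups left, and `pd.concat` raises a `ValueError` when given an empty list, so the caller should cap `n_rounds` accordingly.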

How to drop randomly sampled rows of a DataFrame, to avoid sampling them again?
