Question

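I have a dataframe with 4 columns. Each column has to be bucketed (its data distributed across 8 buckets): first the first column, then the second, and so on, without specifying the column names manually.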

This is the code I am trying:

for col in df3.columns[0:]:
    cb1 = np.linspace(min(col), max(col), 11)
    df3.insert(2 ,'buckets',pd.cut(col, cb1, labels=np.arange(1, 11, 1)))
    print(df3[col])

Here df3 is the sample dataset:

apple  orange  banana
5      2       6
6      4       6
2      8       9
4      7       0

The expected output is:

apple  orange  banana  bucket_apple  bucket_orange  bucket_banana
5      2       6       1             3              2
6      4       6       1             1              4
2      8       9       2             1              8
4      7       0       5             4              1

Here the bucket columns give the bucket number assigned to each value.

Answer 1

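Since the output is completely random, there is no relationship between the data columns and the bucket numbers, so in this case you should generate the buckets separately.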

for c in df.columns:
    df['bucket_' + c] = np.random.randint(8, size=(len(df))) + 1
df # your random bucket df.

If you want the buckets to be equal in size:

for c in df.columns:
    arr = np.arange(8) + 1
    arr = np.repeat(arr, len(df) // 8)  # repeats must be an integer; len(df) has to be divisible by 8
    np.random.shuffle(arr)  # shuffle the array in place
    df['bucket_' + c] = arr
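
The loop above only works when len(df) is a multiple of 8, which is not the case for the 4-row sample in the question. A minimal sketch of one way around that, assuming "equal" can be relaxed to "bucket sizes differ by at most one": cycle the labels 1..8 over the rows and shuffle them per column. The helper name add_balanced_buckets and its parameters n_buckets and seed are hypothetical, introduced only for this illustration.

```python
import numpy as np
import pandas as pd


def add_balanced_buckets(df, n_buckets=8, seed=None):
    """Return a copy of df with one random bucket column per original column.

    Hypothetical helper: bucket sizes differ by at most one, so len(df)
    does not have to be divisible by n_buckets.
    """
    rng = np.random.default_rng(seed)
    out = df.copy()
    # Cycle the labels 1..n_buckets until there is one label per row.
    labels = np.arange(len(df)) % n_buckets + 1
    for c in df.columns:  # only the original columns, not the new ones
        # permutation() returns a shuffled copy, so each column gets
        # its own independent ordering of the same labels.
        out['bucket_' + str(c)] = rng.permutation(labels)
    return out


# Sample data from the question.
df = pd.DataFrame({'apple': [5, 6, 2, 4],
                   'orange': [2, 4, 8, 7],
                   'banana': [6, 6, 9, 0]})
result = add_balanced_buckets(df, n_buckets=8, seed=0)
print(result)
```

With only 4 rows, each bucket column ends up holding the labels 1 to 4 in random order; buckets 5 to 8 stay empty until the frame has at least 8 rows.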
