Suppose I have a large dataset (in CSV format) that looks like this:
Country Age Salary Purchased
0 France 44 72000 No
1 Spain 27 48000 Yes
2 Germany 30 54000 No
3 Spain 38 61000 No
4 Germany 40 45000 Yes
5 France 35 58000 Yes
6 Spain 75 52000 No
7 France 48 79000 Yes
8 Germany 50 83000 No
9 France 37 67000 Yes
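How can I randomly shuffle all the values of a selected column? For example, I want to randomly shuffle all the values of the first column, "Country".
Looking for your suggestions. Thanks in advance!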
Answer 0: (score: 4)
Use np.random.shuffle to shuffle in place:
import numpy as np

# pandas <= 0.23
# np.random.shuffle(df['Country'].values)
# pandas 0.24+
np.random.shuffle(df['Country'].to_numpy())
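The in-place version relies on to_numpy() handing back a writable view of the column's data, which may not hold for every dtype or pandas configuration (for example, with Copy-on-Write enabled). A minimal sketch of a more defensive variant, assuming a small made-up frame in place of the real CSV: shuffle an explicit copy, then assign it back.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the CSV from the question.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany"],
    "Age": [44, 27, 30, 38, 40],
})

# Take an explicit, writable copy of the column values.
values = df["Country"].to_numpy(copy=True)

# Shuffle the copy in place; the DataFrame is untouched so far.
np.random.shuffle(values)

# Assigning a plain array is positional, so the shuffled order is kept.
df["Country"] = values
```

This costs one extra copy of the column but does not depend on whether pandas exposes its internal buffer.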
Or, use np.random.choice and assign the result back:
df['Country'] = np.random.choice(df['Country'], len(df), replace=False)
Answer 1: (score: 3)
permutation
np.random.seed([3, 1415])
df.assign(Country=df.Country.to_numpy()[np.random.permutation(len(df))])
Country Age Salary Purchased
0 France 44 72000 No
1 Germany 27 48000 Yes
2 France 30 54000 No
3 Spain 38 61000 No
4 France 40 45000 Yes
5 Spain 35 58000 Yes
6 Germany 75 52000 No
7 Spain 48 79000 Yes
8 Germany 50 83000 No
9 France 37 67000 Yes
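For reproducible shuffles, NumPy's newer Generator API can be used instead of the global np.random.seed. A minimal sketch of the same permutation idea, assuming a small made-up frame and an arbitrary seed value:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the CSV from the question.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany"],
    "Age": [44, 27, 30, 38, 40],
})

# A local generator; the seed 12345 is arbitrary and only makes the run repeatable.
rng = np.random.default_rng(12345)

# A random ordering of the row positions 0..len(df)-1.
order = rng.permutation(len(df))

# Reorder the column values by position; assign returns a new frame and leaves df as is.
shuffled = df.assign(Country=df["Country"].to_numpy()[order])
```

Keeping the generator local avoids changing global random state that other code may depend on.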
sample
df.assign(Country=df.Country.sample(frac=1).to_numpy())
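The .to_numpy() call matters here: sample returns a Series that still carries the original index, and assigning a Series aligns on that index, which puts every value back in its original row. A short sketch of the difference, assuming a small made-up frame and an arbitrary random_state:

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV from the question.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany"],
    "Age": [44, 27, 30, 38, 40],
})

# Shuffled values, but each one keeps its original index label.
shuffled = df["Country"].sample(frac=1, random_state=0)  # seed chosen arbitrarily

# Assigning the Series aligns on the index, so the shuffle is undone.
aligned = df.assign(Country=shuffled)

# Converting to an array drops the index, so assignment is positional.
positional = df.assign(Country=shuffled.to_numpy())
```

Here aligned is identical to df, while positional holds the values in shuffled order.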