Question

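I have a pandas data frame with 50k rows. I'm trying to add a new column that holds a randomly generated integer from 1 to 5.

If I wanted 50k distinct random numbers, I would use: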

df1['randNumCol'] = random.sample(xrange(50000), len(df1))

But for this case I'm not sure how to do it.

Side note: in R, I would do:

sample(1:5, 50000, replace = TRUE)

Any ideas?

Answer 1

One solution is to use np.random.randint (the upper bound is exclusive, so 6 gives values from 1 to 5):

import numpy as np
df1['randNumCol'] = np.random.randint(1, 6, df1.shape[0])

# or if the numbers are non-consecutive (albeit slower)
df1['randNumCol'] = np.random.choice([1, 9, 20], df1.shape[0])

To make the results reproducible, you can set the seed with np.random.seed(42) before generating the numbers.
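As a minimal sketch of the seeding remark, the snippet below re-seeds and regenerates the column to show that both draws match. The 50k-row df1 built here is a hypothetical stand-in for the asker's frame, and the column name randNumCol2 is made up for illustration. It also shows np.random.default_rng, NumPy's newer Generator interface, as one possible alternative to the global seed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 50k-row data frame from the question.
df1 = pd.DataFrame({"x": range(50000)})

# Seeding the global (legacy) random state makes the draw repeatable.
np.random.seed(42)
first = np.random.randint(1, 6, df1.shape[0])

np.random.seed(42)
second = np.random.randint(1, 6, df1.shape[0])  # same values as `first`

df1["randNumCol"] = first

# Alternative: a local Generator with its own seed.
# integers() also excludes the upper bound by default.
rng = np.random.default_rng(42)
df1["randNumCol2"] = rng.integers(1, 6, size=len(df1))
```

A local Generator keeps the seed from affecting other code that relies on the global random state, which may matter in larger scripts.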

Answer 2

To add a column of random integers, use randint(low, high, size). There's no need to waste memory allocating range(low, high); that could take a lot of memory if high is large.

# high is exclusive: (1, 6) yields integers 1 through 5, as the question asks
df1['randNumCol'] = np.random.randint(1, 6, size=len(df1))

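(Also note that when we're only adding a single column, size is just an integer. In general, if we want to generate an array or data frame of randint() values, size can be a tuple, as in "Pandas: How to create a data frame of random integers?")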
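A short sketch of the tuple form of size, assuming we want a whole data frame of random integers from 1 to 5. The shape (100 rows, 3 columns) and the column names a, b, c are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical dimensions for the example.
n_rows, n_cols = 100, 3

# A tuple for `size` gives a 2-D array; each entry is drawn from 1..5
# because the upper bound 6 is exclusive.
values = np.random.randint(1, 6, size=(n_rows, n_cols))

df = pd.DataFrame(values, columns=["a", "b", "c"])
```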

Pandas: create new column in df with random integers from range
