Question

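I'm relatively new to pandas (and Python... and programming), and I'm trying to run a Monte Carlo simulation, but I haven't been able to find a solution that runs in a reasonable amount of time.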

The data is stored in a DataFrame called "YTDSales", which holds the sales per day for each product:

Date          Product_A     Product_B     Product_C     Product_D     ...   Product_XX
01/01/2014         1000           300            70         34500     ...          780   
02/01/2014          400           400            70            20     ...           10   
03/01/2014         1110           400          1170            60     ...           50   
04/01/2014           20           320             0         71300     ...           10   
       ...
15/10/2014         1000           300            70         34500     ...         5000

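What I want to do is simulate different scenarios, filling in the rest of the year (from October 15 to year end) using the historical distribution of each product. For example, with the data shown, I would like to fill the rest of the year with sales between 20 and 1100.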

Here is what I've done:

import datetime as dt
import numpy as np
import pandas as pd

# create the range of "future" dates (shifted one day past the last historical date)
last_historical = YTDSales.index.max()
year_end = dt.datetime(2014, 12, 30)
DatesEOY = pd.date_range(start=last_historical, end=year_end).shift(1)

# function that draws a random sales number per product, between min and max
# (np.random.randint excludes the upper bound)
f = lambda x: np.random.randint(x.min(), x.max())

# create all the "future" dates and fill each row with the output of f
for i in DatesEOY:
    YTDSales.loc[i] = YTDSales.apply(f)

The solution works, but it takes about 3 seconds, which is a lot if I plan to run 1,000 iterations... Is there a way to avoid iterating?

Thanks

Answer 1

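Use the size option of np.random.randint to get a sample of the needed size all at once. One approach I would consider is outlined briefly below.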

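Allocate the space you'll need in a new placeholder DataFrame that has the index values from DatesEOY, the columns of the original DataFrame, and all NaN values. Then concatenate it onto the original data.
Now that you know the length of each random sample you need, use the extra size keyword of np.random.randint to sample each column all at once instead of looping.
Overwrite the placeholder data with this batch sample.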
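As a small illustration of the size keyword on its own, here is a minimal sketch. The bounds are made-up values used only for illustration, and 77 is simply the number of days from October 16 to December 31. A single call returns a whole array of draws, so no Python-level loop is needed:

```python
import numpy as np

# Hypothetical bounds, for illustration only.
low, high = 20, 1110
n_days = 77  # days from Oct 16 through Dec 31

# One call returns n_days integers in the half-open range [low, high).
samples = np.random.randint(low, high, size=n_days)

print(samples.shape)
```

Note that the upper bound is exclusive, so the historical maximum itself is never drawn unless you pass high + 1.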

Here is what this could look like:

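import pandas
# placeholder frame: future dates as the index, same columns, all NaN
new_df = pandas.DataFrame(index=DatesEOY, columns=YTDSales.columns)
num_to_sample = len(new_df)
# draw all future values for one column in a single call (upper bound is exclusive)
f = lambda col: np.random.randint(col.min(), col.max(), num_to_sample)
output = pandas.concat([YTDSales, new_df], axis=0)
# one row of samples per column, transposed so the columns line up with the products
output.iloc[len(YTDSales):] = np.asarray([f(col) for _, col in YTDSales.items()]).T

Along the way, I chose to create an entirely new DataFrame by concatenating the old one with the new "placeholder" one. This could obviously be inefficient for very large data.

Another way to approach this is setting with enlargement, as you did in your for-loop solution.

I did not play around with that approach long enough to figure out how to "enlarge" a whole batch of index values at once. But if you figure that out, you can just "enlarge" the original DataFrame with all-NaN values (at the index values from DatesEOY) and then apply the function to YTDSales directly, instead of bringing output into it at all.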
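One possible way to do the batch "enlargement" is DataFrame.reindex, which adds all the missing dates as NaN rows in a single step. The sketch below is an assumption about how that could look and has not been run against the original data. The small YTDSales frame is made up for illustration, hist_len, low, and high are hypothetical helper names, and the code relies on np.random.randint accepting array-valued bounds (supported in recent NumPy versions) so that every product is sampled in one call:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the YTDSales frame from the question.
YTDSales = pd.DataFrame(
    {"Product_A": [1000, 400, 1110, 20], "Product_B": [300, 400, 400, 320]},
    index=pd.date_range("2014-01-01", periods=4),
)
hist_len = len(YTDSales)

# Future dates: the day after the last historical date through year end.
DatesEOY = pd.date_range(
    start=YTDSales.index.max() + pd.Timedelta(days=1), end="2014-12-31"
)

# Per-product bounds, taken from the historical rows only.
low = YTDSales.min().to_numpy()
high = YTDSales.max().to_numpy()

# "Enlarge" in one step: reindex adds the future dates as all-NaN rows.
YTDSales = YTDSales.reindex(YTDSales.index.union(DatesEOY))

# Sample a (days x products) block at once; low/high broadcast across rows.
# The upper bound is exclusive, as with any np.random.randint call.
samples = np.random.randint(low, high, size=(len(DatesEOY), len(low)))
YTDSales.iloc[hist_len:] = samples
```

If you need many scenarios, the same call should extend to a size of (n_scenarios, n_days, n_products), which would produce all simulated paths in one array rather than one DataFrame per iteration.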

Random Sampling with Pandas Data Frame
