Question

I have an array that gives me the cumulative percentages of some other numbers:

my_cumulative_array = np.asarray(range(0,50))/float(50)

I would like to split this array into n groups, each carrying equal weight:

chunks    = [[row indexes 01-10], #First 20% based on order
            [row indexes 11-20],  #Second 20% based on order
            [row indexes 21-30],
            [row indexes 31-40],
            [row indexes 41-50]]

It seems like there should be a clever way to do this efficiently.

Answer 1

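The question is not well defined, but it looks interesting. The following splits an array (arr) into a list of arrays (chunks) such that the sum of each array in chunks is approximately equal.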

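import numpy as np

splits = 5
arr = np.sin(np.random.rand(100)) + np.arange(100)/50.0
# Normalized cumulative sum: fraction of the total reached at each index
cum_arr = arr.cumsum() / arr.sum()
# First index where the cumulative fraction reaches 1/splits, 2/splits, ...
idx = np.searchsorted(cum_arr, np.linspace(0, 1, splits, endpoint=False)[1:])
chunks = np.split(arr, idx)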

We can observe that the split indices are not equally spaced:

print(idx)
[37 59 74 88]

while the sums of the chunks are:

print([np.sum(x) for x in chunks])
[27.93830, 29.51562, 28.30718, 29.23604, 28.7935]
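
Since the question starts from an array that is already cumulative, the cumsum step can be skipped and searchsorted applied to it directly. The following is only a minimal sketch under assumptions: the cumulative values are non-decreasing and normalized so that the last one is 1, and the helper name split_by_cumulative_weight is hypothetical, not something from NumPy or from the question.

```python
import numpy as np

def split_by_cumulative_weight(cum, n):
    """Hypothetical helper: return n arrays of row indexes, cut where cum reaches k/n."""
    cum = np.asarray(cum, dtype=float)
    # Interior thresholds 1/n, 2/n, ..., (n-1)/n
    thresholds = np.arange(1, n) / n
    # Each cut is the first index whose cumulative value reaches the threshold;
    # that element starts the next chunk
    cuts = np.searchsorted(cum, thresholds)
    return np.split(np.arange(len(cum)), cuts)

# Illustrative weights (made up for this example): four light rows, four heavy rows
weights = np.array([1, 1, 1, 1, 4, 4, 4, 4], dtype=float)
cum = weights.cumsum() / weights.sum()
chunks = split_by_cumulative_weight(cum, 2)
```

The result is a list of index arrays rather than values, which matches the "row indexes" layout sketched in the question. As in the answer above, the chunk weights are only approximately equal, because a cut can only fall between two rows.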

Answer 2

From your example output, it looks like you are looking for

chunks = [my_cumulative_array[i*(50//n):(i+1)*(50//n)] for i in range(n)]
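
The list comprehension above assumes that n divides the array length evenly. As a hedged alternative for the general case, np.array_split produces n chunks whose sizes differ by at most one element. The variable names sizes and sizes_uneven below are only for illustration.

```python
import numpy as np

my_cumulative_array = np.arange(50) / 50.0

# n divides 50: five chunks of ten elements each
n = 5
chunks = np.array_split(my_cumulative_array, n)
sizes = [len(c) for c in chunks]

# n does not divide 50: the extra element goes to the first chunk
sizes_uneven = [len(c) for c in np.array_split(my_cumulative_array, 7)]
```

This splits by position only, so it gives equally weighted chunks only when every row carries the same weight, as in the question's example array.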

Split an array into equally weighted chunks based on order
