我有一行点数:
a = np.array([18, 56, 32, 75, 55, 55])
我有另一个数组,它对应于我想用来访问a中的信息的索引(它们总是具有相同的长度)。数组a
和数组b
都没有排序。
b = np.array([0, 2, 3, 2, 2, 2])
我想将a
分组到多个子数组中,以便可以实现以下目标:
c[0] -> array([18])
c[2] -> array([56, 75, 55, 55])
c[3] -> array([32])
虽然上面的例子很简单,但我将处理数百万个点,因此首选有效的方法。稍后还必须通过自动化方法在程序中稍后以这种方式访问任何子点阵列。
答案 0 :(得分:4)
这是一种方法 -
def groupby(a, b):
# Get argsort indices, to be used to sort a and b in the next steps
sidx = b.argsort(kind='mergesort')
a_sorted = a[sidx]
b_sorted = b[sidx]
# Get the group limit indices (start, stop of groups)
cut_idx = np.flatnonzero(np.r_[True,b_sorted[1:] != b_sorted[:-1],True])
# Split input array with those start, stop ones
out = [a_sorted[i:j] for i,j in zip(cut_idx[:-1],cut_idx[1:])]
return out
更简单但效率更低的方法是使用np.split
替换最后几行并获取输出,如下所示 -
out = np.split(a_sorted, np.flatnonzero(b_sorted[1:] != b_sorted[:-1])+1 )
示例运行 -
In [38]: a
Out[38]: array([18, 56, 32, 75, 55, 55])
In [39]: b
Out[39]: array([0, 2, 3, 2, 2, 2])
In [40]: groupby(a, b)
Out[40]: [array([18]), array([56, 75, 55, 55]), array([32])]
获取涵盖b
-
def groupby_perID(a, b):
# Get argsort indices, to be used to sort a and b in the next steps
sidx = b.argsort(kind='mergesort')
a_sorted = a[sidx]
b_sorted = b[sidx]
# Get the group limit indices (start, stop of groups)
cut_idx = np.flatnonzero(np.r_[True,b_sorted[1:] != b_sorted[:-1],True])
# Create cut indices for all unique IDs in b
n = b_sorted[-1]+2
cut_idxe = np.full(n, cut_idx[-1], dtype=int)
insert_idx = b_sorted[cut_idx[:-1]]
cut_idxe[insert_idx] = cut_idx[:-1]
cut_idxe = np.minimum.accumulate(cut_idxe[::-1])[::-1]
# Split input array with those start, stop ones
out = [a_sorted[i:j] for i,j in zip(cut_idxe[:-1],cut_idxe[1:])]
return out
示例运行 -
In [241]: a
Out[241]: array([18, 56, 32, 75, 55, 55])
In [242]: b
Out[242]: array([0, 2, 3, 2, 2, 2])
In [243]: groupby_perID(a, b)
Out[243]: [array([18]), array([], dtype=int64),
array([56, 75, 55, 55]), array([32])]