我有一个float32值的排序数组,我想将此数组拆分为仅包含如下相同值的列表列表:
>>> split_sorted(array) # [1., 1., 1., 2., 2., 3.]
>>> [[1., 1., 1.], [2., 2.], [3.]]
我当前的方法是此功能
def split_sorted(array):
split = [[array[0]]]
s_index = 0
a_index = 1
while a_index < len(array):
while a_index < len(array) and array[a_index] == split[s_index][0]:
split[s_index].append(array[a_index])
a_index += 1
else:
if a_index < len(array):
s_index += 1
a_index += 1
split.append([array[a_index]])
我现在的问题是,还有更多的Python方式可以做到这一点吗?也许甚至用numpy?这是最高效的方法吗?
非常感谢!
答案 0 :(得分:2)
方法1
以a
作为数组,我们可以使用np.split
-
np.split(a,np.flatnonzero(a[:-1] != a[1:])+1)
样品运行-
In [16]: a
Out[16]: array([1., 1., 1., 2., 2., 3.])
In [17]: np.split(a,np.flatnonzero(a[:-1] != a[1:])+1)
Out[17]: [array([1., 1., 1.]), array([2., 2.]), array([3.])]
方法2
另一种更有效的方法是获取 splitting 索引,然后将数组切成zipping
-
idx = np.flatnonzero(np.r_[True, a[:-1] != a[1:], True])
out = [a[i:j] for i,j in zip(idx[:-1],idx[1:])]
方法3
如果必须获得子列表的列表作为输出,我们可以使用列表重复来重新创建-
mask = np.r_[True, a[:-1] != a[1:], True]
c = np.diff(np.flatnonzero(mask))
out = [[i]*j for i,j in zip(a[mask[:-1]],c)]
具有1000000
个唯一元素的10000
个元素上矢量化方法的时间-
In [145]: np.random.seed(0)
...: a = np.sort(np.random.randint(1,10000,(1000000)))
In [146]: x = a
# Approach #1 from this post
In [147]: %timeit np.split(a,np.flatnonzero(a[:-1] != a[1:])+1)
100 loops, best of 3: 10.5 ms per loop
# Approach #2 from this post
In [148]: %%timeit
...: idx = np.flatnonzero(np.r_[True, a[:-1] != a[1:], True])
...: out = [a[i:j] for i,j in zip(idx[:-1],idx[1:])]
100 loops, best of 3: 5.18 ms per loop
# Approach #3 from this post
In [197]: %%timeit
...: mask = np.r_[True, a[:-1] != a[1:], True]
...: c = np.diff(np.flatnonzero(mask))
...: out = [[i]*j for i,j in zip(a[mask[:-1]],c)]
100 loops, best of 3: 11.1 ms per loop
# @RafaelC's soln
In [149]: %%timeit
...: v,c = np.unique(x, return_counts=True)
...: out = [[a]*b for (a,b) in zip(v,c)]
10 loops, best of 3: 25.6 ms per loop
答案 1 :(得分:1)
您可以使用numpy.unique
和zip
v,c = np.unique(x, return_counts=True)
[[a]*b for (a,b) in zip(v,c)]
输出
[[1.0, 1.0, 1.0], [2.0, 2.0], [3.0]]
600万个数组的时间
%timeit v,c = np.unique(x, return_counts=True); [[a]*b for (a,b) in zip(v,c)]
18.2 ms ± 236 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.split(x,np.flatnonzero(x[:-1] != x[1:])+1)
424 ms ± 11.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit [list(group) for value, group in itertools.groupby(x)]
180 ms ± 4.42 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
答案 2 :(得分:0)
函数itertools.groupby
具有确切的行为。
>>> from itertools import groupby
>>> [list(group) for value, group in groupby(array)]
[[1.0, 1.0, 1.0], [2.0, 2.0], [3.0]]
答案 3 :(得分:0)
>>> from itertools import groupby
>>> a = [1., 1., 1., 2., 2., 3.]
>>> for k, g in groupby(a) :
... print k, list(g)
...
1.0 [1.0, 1.0, 1.0]
2.0 [2.0, 2.0]
3.0 [3.0]
如果愿意,您可以加入列表
>>> result = []
>>> for k, g in groupby(a) :
... result.append( list(g) )
...
>>> result
[[1.0, 1.0, 1.0], [2.0, 2.0], [3.0]]
答案 4 :(得分:0)
我改进了您的代码,它不是pythonic,但是不使用外部库(而且您的代码在数组的最后一个元素上也不起作用):
def split_sorted(array):
splitted = [[]]
standard = array[0]
li = 0 # inner lists index
n = len(array)
for i in range(n):
if standard != array[i]:
standard = array[i]
splitted.append([]) # appending empty list
li += 1
split[li].append(array[i])
return splitted
# test
array = [1,2,2,2,3]
a = split_sorted(array)
print(a)enter code here