我有一系列数字(一维数组),例如0,0,1,1,1,0,1,1,1,1,...
是否有一种优雅的方法(最好是最快的方法)在发生变化之前对1或0的连续出现次数进行计数?因此,结果将是(0,2),(1、3),(0、1),(1、4),...
答案 0 :(得分:2)
这里是NumPy的另一个,特别是利用数组切片-
def islands_info(a):
# Compare consecutive elems for changes. Use `True` as sentients to detect edges
idx = np.flatnonzero(np.r_[True,a[:-1]!=a[1:],True])
# Index into input array with the sliced array until second last array to
# get start indices and the differentiation for the lengths
return np.column_stack((a[idx[:-1]],np.diff(idx)))
样品运行-
In [51]: a = np.array([0, 0, 1, 1, 1, 0, 1, 1, 1, 1])
In [52]: islands_info(a)
Out[52]:
array([[0, 2],
[1, 3],
[0, 1],
[1, 4]])
如果您需要将输出作为元组列表-
In [56]: list(zip(*islands_info(a).T))
Out[56]: [(0, 2), (1, 3), (0, 1), (1, 4)]
时间-
与其他基于NumPy的对象相比,@yatu
-
In [43]: np.random.seed(a)
In [44]: a = np.random.choice([0,1], 1000000)
In [45]: %timeit yatu(a)
11.7 ms ± 428 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [46]: %timeit islands_info(a)
8.98 ms ± 40.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [47]: np.random.seed(a)
In [48]: a = np.random.choice([0,1], 10000000)
In [49]: %timeit yatu(a)
232 ms ± 3.71 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [50]: %timeit islands_info(a)
152 ms ± 933 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
答案 1 :(得分:1)
您可以使用groupby
中的itertools
from itertools import groupby
x = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]
occ = [(i, len([*y,])) for i,y in groupby(x)]
输出:
In [23]: [(i, len([*y,])) for i,y in groupby(x)]
Out[23]: [(1, 1), (0, 2), (1, 1), (0, 1), (1, 2), (0, 3)]
答案 2 :(得分:1)
这是一个NumPy
,表现出色:
a = np.array([0, 0, 1, 1, 1, 0, 1, 1, 1, 1])
# indexes where changes take place
changes = np.flatnonzero(np.diff(a)!=0)
#include initial and end index
ix = np.r_[0,changes+1,a.shape[0]]
# index the array with changes to check the value in question
# stack with the count of values, taking the diff over ix
np.column_stack([np.r_[a[changes], a[a.shape[0]-1]], np.diff(ix)])
array([[0, 2],
[1, 3],
[0, 1],
[1, 4]], dtype=int64)
时间:
def yatu(a):
changes = np.flatnonzero(np.diff(a)!=0)
ix = np.r_[0,changes+1,a.shape[0]]
return np.column_stack([np.r_[a[changes], a[a.shape[0]-1]], np.diff(ix)])
def groupby(a):
return [(i, len([*y,])) for i,y in groupby(a)]
a = np.random.choice([0,1], 10_000)
%timeit groupby(list(a))
# 1.83 ms ± 168 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit yatu(a)
# 150 µs ± 14.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
答案 3 :(得分:0)
我使用了reduce
中的functools
,Ayoub展示的groupby
可能更好(更快),因为这一次每次都会复制累加器。
from functools import reduce
l = [0, 0, 1, 1, 1, 0, 1, 1, 1, 1]
p = lambda acc, x : acc[:-1] + [(x, acc[-1][1] + 1)] if acc and x == acc[-1][0] else acc + [(x, 1)]
result = reduce(p, l, [])
print(result)
[(0,2),(1,3),(0,1),(1,4)]