I have a numpy float array like this:
v = np.array([1.0,1.0,2.0,2.0,2.0,2.0,...])
I need to identify all constant segments in the array, for example:
[{value:1.0,location:0,duration:2},..]
Efficiency is the main criterion.
Answer 0 (score: 3)
Here's one approach -
import numpy as np

def island_props(v):
    # Compare one-off shifted slices element-wise to get a mask of the
    # start and stop positions of each island, then get the
    # corresponding indices.
    mask = np.concatenate(([True], v[1:] != v[:-1], [True]))
    loc0 = np.flatnonzero(mask)

    # Get the start locations
    loc = loc0[:-1]

    # The values are the input array indexed by the start locations.
    # The lengths are the differences between consecutive start/stop indices.
    return v[loc], loc, np.diff(loc0)
Sample run -
In [143]: v
Out[143]: array([ 1., 1., 2., 2., 2., 2., 5., 2.])
In [144]: value, location, lengths = island_props(v)
In [145]: value
Out[145]: array([ 1., 2., 5., 2.])
In [146]: location
Out[146]: array([0, 2, 6, 7])
In [147]: lengths
Out[147]: array([2, 4, 1, 1])
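The question asks for a list of dicts, while `island_props` returns three parallel arrays. A minimal sketch of the conversion, assuming a non-empty 1-D input; `islands_as_dicts` is a hypothetical helper name that is not part of the original answer, and `island_props` is repeated so the block runs on its own:

```python
import numpy as np

def island_props(v):
    # Same function as in the answer above.
    mask = np.concatenate(([True], v[1:] != v[:-1], [True]))
    loc0 = np.flatnonzero(mask)
    loc = loc0[:-1]
    return v[loc], loc, np.diff(loc0)

def islands_as_dicts(v):
    # Hypothetical helper: zip the three arrays into one dict per island.
    # tolist() converts NumPy scalars to plain Python floats and ints.
    value, location, lengths = island_props(v)
    return [{'value': val, 'location': loc, 'duration': dur}
            for val, loc, dur in zip(value.tolist(),
                                     location.tolist(),
                                     lengths.tolist())]

v = np.array([1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 5.0, 2.0])
records = islands_as_dicts(v)
print(records[0])  # {'value': 1.0, 'location': 0, 'duration': 2}
```

Building the dicts is a Python-level loop, so if efficiency is the priority it is probably better to keep working with the three arrays and convert only at the end, if at all.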
Runtime test
Other approaches -
import itertools

def MSeifert(a):
    return [{'value': k, 'duration': len(list(v))} for k, v in
            itertools.groupby(a.tolist())]

def Kasramvd(a):
    # Use the argument a rather than the global v
    return np.split(a, np.where(np.diff(a) != 0)[0] + 1)
Timings -
In [156]: v0 = np.array([1.0,1.0,2.0,2.0,2.0,2.0,5.0,2.0])
In [157]: v = np.tile(v0,10000)
In [158]: %timeit MSeifert(v)
...: %timeit Kasramvd(v)
...: %timeit island_props(v)
...:
10 loops, best of 3: 44.7 ms per loop
10 loops, best of 3: 36.1 ms per loop
10000 loops, best of 3: 140 µs per loop
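The `%timeit` lines above need IPython. A rough sketch of the same comparison with the standard `timeit` module, which also checks that the two approaches agree on the run lengths before timing them. `groupby_durations` is a hypothetical, trimmed-down variant of the `groupby` approach, and the repeat counts are arbitrary choices:

```python
import itertools
import timeit
import numpy as np

def island_props(v):
    mask = np.concatenate(([True], v[1:] != v[:-1], [True]))
    loc0 = np.flatnonzero(mask)
    loc = loc0[:-1]
    return v[loc], loc, np.diff(loc0)

def groupby_durations(a):
    # Hypothetical variant of the groupby approach: run lengths only.
    return [len(list(g)) for _, g in itertools.groupby(a.tolist())]

v0 = np.array([1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 5.0, 2.0])
v = np.tile(v0, 10000)

# Sanity check: both approaches should report identical run lengths.
_, _, lengths = island_props(v)
same = lengths.tolist() == groupby_durations(v)

# Average seconds per call; absolute numbers depend on the machine.
t_numpy = timeit.timeit(lambda: island_props(v), number=20) / 20
t_groupby = timeit.timeit(lambda: groupby_durations(v), number=5) / 5
print("results agree:", same)
print("island_props: %.6f s, groupby: %.6f s" % (t_numpy, t_groupby))
```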
Answer 1 (score: 2)
You can group equal items as follows, then do the rest using each sub-array's size, first element, and index:
In [2]: v = np.array([1.0,1.0,2.0,2.0,2.0,2.0,3.0, 3.0, 5.0, 6.0, 6.0])
In [4]: np.split(v, np.where(np.diff(v) != 0)[0] + 1)
Out[4]:
[array([ 1., 1.]),
array([ 2., 2., 2., 2.]),
array([ 3., 3.]),
array([ 5.]),
array([ 6., 6.])]
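The expression np.diff(v) != 0 marks the positions where the sequence changes (the difference is nonzero), and np.where() returns the indices of those positions (from the boolean result). Adding 1 turns each index into the start of the next run. You can then split the array at those indices with np.split().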
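The three steps can be traced one at a time. A small sketch using the same input as the session above; the intermediate names `changed`, `boundaries` and `pieces` are only illustrative:

```python
import numpy as np

v = np.array([1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 5.0, 6.0, 6.0])

# True at position i when v[i + 1] differs from v[i].
changed = np.diff(v) != 0

# Indices of the changes, shifted by one so that each index points at
# the first element of a new run.
boundaries = np.where(changed)[0] + 1

# Split the array just before each boundary index.
pieces = np.split(v, boundaries)

print(boundaries)                    # [2 6 8 9]
print([p.tolist() for p in pieces])  # [[1.0, 1.0], [2.0, 2.0, 2.0, 2.0], [3.0, 3.0], [5.0], [6.0, 6.0]]
```

Note that `boundaries` holds the starts of the second and later runs only; the first run always starts at index 0.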
Finally, you can use a list comprehension to get the desired result:
In [7]: locations = np.where(np.diff(v) != 0)[0] + 1
In [8]: result = np.split(v, locations)
In [9]: [{'value':arr[0], 'location':loc, 'duration':arr.size} for loc, arr in zip(np.r_[0, locations], result)]
Out[9]:
[{'duration': 2, 'value': 1.0, 'location': 0},
{'duration': 4, 'value': 2.0, 'location': 2},
{'duration': 2, 'value': 3.0, 'location': 6},
{'duration': 1, 'value': 5.0, 'location': 8},
{'duration': 2, 'value': 6.0, 'location': 9}]
Answer 2 (score: 2)
You can use itertools.groupby; it is probably slower (I haven't timed it) but may be easier to understand:
>>> import numpy as np
>>> import itertools
>>> a = np.array([1.0,1.0,2.0,2.0,2.0,2.0])
>>> [{'value': k, 'duration': len(list(v))} for k, v in itertools.groupby(a.tolist())]
[{'duration': 2, 'value': 1.0}, {'duration': 4, 'value': 2.0}]
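The comprehension above yields `value` and `duration` but not the `location` the question asks for. One way to recover it is to keep a running offset while iterating over the groups. A minimal sketch, where `groupby_islands` is a hypothetical helper name and not part of the original answer:

```python
import itertools
import numpy as np

def groupby_islands(a):
    # Hypothetical helper: walk the runs in order and accumulate the
    # start index of each run from the lengths of the previous ones.
    out = []
    location = 0
    for value, group in itertools.groupby(a.tolist()):
        duration = sum(1 for _ in group)
        out.append({'value': value, 'location': location, 'duration': duration})
        location += duration
    return out

a = np.array([1.0, 1.0, 2.0, 2.0, 2.0, 2.0])
islands = groupby_islands(a)
print(islands)
# [{'value': 1.0, 'location': 0, 'duration': 2}, {'value': 2.0, 'location': 2, 'duration': 4}]
```

The printed key order assumes Python 3.7 or later, where dicts preserve insertion order.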