How to search for a pattern with tolerance in Python

Time: 2014-02-24 11:28:05

Tags: python extract sequence sample missing-data

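Here is my problem. I am trying to collect the output of a real-time system that counts down from 55 to 0 in steps of 1, and I am logging this data. I have captured everything it returned: 55, 54, 53, ..., 3, 2, 1. However, because of system lag, some samples are repeated and some are missed. For example, I get: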

  

[55,53,52,52,51,50,49,48,47,46,45,44,43,42,41,39,38,38,36,36,34,33,33, 32,31,30,29,28,27,26,25,24,23,22,21,20,20,18,17,15,14,13,13,12,11,10,9,8, 7,7,5,4,3,2,1]

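So I have a pattern (55 down to 0), but some samples are missing and some are repeated. Is there a way to write a script that detects and extracts them?

My goal is to verify that the countdown from 55 to 0 proceeds in steps of 1, while tolerating any single misses and repeats caused by sampling. Here is my code: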

e = 0  # running total of the decrements seen so far
for x in range(len(b) - 1):
    e += b[x] - b[x+1]
    print(x, b[x] - b[x+1], b[x], b[x+1])
print('reached %d count in %d decrements' % (e, len(b) - 1))

2 Answers:

Answer 0: (score: 3)

If I understand correctly, you may want to use NumPy's diff function:

In [1]: import numpy as np 

In [2]: A = [55, 53, 52, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 39, 38, 38, 36, 36, 34, 33, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 20, 18, 17, 15, 14, 13, 13, 12, 11, 10, 9, 8, 7, 7, 5, 4, 3, 2, 1]


In [3]: np.diff(A)
Out[3]: 
array([-2, -1,  0, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -2, -1,  0,
       -2,  0, -2, -1,  0, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
       -1,  0, -2, -1, -2, -1, -1,  0, -1, -1, -1, -1, -1, -1,  0, -2, -1,
       -1, -1, -1])

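Here -1 is the expected step of 1, -2 means one sample was missed, and 0 means a sample was repeated. If you want to know where the problems are: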

In [4]: np.where(np.diff(A) != -1)[0] # [0] because np.where returns a tuple with one index array per dimension
Out[4]: array([ 0,  2, 14, 16, 17, 18, 19, 21, 35, 36, 38, 41, 48, 49])

Let me know if anything is unclear.
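The tolerance rule described above (a step of 1 is expected, 0 is a repeat, 2 is a single miss) can also be sketched in plain Python. This is only an illustration: `check_countdown` and its parameters are hypothetical names, not part of the original answer. It assumes the log should start at 55 and end at 1, as in the sample data, and it does not limit how many times in a row a value may repeat.

```python
def check_countdown(samples, start=55, stop=1, max_missing=1):
    """Check a countdown log that should step down by 1 per sample.

    Returns (ok, repeats, gaps):
      ok      -- True if every step is between 0 and 1 + max_missing and the
                 first/last samples are within max_missing of start/stop
      repeats -- indices i where samples[i + 1] == samples[i]
      gaps    -- indices i where at least one value was skipped after samples[i]
    """
    repeats, gaps = [], []
    # The endpoints may themselves be off by up to max_missing samples.
    ok = (bool(samples)
          and 0 <= start - samples[0] <= max_missing
          and 0 <= samples[-1] - stop <= max_missing)
    for i, (prev, cur) in enumerate(zip(samples, samples[1:])):
        step = prev - cur
        if step == 0:
            repeats.append(i)
        elif step > 1:
            gaps.append(i)
        # Counting upwards, or skipping too many values, breaks the pattern.
        if step < 0 or step > 1 + max_missing:
            ok = False
    return ok, repeats, gaps


A = [55, 53, 52, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 39, 38, 38,
     36, 36, 34, 33, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20,
     20, 18, 17, 15, 14, 13, 13, 12, 11, 10, 9, 8, 7, 7, 5, 4, 3, 2, 1]

ok, repeats, gaps = check_countdown(A)
print(ok)
print(repeats)
print(gaps)
```

Taken together, `repeats` and `gaps` hold the same positions that `np.where(np.diff(A) != -1)[0]` reports, split by the kind of problem.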

Answer 1: (score: 1)

Alternatively, if you want the set of all samples seen, the samples that were never found, and the samples that were repeated:

b = [55, 53, 52, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 39, 38, 38, 36, 36, 34, 33, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 20, 18, 17, 15, 14, 13, 13, 12, 11, 10, 9, 8, 7, 7, 5, 4, 3, 2, 1]

unique_b = set(b)
# Values from 1 to 55 that never appear in the log
not_in_b = [x for x in range(1, 56) if x not in unique_b]
# Values from 1 to 55 that appear more than once
repeats_in_b = [x for x in range(1, 56) if b.count(x) > 1]

print(sorted(unique_b))  # sorted for a stable display order
print(not_in_b)
print(repeats_in_b)

Output:

[1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 36, 38, 39, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55]
[6, 16, 19, 35, 37, 40, 54]
[7, 13, 20, 33, 36, 38, 52]
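The same bookkeeping can be done in a single pass with `collections.Counter`, which also makes it easy to test the "single misses and repeats only" tolerance from the question. The following is a minimal sketch rather than part of the original answer: the variable names are illustrative, and it assumes the expected values run from 55 down to 1, as in the sample data.

```python
from collections import Counter

b = [55, 53, 52, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 39, 38, 38,
     36, 36, 34, 33, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20,
     20, 18, 17, 15, 14, 13, 13, 12, 11, 10, 9, 8, 7, 7, 5, 4, 3, 2, 1]

counts = Counter(b)          # value -> number of times it was logged
expected = range(55, 0, -1)  # assumption: the countdown covers 55..1

# A Counter returns 0 for keys it has never seen, so no KeyError here.
missing = [v for v in expected if counts[v] == 0]
repeated = [v for v in expected if counts[v] > 1]

# Tolerance checks: no value logged more than twice, and no two
# consecutive values both missing.
single_repeats_only = all(c <= 2 for c in counts.values())
isolated_misses = all(hi - lo > 1 for hi, lo in zip(missing, missing[1:]))

print(missing)
print(repeated)
print(single_repeats_only and isolated_misses)
```

Unlike calling `b.count(x)` for every candidate value, the Counter scans the list only once, which matters mainly for much longer logs.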