Question

I'm processing an RF signal input that sometimes contains noise spikes. The input looks like this:
00000001111100011110001111100001110000001000001111000000111001111000

Before parsing the data in the signal, I need to remove the spike bits, i.e. runs of 0s or 1s whose length is below (in this example) 3.

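So basically I need to match the parenthesized runs here: 0000000111110001111000111110000111000000(1)000001111000000111(00)1111000
Once matched, I replace each spike with the bit that precedes it, so a clean signal looks like this: 00000001111100011110001111100001110000000000001111000000111111111000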

So far I've achieved this with two different regexes:

self.re_one_spikes = re.compile("(?:[^1])(?P<spike>1{1,%d})(?=[^1])" % (self._SHORTEST_BIT_LEN - 1))
self.re_zero_spikes = re.compile("(?:[^0])(?P<spike>0{1,%d})(?=[^0])" % (self._SHORTEST_BIT_LEN - 1))

Then I iterate over the matches and replace them.
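For reference, here is one way the two-regex approach could be driven. This is a minimal sketch, not the asker's actual code: `SHORTEST_BIT_LEN` stands in for `self._SHORTEST_BIT_LEN` (assumed to be 3), and `remove_spikes` is a hypothetical helper. Because each pattern consumes the bit before the spike, only the named `spike` group is overwritten.

```python
import re

# Assumed value; the question reads it from self._SHORTEST_BIT_LEN.
SHORTEST_BIT_LEN = 3

# The question's two patterns, written as raw strings.
RE_ONE_SPIKES = re.compile(r"(?:[^1])(?P<spike>1{1,%d})(?=[^1])" % (SHORTEST_BIT_LEN - 1))
RE_ZERO_SPIKES = re.compile(r"(?:[^0])(?P<spike>0{1,%d})(?=[^0])" % (SHORTEST_BIT_LEN - 1))


def remove_spikes(signal):
    """Hypothetical helper: flip short runs of 1s to 0, then short runs of 0s to 1."""
    for pattern, fill in ((RE_ONE_SPIKES, "0"), (RE_ZERO_SPIKES, "1")):
        bits = list(signal)
        for match in pattern.finditer(signal):
            start, end = match.span("spike")
            # Overwrite only the spike; the consumed bit before it is left alone.
            bits[start:end] = fill * (end - start)
        signal = "".join(bits)
    return signal


signal = "00000001111100011110001111100001110000001000001111000000111001111000"
print(remove_spikes(signal))
```

The second pass runs on the output of the first, so a zero spike is judged after any neighbouring one spikes have already been flattened.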

How can I do this with a single regular expression? Can a regex replace matches of different sizes? I tried something like this, without success:

re.compile("(?![\1])([01]{1,2})(?![\1])")
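A likely reason the attempt fails, sketched under the assumption that it was written as the normal (non-raw) Python string shown: in such a literal `"\1"` is the octal escape for the control character U+0001, and Python's `re` treats numeric escapes inside `[...]` as characters even in a raw string, so `[\1]` never refers to a captured group. Both lookaheads therefore always succeed, and the pattern just chops the signal into pairs of bits.

```python
import re

signal = "00000001111100011110001111100001110000001000001111000000111001111000"

# In a non-raw literal, "\1" is chr(1), not a backreference.
assert "\1" == "\x01"
attempt = re.compile("(?![\1])([01]{1,2})(?![\1])")

# (?![\x01]) always succeeds on a string of 0s and 1s,
# so findall returns consecutive two-bit chunks of the whole signal.
pairs = attempt.findall(signal)
print(len(pairs), pairs[:4])
```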

Answer 1

import re
THRESHOLD=3

def fixer(match):
    ones = match.group(0)
    if len(ones) < THRESHOLD: return "0"*len(ones)
    return ones

my_string = '00000001111100011110001111100001110000001000001111000000111001111000'
print(re.sub("(1+)",fixer,my_string))

If you also want to remove "spikes" of zeros:

def fixer(match):
    items = match.group(0)
    if len(items) < THRESHOLD: return "10"[int(items[0])]*len(items)
    return items

print(re.sub("(1+)|(0+)",fixer,my_string))
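The same idea packaged as a self-contained function with the threshold as a parameter. This is a sketch; the name `flip_short_runs` is hypothetical, and the groups in `(1+)|(0+)` are dropped because `fixer` only uses group 0.

```python
import re


def flip_short_runs(signal, threshold=3):
    """Flip every run of identical bits shorter than `threshold` to the opposite bit."""
    def fixer(match):
        run = match.group(0)
        if len(run) < threshold:
            # "10"[int("1")] == "0" and "10"[int("0")] == "1"
            return "10"[int(run[0])] * len(run)
        return run
    # Each match is one maximal run of 1s or of 0s.
    return re.sub(r"1+|0+", fixer, signal)


signal = "00000001111100011110001111100001110000001000001111000000111001111000"
print(flip_short_runs(signal))
```

Because it looks at whole runs rather than at the bits around them, this version also flips a short run at the very start or end of the string: `flip_short_runs("1000")` returns `"0000"`.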

Answer 2

To match both cases ([01]) with a single regex, just use:

(?<=([01]))(?:(?!\1)[01]){1,2}(?=\1)

Expanded:

 (?<=                 # Lookbehind for 0 or 1
      ( [01] )             # (1), Capture behind 0 or 1
 )
 (?:                  # Match spike, one to %d times in length
      (?! \1 )             # Cannot be the 0 or 1 from lookbehind
      [01] 
 ){1,2}
 (?= \1 )             # Lookahead, can only be 0 or 1 from capture (1)

Replace each match with $1 repeated as many times as the match is long (i.e. the length of group 0).
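In Python, `re.sub`'s template syntax (`\1`, `\g<1>`) cannot repeat a group a variable number of times, so the replacement is easiest as a function. A minimal sketch, with the hypothetical name `despike` and the `{1,2}` bound derived from an assumed `threshold` parameter:

```python
import re


def despike(signal, threshold=3):
    # Same pattern as above; the upper bound of the repeat is threshold - 1.
    pattern = re.compile(r"(?<=([01]))(?:(?!\1)[01]){1,%d}(?=\1)" % (threshold - 1))
    # Group 1 (the bit captured by the lookbehind) repeated len(group 0) times.
    return pattern.sub(lambda m: m.group(1) * len(m.group(0)), signal)


signal = "00000001111100011110001111100001110000001000001111000000111001111000"
print(despike(signal))
```

Since the pattern needs a bit on both sides of the spike, a short run at the very start or end of the string is left untouched, e.g. `despike("1000")` returns `"1000"`.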

Matches:

 **  Grp 0 -  ( pos 40 , len 1 ) 
1  
 **  Grp 1 -  ( pos 39 , len 1 ) 
0  

----------------------------------------

 **  Grp 0 -  ( pos 59 , len 2 ) 
00  
 **  Grp 1 -  ( pos 58 , len 1 ) 
1
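The match listing above can be reproduced with Python's `re.finditer`. A sketch, assuming the example signal from the question; `m.start(1)` reports where the lookbehind captured its bit:

```python
import re

signal = "00000001111100011110001111100001110000001000001111000000111001111000"
pattern = re.compile(r"(?<=([01]))(?:(?!\1)[01]){1,2}(?=\1)")

# (start of spike, spike text, start of captured bit, captured bit)
matches = [(m.start(), m.group(0), m.start(1), m.group(1)) for m in pattern.finditer(signal)]
for pos0, grp0, pos1, grp1 in matches:
    print(f"Grp 0 - (pos {pos0}, len {len(grp0)}) {grp0}")
    print(f"Grp 1 - (pos {pos1}, len 1) {grp1}")
```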

Benchmark:

Regex1:   (?<=([01]))(?:(?!\1)[01]){1,2}(?=\1)
Options:  < none >
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   2
Elapsed Time:    2.06 s,   2058.02 ms,   2058018 µs


50,000 iterations * 2 matches/iteration = 100,000 matches 

100,000 matches / 2.06 s ≈ 48,600 matches per second (roughly 50,000)
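The figures above come from a separate regex tool. To get a comparable rate for Python's own `re` engine, a harness along these lines could be used; it is a sketch, `matches_per_second` is a hypothetical name, and the numbers it prints will depend on the machine and Python version.

```python
import re
import timeit

SIGNAL = "00000001111100011110001111100001110000001000001111000000111001111000"
PATTERN = re.compile(r"(?<=([01]))(?:(?!\1)[01]){1,2}(?=\1)")


def matches_per_second(iterations=50_000):
    """Scan SIGNAL `iterations` times and return the observed match rate."""
    matches_per_scan = len(PATTERN.findall(SIGNAL))  # 2 for this signal
    elapsed = timeit.timeit(lambda: PATTERN.findall(SIGNAL), number=iterations)
    return iterations * matches_per_scan / elapsed


if __name__ == "__main__":
    print(f"{matches_per_second():,.0f} matches per second")
```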

Answer 3

An alternative approach that uses str.replace() instead of a regex (in case someone finds it useful later). It only removes spikes of 1s:

>>> my_signal = '00000001111100011110001111100001110000001000001111000000111001111000'
>>> my_threshold = 3
>>> for n in range(1, my_threshold):  # spike lengths 1 .. my_threshold-1
...     my_signal = my_signal.replace('0{}0'.format('1'*n), '0{}0'.format('0'*n))
... 
>>> my_signal
'00000001111100011110001111100001110000000000001111000000111001111000'
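The `str.replace()` idea can be extended to zero spikes as well. One caveat: `str.replace()` skips overlapping occurrences, so `"01010".replace("010", "000")` gives `"00010"` and misses the second spike. The sketch below (hypothetical name `remove_spikes_replace`) repeats each replacement until the pattern is gone; each pass removes at least one spike bit, so the loop terminates.

```python
def remove_spikes_replace(signal, threshold=3):
    """Flatten runs shorter than `threshold`: 1-spikes first, then 0-spikes."""
    for spike, fill in (("1", "0"), ("0", "1")):
        for n in range(1, threshold):
            old = fill + spike * n + fill   # e.g. "010", "0110"
            new = fill * (n + 2)            # e.g. "000", "0000"
            # str.replace() is non-overlapping, so repeat until stable.
            while old in signal:
                signal = signal.replace(old, new)
    return signal


signal = "00000001111100011110001111100001110000001000001111000000111001111000"
print(remove_spikes_replace(signal))
```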

Answer 4

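import re


def fix_noise(s, noise_thold=3):
    # 'before' is the bit preceding the spike; 'noise' is a run of the opposite
    # bit, at most noise_thold-1 long, that is followed by 'before' again.
    pattern=re.compile(r'(?P<before>1|0)(?P<noise>(?<=0)1{1,%d}(?=0)|(?<=1)0{1,%d}(?=1))' % (noise_thold-1, noise_thold-1))
    result = s
    for noise_match in pattern.finditer(s):
        # Replacements keep the length unchanged, so offsets from s stay valid in result.
        beginning = result[:noise_match.start()+1]
        end = result[noise_match.end():]
        replaced = noise_match.group('before')*len(noise_match.group('noise'))
        result = beginning + replaced + end
    return result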
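Because every replacement has the same length as the text it replaces, the manual slicing can also be expressed with a single `re.sub` call. A sketch of that variant (hypothetical name `fix_noise_sub`), using the same pattern and producing the same output on the example:

```python
import re


def fix_noise_sub(s, noise_thold=3):
    n = noise_thold - 1
    # Same pattern as fix_noise: a leading bit, then a run of the opposite bit
    # at most n long that is followed by the leading bit again.
    pattern = re.compile(
        r'(?P<before>1|0)(?P<noise>(?<=0)1{1,%d}(?=0)|(?<=1)0{1,%d}(?=1))' % (n, n))
    # Rewrite the whole match as the 'before' bit repeated over its length.
    return pattern.sub(lambda m: m.group('before') * len(m.group(0)), s)


signal = "00000001111100011110001111100001110000001000001111000000111001111000"
print(fix_noise_sub(signal))
```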

Jordan's int(items[0]) indexing idea is brilliant!

Regex to remove noise spikes from a bit signal
