I'm working with an RF signal that sometimes produces noise spikes.
The input looks like this:
00000001111100011110001111100001110000001000001111000000111001111000
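Before parsing the data in the signal, I need to remove the spike bits, i.e. runs of 0s or 1s whose length is below a threshold (3 in this example).
So basically I need to match 0000000111110001111000111110000111000000(1)000001111000000111(00)1111000
After matching, I replace each spike with the bit that precedes it, so a clean signal looks like this: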
00000001111100011110001111100001110000000000001111000000111111111000
So far I have achieved this with two different regexes:
self.re_one_spikes = re.compile("(?:[^1])(?P<spike>1{1,%d})(?=[^1])" % (self._SHORTEST_BIT_LEN - 1))
self.re_zero_spikes = re.compile("(?:[^0])(?P<spike>0{1,%d})(?=[^0])" % (self._SHORTEST_BIT_LEN - 1))
Then I iterate over the matches and replace them.
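For reference, the iterate-and-replace step with the two patterns above could be sketched roughly as follows. This is only an illustration: the helper name `remove_spikes` and the module-level constant `SHORTEST_BIT_LEN` are assumptions standing in for the class attributes in the original code.

```python
import re

# Assumed stand-in for self._SHORTEST_BIT_LEN in the snippet above.
SHORTEST_BIT_LEN = 3

# The same two patterns as above, written as raw strings.
re_one_spikes = re.compile(r"(?:[^1])(?P<spike>1{1,%d})(?=[^1])" % (SHORTEST_BIT_LEN - 1))
re_zero_spikes = re.compile(r"(?:[^0])(?P<spike>0{1,%d})(?=[^0])" % (SHORTEST_BIT_LEN - 1))


def remove_spikes(signal):
    """Hypothetical helper: overwrite each short run with the bit preceding it."""
    for pattern in (re_one_spikes, re_zero_spikes):
        # group(0) is the preceding bit plus the spike, so repeating the
        # preceding bit over the whole match keeps the length unchanged.
        signal = pattern.sub(lambda m: m.group(0)[0] * len(m.group(0)), signal)
    return signal


noisy = "00000001111100011110001111100001110000001000001111000000111001111000"
clean = "00000001111100011110001111100001110000000000001111000000111111111000"
print(remove_spikes(noisy) == clean)
```

Since both patterns require a character before and after the spike, a short run at the very start or end of the string is left untouched in this sketch.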
How can I do this with a single regex? Can I use a regex to replace matches of varying size? I tried something like this without success:
re.compile("(?![\1])([01]{1,2})(?![\1])")
Answer 0 (score: 6)
import re

THRESHOLD = 3

def fixer(match):
    ones = match.group(0)
    # Runs of ones shorter than the threshold are spikes: zero them out.
    if len(ones) < THRESHOLD:
        return "0" * len(ones)
    return ones

my_string = '00000001111100011110001111100001110000001000001111000000111001111000'
print(re.sub("(1+)", fixer, my_string))
If you also want to remove "spikes" of zeros:

def fixer(match):
    items = match.group(0)
    # "10"[int(bit)] flips the bit: '0' -> '1' and '1' -> '0'.
    if len(items) < THRESHOLD:
        return "10"[int(items[0])] * len(items)
    return items

print(re.sub("(1+)|(0+)", fixer, my_string))
Answer 1 (score: 1)
To match both cases ([01]) in a single regex, it is just this:
(?<=([01]))(?:(?!\1)[01]){1,2}(?=\1)
Expanded
(?<= # Lookbehind for 0 or 1
( [01] ) # (1), Capture behind 0 or 1
)
(?: # Match spike, one to %d times in length
(?! \1 ) # Cannot be the 0 or 1 from lookbehind
[01]
){1,2}
(?= \1 ) # Lookahead, can only be 0 or 1 from capture (1)
Replace with $1 repeated as many times as the length of the match (i.e. the length of group 0).
Matches
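Python's `re.sub` has no replacement syntax for "repeat group 1 N times", so a callable replacement is one way to express this. A minimal sketch, in which the name `remove_spikes` and the `max_spike` parameter are made up for illustration:

```python
import re


def remove_spikes(signal, max_spike=2):
    """Hypothetical helper: apply the single-regex approach via a callable replacement."""
    # Group 1 is captured inside the lookbehind and holds the bit before the spike.
    pattern = re.compile(r"(?<=([01]))(?:(?!\1)[01]){1,%d}(?=\1)" % max_spike)
    # Repeat the preceding bit once per matched spike character.
    return pattern.sub(lambda m: m.group(1) * len(m.group(0)), signal)


noisy = "00000001111100011110001111100001110000001000001111000000111001111000"
clean = "00000001111100011110001111100001110000000000001111000000111111111000"
print(remove_spikes(noisy) == clean)
```

Because the lookbehind and lookahead do not consume characters, the match covers only the spike itself, so the replacement length always equals the match length.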
** Grp 0 - ( pos 40 , len 1 )
1
** Grp 1 - ( pos 39 , len 1 )
0
----------------------------------------
** Grp 0 - ( pos 59 , len 2 )
00
** Grp 1 - ( pos 58 , len 1 )
1
Benchmark
Regex1: (?<=([01]))(?:(?!\1)[01]){1,2}(?=\1)
Options: < none >
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 2
Elapsed Time: 2.06 s, 2058.02 ms, 2058018 µs
50,000 iterations * 2 matches/iteration = 100,000 matches
100,000 matches / 2 sec's = 50,000 matches per second
Answer 2 (score: 0)
An alternative approach that uses replace() instead of regex (in case someone finds it useful in the future):
>>> my_signal = '00000001111100011110001111100001110000001000001111000000111001111000'
>>> my_threshold = 3
>>> for i in range(my_threshold):
...     my_signal = my_signal.replace('0{}0'.format('1'*(i+1)), '0{}0'.format('0'*(i+1)))
...
>>> my_signal
'00000001111100011110001111100000000000000000001111000000000001111000'
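Note that the loop above clears runs of ones up to and including length `my_threshold` and leaves zero spikes alone, as the output shows. A variant that stops below the threshold and handles both polarities might look like this; the function name and parameters are illustrative only.

```python
def remove_spikes_with_replace(signal, threshold=3):
    """Hypothetical variant: replace-based cleanup for spikes of ones and zeros."""
    for n in range(1, threshold):  # spike lengths 1 .. threshold - 1
        for bit, fill in (("1", "0"), ("0", "1")):
            # A spike is n copies of `bit` surrounded by the opposite bit.
            spike = fill + bit * n + fill
            signal = signal.replace(spike, fill * (n + 2))
    return signal


noisy = "00000001111100011110001111100001110000001000001111000000111001111000"
clean = "00000001111100011110001111100001110000000000001111000000111111111000"
print(remove_spikes_with_replace(noisy) == clean)
```

Because `str.replace` does not find overlapping matches, spikes that share a boundary bit (for example `01010`) may need a second pass.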
Answer 3 (score: 0)
import re

def fix_noise(s, noise_thold=3):
    # 'before' is the bit preceding the spike; 'noise' is a run shorter than the threshold.
    pattern = re.compile(r'(?P<before>1|0)(?P<noise>(?<=0)1{1,%d}(?=0)|(?<=1)0{1,%d}(?=1))' % (noise_thold-1, noise_thold-1))
    result = s
    for noise_match in pattern.finditer(s):
        # Replacements keep the length unchanged, so offsets from s stay valid in result.
        beginning = result[:noise_match.start()+1]
        end = result[noise_match.end():]
        replaced = noise_match.group('before')*len(noise_match.group('noise'))
        result = beginning + replaced + end
    return result
Jordan's int(items[0]) indexing idea is brilliant!