I am trying to parse a device log, but its format is inconsistent.
Example:
Roam candidate# 9 F4:CF:E2:5E:73:3F on channel 161 RSSI: -70
Roam candidate#10 F4:CF:E2:62:02:2F on channel 11 RSSI: -70
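I want to extract the MAC address, the channel, and the RSSI value.
Unfortunately, once the candidate number reaches 10 or higher, the space after the # is dropped.
I tried to tokenize the line, but I barely understand the process: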
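def clean(string):
    # Drop a space when the next character is a digit or another space.
    result = ""
    for i, char in enumerate(string):
        # Guard against reading past the end when the line ends with a space.
        if char == " " and i + 1 < len(string):
            if string[i+1].isdigit() or string[i+1] == " ":
                continue
        result += char
    return result

def tokenize(string):
    # Split on single spaces; the final token runs to the end of the string.
    result = []
    previous = 0
    for i, char in enumerate(string):
        if char == " ":
            result.append(string[previous:i])
            previous = i+1
        elif i == len(string)-1:
            result.append(string[previous:i+1])
    return result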
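I only get the last column (RSSI) as output.
Answer 0 (score: 1)
If you want to use a pattern, you can use 3 capture groups: one for the MAC address, one for the channel, and one for the RSSI value: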
Roam candidate# ?\d+ ((?:[0-9A-Fa-f]{2}[:-]){5}(?:[0-9A-Fa-f]){2}) on channel (\d+) +RSSI: (-?\d+)
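Broken down into parts:
Roam candidate# ?\d+
Matches "Roam candidate#", an optional space, and 1 or more digits
((?:[0-9A-Fa-f]{2}[:-]){5}(?:[0-9A-Fa-f]){2})
Capture group 1, matches the MAC address
 on channel (\d+) +
Matches " on channel ", captures 1 or more digits in group 2, then matches 1 or more spaces
RSSI: (-?\d+)
Matches "RSSI: ", then captures an optional - and 1 or more digits in group 3
For example: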
import re
strings = ["Roam candidate# 9 F4:CF:E2:5E:73:3F on channel 161 RSSI: -70", "Roam candidate#10 F4:CF:E2:62:02:2F on channel 11 RSSI: -70"]
regex = r"Roam candidate# ?\d+ ((?:[0-9A-Fa-f]{2}[:-]){5}(?:[0-9A-Fa-f]){2}) on channel (\d+) +RSSI: (-?\d+)"
for s in strings:
    print(re.findall(regex, s, re.M))
Result:
[('F4:CF:E2:5E:73:3F', '161', '-70')]
[('F4:CF:E2:62:02:2F', '11', '-70')]
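The tuples returned by findall contain strings only. If the channel and RSSI are needed as numbers, the same pattern can be wrapped in a small helper. This is a minimal sketch rather than part of the original answer; the names ROAM_RE, RoamCandidate, and parse_roam_line are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Same pattern as in the answer, compiled once so it can be reused per line.
ROAM_RE = re.compile(
    r"Roam candidate# ?\d+ ((?:[0-9A-Fa-f]{2}[:-]){5}(?:[0-9A-Fa-f]){2})"
    r" on channel (\d+) +RSSI: (-?\d+)"
)


class RoamCandidate(NamedTuple):
    """Hypothetical container for one parsed log line."""
    mac: str
    channel: int
    rssi: int


def parse_roam_line(line: str) -> Optional[RoamCandidate]:
    """Return the parsed fields, or None if the line does not match."""
    m = ROAM_RE.search(line)
    if m is None:
        return None
    mac, channel, rssi = m.groups()
    # Convert the numeric fields from strings to integers.
    return RoamCandidate(mac, int(channel), int(rssi))


print(parse_roam_line("Roam candidate#10 F4:CF:E2:62:02:2F on channel 11 RSSI: -70"))
```

Returning None for non-matching lines lets the caller skip unrelated log entries instead of handling an exception.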
Answer 1 (score: 1)
Something a bit simpler might work better:
r"(?i)([a-f0-9]{2}(?::[a-f0-9]{2})+)\s.*?\s(\d+)\s.*?\s(-?\d+)"
https://regex101.com/r/smcjY5/1
Expanded:
(?i)
( # (1 start)
[a-f0-9]{2}
(?: : [a-f0-9]{2} )+
) # (1 end)
\s .*? \s
( \d+ ) # (2)
\s .*? \s
( -? \d+ ) # (3)
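The expanded layout can be used almost as-is in Python through re.VERBOSE, which ignores whitespace and # comments inside the pattern. The following is a sketch under that assumption; the name PATTERN is hypothetical, and case-insensitivity is passed as a flag instead of the inline (?i) so that the pattern text may begin on its own line.

```python
import re

# Expanded pattern from the answer, written in verbose mode.
PATTERN = re.compile(
    r"""
    (                        # (1) MAC address
        [a-f0-9]{2}
        (?: : [a-f0-9]{2} )+
    )
    \s .*? \s
    ( \d+ )                  # (2) channel
    \s .*? \s
    ( -? \d+ )               # (3) RSSI
    """,
    re.VERBOSE | re.IGNORECASE,
)

line = "Roam candidate#10 F4:CF:E2:62:02:2F on channel 11 RSSI: -70"
match = PATTERN.search(line)
if match:
    print(match.groups())
```

Because this pattern relies only on the order of the fields and not on the literal words "channel" or "RSSI", it may also match other lines with a similar shape.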
Answer 2 (score: 0)
With a regular expression it can work like this:
import re
s1="Roam candidate# 9 F4:CF:E2:5E:73:3F on channel 161 RSSI: -70"
s2="Roam candidate#10 F4:CF:E2:62:02:2F on channel 11 RSSI: -70"
# Raw string so that \s reaches the regex engine unchanged.
patt= re.compile(r'(?P<mac>[0-9A-F]{2}(:[0-9A-F]{2}){5}).*?channel (?P<channel>[0-9]*).*?RSSI:\s*(?P<rssi>-?[0-9]*)', re.I)
matcher= patt.search(s1)
print(matcher.group('mac'))
print(matcher.group('channel'))
print(matcher.group('rssi'))
This returns:
F4:CF:E2:5E:73:3F
161
-70
And for the second line:
F4:CF:E2:62:02:2F
11
-70
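With named groups, all fields can also be collected at once through groupdict(). The sketch below is a small variation on the pattern above, not the answer's exact code: the inner MAC group is made non-capturing and the digit runs use + instead of *, and the helper name extract is hypothetical.

```python
import re

patt = re.compile(
    r"(?P<mac>[0-9A-F]{2}(?::[0-9A-F]{2}){5}).*?channel (?P<channel>[0-9]+)"
    r".*?RSSI:\s*(?P<rssi>-?[0-9]+)",
    re.I,
)


def extract(line):
    """Return a dict with mac, channel and rssi, or None if nothing matches."""
    m = patt.search(line)
    return m.groupdict() if m else None


s1 = "Roam candidate# 9 F4:CF:E2:5E:73:3F on channel 161 RSSI: -70"
s2 = "Roam candidate#10 F4:CF:E2:62:02:2F on channel 11 RSSI: -70"
for s in (s1, s2):
    print(extract(s))
```

Checking for None before reading the groups avoids an AttributeError on lines that are not roam candidates.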
Answer 3 (score: 0)
Another regex approach:
import re
lines = '''Roam candidate# 9 F4:CF:E2:5E:73:3F on channel 161 RSSI: -70
Roam candidate#10 F4:CF:E2:62:02:2F on channel 11 RSSI: -70'''
pat = re.compile(r'(?<=#)\s*\d+\s+((?:[A-F0-9]{2}:){5}[A-F0-9]{2}) .*channel\s+(\d+)\s+RSSI:\s+(-?\d+)', re.I)
for line in lines.split('\n'):
    print(pat.findall(line))
Output:
[('F4:CF:E2:5E:73:3F', '161', '-70')]
[('F4:CF:E2:62:02:2F', '11', '-70')]
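Since . does not match a newline by default, the same pattern can probably be run over the whole log text in one pass with finditer, without splitting into lines first. This is a sketch only; the "Scan started" and "Scan finished" lines are made up here to stand in for unrelated log entries.

```python
import re

pat = re.compile(
    r'(?<=#)\s*\d+\s+((?:[A-F0-9]{2}:){5}[A-F0-9]{2}) .*channel\s+(\d+)\s+RSSI:\s+(-?\d+)',
    re.I,
)

# The first and last lines are invented filler, not taken from a real log.
log_text = """Scan started
Roam candidate# 9 F4:CF:E2:5E:73:3F on channel 161 RSSI: -70
Roam candidate#10 F4:CF:E2:62:02:2F on channel 11 RSSI: -70
Scan finished"""

# Each match yields one (mac, channel, rssi) tuple; other lines are skipped.
candidates = [m.groups() for m in pat.finditer(log_text)]
for mac, channel, rssi in candidates:
    print(mac, channel, rssi)
```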