I need to be pointed in the right direction for solving this problem:
Suppose I am reading the output of a C program like this:
while True:
    ln = p.stdout.readline()
    if '' == ln:
        break
    # do stuff here with ln
My output looks like these lines:
TrnIq: Thread on CPU 37
TrnIq: Thread on CPU 37 but will be moved to CPU 44
IP-Thread on CPU 33
FANOUT Thread on CPU 37
Filter-Thread on CPU 38 but will be moved to CPU 51
TRN TMR Test 2 Supervisor Thread on CPU 34
HomographyWarp Traking Thread[0] on CPU 26
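I would like to capture "TrnIq: Thread on" and "37" as 2 separate variables, a string and a number, from the output "TrnIq: Thread on CPU 37".
It is pretty much the same for the other lines, for example capturing "HomographyWarp Traking Thread[0] on" and the number "26" from "HomographyWarp Traking Thread[0] on CPU 26".
The only real challenge is lines like this one: "Filter-Thread on CPU 38 but will be moved to CPU 51". For this line I need "Filter-Thread" and the number "51", not the first number "38".
Python has so many different ways to do this that I don't even know where to start!
Thanks in advance!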
Answer 0 (score: 4)
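The following should return a tuple of the information, assuming ln
is one line of your data (edited to include converting the CPU value to an int):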
import re

# Keep the match object itself; calling .groups() here would fail on a non-match
match = re.match(r'(.*?)(?: on CPU.*)?(?: (?:on|to) CPU )(.*)', ln)
if match:
    proc, cpu = match.groups()
    cpu = int(cpu)
Example:
>>> import re
>>> for ln in lines:
...     print(re.match(r'(.*?)(?: on CPU.*)?(?: (?:on|to) CPU )(.*)', ln).groups())
...
('TrnIq: Thread', '37')
('TrnIq: Thread', '44')
('IP-Thread', '33')
('FANOUT Thread', '37')
('Filter-Thread', '51')
('TRN TMR Test 2 Supervisor Thread', '34')
('HomographyWarp Traking Thread[0]', '26')
Explanation:
(.*?) # capture zero or more characters at the start of the string,
# as few characters as possible
(?: on CPU.*)? # optionally match ' on CPU' followed by any number of characters,
# do not capture this
(?: (?:on|to) CPU ) # match ' on CPU ' or ' to CPU ', but don't capture
(.*) # capture the rest of the line
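As a minimal sketch, the pattern explained above could be wrapped in a helper that returns `None` for lines that do not match. This assumes Python 3; the names `PATTERN` and `parse_thread_line` are hypothetical and not part of the original answer.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
PATTERN = re.compile(r'(.*?)(?: on CPU.*)?(?: (?:on|to) CPU )(.*)')


def parse_thread_line(ln):
    """Return (name, cpu) for a matching line, or None otherwise."""
    match = PATTERN.match(ln.rstrip('\n'))
    if match is None:
        return None
    proc, cpu = match.groups()
    try:
        return proc, int(cpu)
    except ValueError:
        # The last group is '.*', so it can capture non-numeric text.
        return None


print(parse_thread_line('Filter-Thread on CPU 38 but will be moved to CPU 51'))
```

Because the last group is `(.*)` rather than `(\d+)`, the `int()` conversion is guarded; tightening the group to `(\d+)` would be an alternative design.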
Answer 1 (score: 2)
s = """TrnIq: Thread on CPU 37
TrnIq: Thread on CPU 37 but will be moved to CPU 44
IP-Thread on CPU 33
FANOUT Thread on CPU 37
Filter-Thread on CPU 38 but will be moved to CPU 51
TRN TMR Test 2 Supervisor Thread on CPU 34
HomographyWarp Traking Thread[0] on CPU 26"""
for line in s.splitlines():
    words = line.split()
    if not ("CPU" in words and "on" in words):
        continue  # skip uninteresting lines
    prefix_words = words[:words.index("on") + 1]
    prefix = ' '.join(prefix_words)
    cpu = int(words[-1])
    print((prefix, cpu))  # print the pair as a tuple
gives
('TrnIq: Thread on', 37)
('TrnIq: Thread on', 44)
('IP-Thread on', 33)
('FANOUT Thread on', 37)
('Filter-Thread on', 51)
('TRN TMR Test 2 Supervisor Thread on', 34)
('HomographyWarp Traking Thread[0] on', 26)
I don't think I need to translate this code into English.
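The same split-based idea can be sketched as a reusable function, which is easier to call from a loop that reads the program's output line by line. This assumes Python 3 and that the CPU number is always the last word on the line; the name `split_thread_line` is hypothetical.

```python
def split_thread_line(line):
    """Return (prefix, cpu) using only str.split(), or None if not applicable."""
    words = line.split()
    if "CPU" not in words or "on" not in words:
        return None  # skip uninteresting lines
    # Everything up to and including the first "on" is the prefix.
    prefix = ' '.join(words[:words.index("on") + 1])
    # Assumption: the CPU number is the last word on the line.
    if not words[-1].isdigit():
        return None
    return prefix, int(words[-1])


print(split_thread_line("Filter-Thread on CPU 38 but will be moved to CPU 51"))
```

Note that `str.split()` collapses runs of whitespace, so the returned prefix is normalized to single spaces.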
Answer 2 (score: 1)
So use the regular expression ^(.*?)\s+on\s+CPU.*(?<=\sCPU)\s+(\d+)\s*$
import sys
import re

for ln in sys.stdin:
    m = re.match(r'^(.*?)\s+on\s+CPU.*(?<=\sCPU)\s+(\d+)\s*$', ln)
    if m is not None:
        print(m.groups())
See and test the example here.
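To make the pieces of this expression easier to follow, here is a sketch of the same pattern written with `re.VERBOSE` and one comment per part. It assumes Python 3; the names `VERBOSE_PATTERN` and `parse_last_cpu` are hypothetical.

```python
import re

VERBOSE_PATTERN = re.compile(r'''
    ^(.*?)        # group 1: the name, as few characters as possible
    \s+on\s+CPU   # the first "on CPU"
    .*            # anything in between, e.g. " 38 but will be moved to CPU"
    (?<=\sCPU)    # lookbehind: the number must directly follow a "CPU"
    \s+(\d+)      # group 2: the CPU number
    \s*$          # optional trailing whitespace, then end of line
''', re.VERBOSE)


def parse_last_cpu(ln):
    """Return (name, cpu) with the last CPU number on the line, or None."""
    m = VERBOSE_PATTERN.match(ln)
    if m is None:
        return None
    return m.group(1), int(m.group(2))


print(parse_last_cpu("Filter-Thread on CPU 38 but will be moved to CPU 51"))
```

The `$` anchor is what forces the greedy `.*` to run up to the last "CPU" on lines that mention two CPUs.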
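Answer 3 (score: 1)
In the cases you mention, you always want the last CPU number on the line (the second one, when there is one), so it can be done with a single regular expression: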
# Test program
import re
lns = [
"TrnIq: Thread on CPU 37",
"TrnIq: Thread on CPU 37 but will be moved to CPU 44",
"IP-Thread on CPU 33",
"FANOUT Thread on CPU 37",
"Filter-Thread on CPU 38 but will be moved to CPU 51",
"TRN TMR Test 2 Supervisor Thread on CPU 34",
"HomographyWarp Traking Thread[0] on CPU 26"
]
for ln in lns:
    # Raw string so that \S and \d reach the regex engine unchanged
    test = re.search(r"(?P<process>.*Thread\S* on).* CPU (?P<cpu>\d+)$", ln)
    print("%s: '%s' on CPU #%s" % (ln, test.group('process'), test.group('cpu')))
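In the general case, you may want to distinguish between the different cases (e.g. a thread on a CPU, a thread that is being moved, subthreads...). To do that, you can use several re.search() calls one after another. For example: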
# This search recognizes lines of the form "...Thread on CPU so-and-so", and
# also lines that add "...but will be moved to CPU some-other-cpu".
test = re.search(r"(?P<process>.* Thread) on CPU (?P<cpu1>\d+)(?: but will be moved to CPU (?P<cpu2>\d+))?", ln)
if test:
    # Here we capture Process Thread, both moved and non moved
    if test.group('cpu2'):
        # We have process, cpu1 and cpu2: moved thread
        pass
    else:
        # Nonmoved task, we have test.group('process') and cpu1.
        pass
else:
    # No match, try some other regexp. For example processes with a thread number
    # between square brackets: "Thread[0]", which are not captured by the regex above.
    # The brackets are escaped so they are not read as a character class.
    test = re.search(r"(?P<process>.*) Thread\[(?P<thread>\d+)\] on CPU (?P<cpu1>\d+)", ln)
    if test:
        # Here we have "HomographyWarp Traking" in process, 0 in thread, 26 in cpu1
        pass
For best performance, it is better to test first for the kinds of lines that occur most frequently.
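One way to organize several searches is an ordered list of compiled patterns that is tried from top to bottom. The sketch below assumes Python 3 and assumes that plain "on CPU" lines are more frequent than "moved" lines; the names `PATTERNS` and `classify`, and the labels "plain" and "moved", are hypothetical.

```python
import re

# Tried in order; put the most frequent kind of line first (ordering is an assumption).
PATTERNS = [
    ("plain", re.compile(r"(?P<process>.*Thread\S*) on CPU (?P<cpu1>\d+)$")),
    ("moved", re.compile(
        r"(?P<process>.*Thread\S*) on CPU (?P<cpu1>\d+)"
        r" but will be moved to CPU (?P<cpu2>\d+)$")),
]


def classify(ln):
    """Return (kind, groups) for the first pattern that matches, else None."""
    for kind, pattern in PATTERNS:
        m = pattern.search(ln)
        if m:
            return kind, m.groupdict()
    return None


print(classify("Filter-Thread on CPU 38 but will be moved to CPU 51"))
```

Each pattern is anchored with `$`, so a "moved" line cannot be mistaken for a plain one even though the plain pattern is tried first.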
Answer 4 (score: 1)
This can be done quite simply with two regular expression searches:
import re

while True:
    ln = p.stdout.readline()
    if '' == ln:
        break
    start_match = re.search(r'^(.*?) on', ln)
    end_match = re.search(r'(\d+)$', ln)
    process = start_match and start_match.group(0)
    process_number = end_match and end_match.group(0)
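For reuse outside the read loop, the two searches can be sketched as a small function that also converts the number to an int. This assumes Python 3; the name `two_search_parse` is hypothetical, and the `\s*` before `$` is an addition that tolerates trailing whitespace.

```python
import re


def two_search_parse(ln):
    """Return (process, cpu) from two independent searches, or None."""
    start_match = re.search(r'^(.*?) on', ln)
    end_match = re.search(r'(\d+)\s*$', ln)
    if not (start_match and end_match):
        return None
    # group(0) is the whole match, so the name keeps its trailing " on";
    # start_match.group(1) would drop it.
    return start_match.group(0), int(end_match.group(1))


print(two_search_parse("Filter-Thread on CPU 38 but will be moved to CPU 51\n"))
```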