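Python: Using regex to create a set of non-duplicate entries

Time: 2016-05-25 13:53:35

Tags: python arrays regex

I'm writing a function to find the names of the processes running on a system. I receive an array like this: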

['\\\\TEST-PC\\Process(python)\\Operations/sec',
'\\\\TEST-PC\\Process(process#2)\\Operations/sec', 
'\\\\TEST-PC\\Process(process#1)\\Operations/sec', 
'\\\\TEST-PC\\Process(process)\\Operations/sec', 
'\\\\TEST-PC\\Process(python)\\Thread Count', 
'\\\\TEST-PC\\Process(process#2)\\Thread Count',
'\\\\TEST-PC\\Process(process#1)\\Thread Count',
'\\\\TEST-PC\\Process(process)\\Thread Count'....etc....]

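I'd like to output the name of each process in an array like this:

['python','process#2','process#1','process']

(Note that if a process appears more than once in the original array, I don't want duplicates in the output array.)

Here is what I have so far: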

def count_no_of_processes(row_to_check):
    # Ignore first entry
    to_search = row_to_check[1:]
    processes = []
    for number in range(0, len(to_search)):
        search = re.search(r"\(([^)]+)\)", to_search[number])
        processes.append(search)
    print processes

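But this doesn't give me the list of process names. Instead, the "processes" list contains entries like "<_sre.SRE_Match object at 0x10c1fw321>".

What am I doing wrong?

I haven't yet reached the stage of checking the processes list for duplicates, but any advice on that would be appreciated too, as I'm not familiar with regular expressions.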

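3 Answers:

Answer 0: (score: 1)

Remember that re.search() returns a MatchObject; to extract what you want, use something like match.group(1), which returns the first matched group, in other words the text inside the () capture group of your regex.

Note that before calling .group you should check that a match was actually found, because re.search returns None when nothing matches, and calling None.group will raise an error.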
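As a minimal sketch of the two points above (written for Python 3; the helper name extract_name and the second sample path are made up for illustration, while the pattern is the one from the question):

```python
import re

# Hypothetical helper; the pattern is taken from the question.
def extract_name(counter_path):
    match = re.search(r"\(([^)]+)\)", counter_path)
    if match is None:
        # No parenthesised part in this string, so there is nothing to extract.
        return None
    # group(1) is the text captured by the first (...) group.
    return match.group(1)

name = extract_name('\\\\TEST-PC\\Process(python)\\Operations/sec')
# Illustrative path without parentheses, to show the no-match case.
missing = extract_name('\\\\TEST-PC\\Memory\\Available Bytes')

print(name)     # python
print(missing)  # None
```

Appending match.group(1) instead of the match object itself is what turns the list of SRE_Match entries into a list of names.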

To address your secondary question about duplicates, I suggest using a set.
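A rough sketch of the set idea, assuming Python 3 and a shortened version of the question's input (the variable names paths and unique_names are illustrative only):

```python
import re

# Shortened sample of the input from the question.
paths = [
    '\\\\TEST-PC\\Process(python)\\Operations/sec',
    '\\\\TEST-PC\\Process(process)\\Operations/sec',
    '\\\\TEST-PC\\Process(python)\\Thread Count',
    '\\\\TEST-PC\\Process(process)\\Thread Count',
]

pattern = re.compile(r"\(([^)]+)\)")

unique_names = set()
for path in paths:
    match = pattern.search(path)
    if match:
        # Adding a name that is already present leaves the set unchanged.
        unique_names.add(match.group(1))

# A set has no defined order, so sort it for a stable display.
print(sorted(unique_names))  # ['process', 'python']
```

Keep in mind that a set does not remember insertion order, so this fits best when the order of the names is not important.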

Answer 1: (score: 1)

You could come up with:

import re

processes = ['\\\\TEST-PC\\Process(python)\\Operations/sec',
'\\\\TEST-PC\\Process(process#2)\\Operations/sec', 
'\\\\TEST-PC\\Process(process#1)\\Operations/sec', 
'\\\\TEST-PC\\Process(process)\\Operations/sec', 
'\\\\TEST-PC\\Process(python)\\Thread Count', 
'\\\\TEST-PC\\Process(process#2)\\Thread Count',
'\\\\TEST-PC\\Process(process#1)\\Thread Count',
'\\\\TEST-PC\\Process(process)\\Thread Count']

rx = re.compile(r'Process\(([^)]+)\)')

processes_filtered = []
for process in processes:
    match = rx.search(process)
    if match is not None:
        if match.group(1) not in processes_filtered:
            processes_filtered.append(match.group(1))

print processes_filtered
# ['python', 'process#2', 'process#1', 'process']

a demo on ideone.com

Or, even shorter, using a list comprehension:

rx = re.compile(r'Process\(([^)]+)\)')
processes_filtered = set([m.group(1) \
    for process in processes \
    for m in [rx.search(process)] if m])
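
Unlike the loop version, the set built this way does not keep the names in order of first appearance. If that order matters, one possible variant (a sketch assuming Python 3.7 or later, where dict keeps insertion order; it is not part of the original answer) is to deduplicate with dict.fromkeys:

```python
import re

processes = [
    '\\\\TEST-PC\\Process(python)\\Operations/sec',
    '\\\\TEST-PC\\Process(process#2)\\Operations/sec',
    '\\\\TEST-PC\\Process(process#1)\\Operations/sec',
    '\\\\TEST-PC\\Process(process)\\Operations/sec',
    '\\\\TEST-PC\\Process(python)\\Thread Count',
    '\\\\TEST-PC\\Process(process#2)\\Thread Count',
    '\\\\TEST-PC\\Process(process#1)\\Thread Count',
    '\\\\TEST-PC\\Process(process)\\Thread Count',
]

rx = re.compile(r'Process\(([^)]+)\)')

# Lazily search each string; entries without a match yield None and are skipped.
matches = (rx.search(process) for process in processes)

# dict.fromkeys keeps only the first occurrence of each key, in insertion order.
processes_filtered = list(dict.fromkeys(m.group(1) for m in matches if m))

print(processes_filtered)  # ['python', 'process#2', 'process#1', 'process']
```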

Answer 2: (score: 0)

If order doesn't matter, you can do this:

>>> import re
>>> tgt=['\\\\TEST-PC\\Process(python)\\Operations/sec',
... '\\\\TEST-PC\\Process(process#2)\\Operations/sec', 
... '\\\\TEST-PC\\Process(process#1)\\Operations/sec', 
... '\\\\TEST-PC\\Process(process)\\Operations/sec', 
... '\\\\TEST-PC\\Process(python)\\Thread Count', 
... '\\\\TEST-PC\\Process(process#2)\\Thread Count',
... '\\\\TEST-PC\\Process(process#1)\\Thread Count',
... '\\\\TEST-PC\\Process(process)\\Thread Count']
>>> {m.group(1) for m in re.finditer(r'^[^(]+\(([^)]+)\)', '\n'.join(tgt), flags=re.M)}
set(['python', 'process#2', 'process#1', 'process'])
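
To illustrate why the re.M flag matters in this approach, here is a small sketch (Python 3, with a shortened input; the names joined, names and without_flag are illustrative). With re.M, ^ matches at the start of every line of the joined string; without it, ^ matches only at the very beginning, so only the first entry is found:

```python
import re

tgt = [
    '\\\\TEST-PC\\Process(python)\\Operations/sec',
    '\\\\TEST-PC\\Process(process#2)\\Operations/sec',
    '\\\\TEST-PC\\Process(python)\\Thread Count',
]

# One string with one counter path per line.
joined = '\n'.join(tgt)
pattern = r'^[^(]+\(([^)]+)\)'

# With re.M, '^' anchors at the start of each line, giving one match per entry.
names = [m.group(1) for m in re.finditer(pattern, joined, flags=re.M)]
print(names)  # ['python', 'process#2', 'python']

# Without re.M, '^' anchors only at position 0, so only one match is possible.
without_flag = [m.group(1) for m in re.finditer(pattern, joined)]
print(without_flag)  # ['python']

# Wrapping the matches in a set removes the duplicates, as in the answer.
print(sorted(set(names)))  # ['process#2', 'python']
```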