Question

I'm writing a function to find the names of the processes running on a system. It takes in an array like this:

['\\\\TEST-PC\\Process(python)\\Operations/sec',
'\\\\TEST-PC\\Process(process#2)\\Operations/sec', 
'\\\\TEST-PC\\Process(process#1)\\Operations/sec', 
'\\\\TEST-PC\\Process(process)\\Operations/sec', 
'\\\\TEST-PC\\Process(python)\\Thread Count', 
'\\\\TEST-PC\\Process(process#2)\\Thread Count',
'\\\\TEST-PC\\Process(process#1)\\Thread Count',
'\\\\TEST-PC\\Process(process)\\Thread Count'....etc....]

I want to output the name of each process in an array like this:

['python','process#2','process#1','process']

(Note that if a process appears more than once in the original array, I don't want duplicates in the output array.)

This is what I have so far:

def count_no_of_processes(row_to_check):
    #Ignore first entry
    header_to_search = row_to_check[1:]
    processes=[]
    for number in range(0,len(header_to_search)):
        search = re.search(r"\(([^)]+)\)", header_to_search[number])
        processes.append(search)
    print processes

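However, this doesn't print the process names. Instead, the "processes" list contains entries like "<_sre.SRE_Match object at 0x10c1fw321>".

What am I doing wrong?

I haven't yet reached the stage of checking the processes list for duplicates, but any suggestions on that would be appreciated too, as I'm not familiar with regular expressions.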

Answer 1

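Remember that re.search() returns a MatchObject, not a string. To extract the text you want, use something like match.group(1), which returns the first matched group, in other words the text captured by the () group in your regular expression.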

Note that you should check that a match was actually found before calling .group: when nothing matches, re.search returns None, and calling None.group raises an error.
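
As a minimal sketch of both points (written for Python 3; the sample path and the variable names are illustrative and not taken from the asker's program), the match object can be tested for None before the group is read:

```python
import re

# Illustrative counter path, modelled on the input shown in the question.
header = r'\\TEST-PC\Process(python)\Operations/sec'

# re.search returns a match object on success and None on failure.
match = re.search(r"\(([^)]+)\)", header)
if match is not None:
    name = match.group(1)  # text captured by the first (...) group
else:
    name = None

print(name)

# A string without parentheses produces no match at all.
no_match = re.search(r"\(([^)]+)\)", 'no parentheses here')
print(no_match)
```

Appending match.group(1) instead of the match object itself is the change that would make the list in the question hold strings.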

To address your secondary question about duplicates, I suggest using a set.
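
One way this could look, as a sketch under stated assumptions (Python 3; unique_process_names and sample are hypothetical names introduced here), is to add each captured name to a set, which silently ignores repeats:

```python
import re

def unique_process_names(headers):
    """Collect the text inside the first (...) of each header into a set."""
    pattern = re.compile(r"\(([^)]+)\)")
    names = set()
    for header in headers:
        match = pattern.search(header)
        if match:  # skip headers that contain no parenthesised name
            names.add(match.group(1))
    return names

# Shortened, illustrative version of the question's input.
sample = [
    '\\\\TEST-PC\\Process(python)\\Operations/sec',
    '\\\\TEST-PC\\Process(process#1)\\Operations/sec',
    '\\\\TEST-PC\\Process(python)\\Thread Count',
]

# Sets are unordered, so sort only to get a stable display.
print(sorted(unique_process_names(sample)))
```

Keep in mind that a set does not remember insertion order, so this fits only if the order of the names is irrelevant.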

Answer 2

You could come up with:

import re

processes = ['\\\\TEST-PC\\Process(python)\\Operations/sec',
'\\\\TEST-PC\\Process(process#2)\\Operations/sec', 
'\\\\TEST-PC\\Process(process#1)\\Operations/sec', 
'\\\\TEST-PC\\Process(process)\\Operations/sec', 
'\\\\TEST-PC\\Process(python)\\Thread Count', 
'\\\\TEST-PC\\Process(process#2)\\Thread Count',
'\\\\TEST-PC\\Process(process#1)\\Thread Count',
'\\\\TEST-PC\\Process(process)\\Thread Count']

rx = re.compile(r'Process\(([^)]+)\)')

processes_filtered = []
for process in processes:
    match = rx.search(process)
    if match is not None:
        if match.group(1) not in processes_filtered:
            processes_filtered.append(match.group(1))

print(processes_filtered)
# ['python', 'process#2', 'process#1', 'process']

See a demo on ideone.com.

Or, even shorter, using a list comprehension:

rx = re.compile(r'Process\(([^)]+)\)')
processes_filtered = set([m.group(1) \
    for process in processes \
    for m in [rx.search(process)] if m])
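
Wrapping the comprehension in set() drops the first-seen order of the names. If that order matters, one possible variation is sketched below. It assumes Python 3.7 or later, where dict preserves insertion order, and ordered_unique_names and paths are hypothetical names introduced for illustration:

```python
import re

rx = re.compile(r'Process\(([^)]+)\)')

def ordered_unique_names(paths):
    """Return each captured process name once, in first-seen order."""
    matches = (rx.search(path) for path in paths)
    # dict.fromkeys keeps only the first occurrence of each key.
    return list(dict.fromkeys(m.group(1) for m in matches if m))

paths = [
    '\\\\TEST-PC\\Process(python)\\Operations/sec',
    '\\\\TEST-PC\\Process(process#2)\\Operations/sec',
    '\\\\TEST-PC\\Process(process#1)\\Operations/sec',
    '\\\\TEST-PC\\Process(process)\\Operations/sec',
    '\\\\TEST-PC\\Process(python)\\Thread Count',
    '\\\\TEST-PC\\Process(process#2)\\Thread Count',
]

print(ordered_unique_names(paths))
# ['python', 'process#2', 'process#1', 'process']
```

Compared with the loop version, this avoids the repeated "not in" list scan while producing the same ordering.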

Answer 3

If order doesn't matter, you can do this:

>>> import re
>>> tgt=['\\\\TEST-PC\\Process(python)\\Operations/sec',
... '\\\\TEST-PC\\Process(process#2)\\Operations/sec', 
... '\\\\TEST-PC\\Process(process#1)\\Operations/sec', 
... '\\\\TEST-PC\\Process(process)\\Operations/sec', 
... '\\\\TEST-PC\\Process(python)\\Thread Count', 
... '\\\\TEST-PC\\Process(process#2)\\Thread Count',
... '\\\\TEST-PC\\Process(process#1)\\Thread Count',
... '\\\\TEST-PC\\Process(process)\\Thread Count']
>>> {m.group(1) for m in re.finditer(r'^[^(]+\(([^)]+)\)', '\n'.join(tgt), flags=re.M)}
set(['python', 'process#2', 'process#1', 'process'])

Python: Using Regex to create a set of non-duplicate entries
