Question

I have this sample text snippet:

headline:
        Status[apphmi]: blubb, 'Statustext1'
        Main[apphmi]: bla, 'Maintext1'Main[apphmi]: blaa, 'Maintext2'
        Popup[apphmi]: blaaa, 'Popuptext1'

I want to extract the words inside the '' quotes, but grouped by context (Status, Main, Popup).

My current regular expression is (example at pythex.org):

headline:(?:\n +Status\[apphmi\]:.* '(.*)')*(?:\n +Main\[apphmi\]:.* '(.*)')*(?:\n +Popup\[apphmi\]:.* '(.*)')*

But with this I only get 'Maintext2' instead of both values. I don't know how to repeat a group an arbitrary number of times.

Answer 1

You can try this:

r"(.*?]):(?:[^']*)'([^']*)'"g

For each match, Group 1 and Group 2 contain your key-value pair.

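The regex alone cannot merge the second match into the first. Once you have all the pairs, you can apply a little programming to merge the duplicate keys into one.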
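To see why the pattern from the question returns only 'Maintext2', here is a minimal sketch, assuming Python's standard re module: when a capturing group sits inside a repeated group, each repetition overwrites the previous capture, so only the last one survives. The two-entry sample string and the variable names are made up for illustration.

```python
import re

# Hypothetical two-entry sample; both entries belong to the "Main" context.
text = "\n  Main[apphmi]: bla, 'Maintext1'\n  Main[apphmi]: blaa, 'Maintext2'"

# Same shape as the question's pattern: a capturing group inside a repeated group.
repeated = re.compile(r"(?:\n +Main\[apphmi\]:.* '(.*)')*")

m = repeated.match(text)
# The outer group matched twice, but group(1) holds only the last repetition.
last_only = m.group(1)

# Matching one entry at a time and collecting with findall keeps every capture.
all_values = re.findall(r"Main\[apphmi\]:[^']*'([^']*)'", text)

print(last_only)
print(all_values)
```

This is why matching one entry per match and collecting the results in code is the more practical route with the standard re module.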

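Here I use a dictionary of lists: if a key already exists in the dictionary, append the value to its list; otherwise insert a new key with a new list containing the value.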

This is how it can be done (tested in Python 3+):

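import re

# Maps each key (e.g. "Main[apphmi]") to the list of quoted values found for it.
d = dict()
# Group 1: text up to and including the first "]"; group 2: text between the quotes.
regex = r"(.*?]):(?:[^']*)'([^']*)'"

test_str = ("headline:        \n"
    "Status[apphmi]: blubb, 'Statustext1'\n"
    "Main[apphmi]: bla, 'Maintext1'Main[apphmi]: blaa, 'Maintext2'\n"
    "Popup[apphmi]: blaaa, 'Popuptext1'")

matches = re.finditer(regex, test_str)

for match in matches:
    if match.group(1) in d:
        # Key seen before: append to its existing list.
        d[match.group(1)].append(match.group(2))
    else:
        # First occurrence: start a new list for this key.
        d[match.group(1)] = [match.group(2)]
print(d)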

Output:

{
'Popup[apphmi]': ['Popuptext1'], 
'Main[apphmi]': ['Maintext1', 'Maintext2'], 
'Status[apphmi]': ['Statustext1']
}
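The snippet in the question is indented, and the question asks for the values grouped by context (Status, Main, Popup). A possible variation on the code above, offered as a sketch rather than a definitive version: it captures only the context name with \w+, so leading whitespace does not end up in the key, and it uses collections.defaultdict to avoid the explicit membership test. The helper name group_by_context, the pattern PAIR, and the choice to key on the bare context name are assumptions of this sketch.

```python
import re
from collections import defaultdict

# Assumed pattern: context name, "[...]", ":", then the first quoted value.
PAIR = re.compile(r"(\w+)\[[^\]]*\]:[^']*'([^']*)'")

def group_by_context(text):
    """Collect every quoted value under its context name (hypothetical helper)."""
    grouped = defaultdict(list)
    for context, value in PAIR.findall(text):
        grouped[context].append(value)
    return dict(grouped)

# The indented sample from the question.
sample = (
    "headline:\n"
    "        Status[apphmi]: blubb, 'Statustext1'\n"
    "        Main[apphmi]: bla, 'Maintext1'Main[apphmi]: blaa, 'Maintext2'\n"
    "        Popup[apphmi]: blaaa, 'Popuptext1'"
)

result = group_by_context(sample)
print(result)
```

With this, result["Main"] holds both Main texts, and the other contexts hold one value each.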

