Question

I have code that groups the words inside parentheses when the name before the parentheses is the same.

For example:

car __name__(skoda,audi)
car __name__(benz)

Output:

car __name__(skoda,audi,benz)

But when a colon : is added at the end of each line, nothing is printed:

car __name__(skoda,audi):       =>no output prints with :
car __name__(benz):

I think the problem is in my regular expression.

My code:

import collections
import re
class Group:
    def __init__(self):
        self.members = []
        self.text = []
with open('out.txt','r') as f:
    groups = collections.defaultdict(Group)
    group_pattern = re.compile(r'(\S+(?: __[^__]*__)?)\((.*)\)$')
    current_group = None
    for line in f:
        line = line.strip()
        m = group_pattern.match(line)
        if m:    # this is a group definition line
            group_name, group_members = m.groups()
            groups[group_name].members.extend(group_members.split(','))
            current_group = group_name
for group_name, group in groups.items():
    print("%s(%s)" % (group_name, ','.join(group.members)))

Answer 1

In the regex, just add a : at the end and make it optional by putting a ? right after it, so that it matches both string formats.

(\S+(?: __[^__]*__)?)\((.*)\):?$
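
As a rough check, here is a minimal sketch of that pattern applied to both line formats. The sample lines are taken from the question; the in-memory list `lines` and the plain `defaultdict(list)` are assumptions standing in for the asker's out.txt and Group class.

```python
import collections
import re

# Pattern from this answer: the trailing colon is optional (:?)
group_pattern = re.compile(r'(\S+(?: __[^__]*__)?)\((.*)\):?$')

# Assumed sample input: one line with a trailing colon, one without
lines = ['car __name__(skoda,audi):\n', 'car __name__(benz)\n']

groups = collections.defaultdict(list)
for line in lines:
    m = group_pattern.match(line.strip())
    if m:  # group definition line, with or without the colon
        group_name, group_members = m.groups()
        groups[group_name].extend(group_members.split(','))

for group_name, members in groups.items():
    print("%s(%s)" % (group_name, ','.join(members)))
```

Both lines end up under the same key, car __name__.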

Answer 2

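The problem is that you have a $ at the end of your regex. This forces the regex to look for a pattern that ends with the closing parenthesis.

You can fix it by removing the $ from the regex (if you think there may be other trailing characters):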

(\S+(?: __[^__]*__)?)\((.*)\)

Or you can adjust the regex to allow zero or one occurrence of a colon at the end of the pattern:

(\S+(?: __[^__]*__)?)\((.*)\):?$
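
To see how the two options differ, here is a small sketch that runs both patterns over a few made-up lines. The names `loose` and `strict` and the third sample line are only illustrative assumptions, not part of the question.

```python
import re

# Option 1: no $ anchor, so anything may follow the closing parenthesis
loose = re.compile(r'(\S+(?: __[^__]*__)?)\((.*)\)')
# Option 2: only an optional colon may follow the closing parenthesis
strict = re.compile(r'(\S+(?: __[^__]*__)?)\((.*)\):?$')

samples = [
    'car __name__(benz)',
    'car __name__(benz):',
    'car __name__(benz): trailing text',  # hypothetical extra text
]

# For each sample: (matched by loose?, matched by strict?)
results = [(bool(loose.match(s)), bool(strict.match(s))) for s in samples]
for s, (is_loose, is_strict) in zip(samples, results):
    print("%-35s loose=%s strict=%s" % (s, is_loose, is_strict))
```

Only the last line is treated differently: the pattern without $ accepts it, the :?$ version rejects it. One thing to watch with the first option is that (.*) is greedy, so if the trailing text itself contains a ), the captured members run up to that last parenthesis.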

Answer 3

You can do this without a regex:

f = [ 'car __name__(skoda,audi):\n', 'car __name__(benz):\n' ]
groups = {}
for line in f:
    v =  line.strip().split('__')
    gname, gitems = v[1], v[2]
    gitems = gitems.strip("():").split(",")
    groups[gname] = groups.get(gname, []) + gitems
print(groups)
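
Note that the snippet above uses only the part between the double underscores (name) as the key. If the whole prefix should be the key, as in the question's expected output, a variant along these lines might work. It is only a sketch: it assumes every group line contains exactly one ( and ends with ) or ):, and the list `lines` is made-up input.

```python
# Assumed sample input, with and without the trailing colon
lines = ['car __name__(skoda,audi):\n', 'car __name__(benz)\n']

groups = {}
for line in lines:
    # Split once at the first "(": prefix on the left, members on the right
    name, sep, rest = line.strip().partition('(')
    if not sep:
        continue  # no parenthesis, so not a group line
    # Drop the optional trailing colon, then the closing parenthesis
    members = rest.rstrip(':').rstrip(')').split(',')
    groups.setdefault(name, []).extend(members)

for name, members in groups.items():
    print("%s(%s)" % (name, ','.join(members)))
```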

Regex grouping for lines ending with a colon ':'
