Python regex: removing duplicates inside parentheses

Time: 2014-08-14 18:34:14

Tags: python regex

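I have working code that merges the words from several lines into one set of parentheses, but I need to remove the duplicates among them.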

My code:

import re
import collections

class Group:
    def __init__(self):
        self.members = []
        self.text = []

with open('text1.txt') as f:
    groups = collections.defaultdict(Group)
    group_pattern = re.compile(r'^(\S+)\((.*)\)$')
    current_group = None

    for line in f:
        line = line.strip()
        m = group_pattern.match(line)
        if m:    # this is a group definition line
            group_name, group_members = m.groups()
            groups[group_name].members.extend(group_members.split(','))
            current_group = group_name
        else:
            if (current_group is not None) and (len(line) > 0):
                groups[current_group].text.append(line)

for group_name, group in groups.items():
    print "%s(%s)" % (group_name, ','.join(group.members))
    print '\n'.join(group.text)
    print

My text file:

Car(skoda,benz,bmw,audi)
The above mentioned cars are sedan type and gives long rides efficient
......

Car(Rangerover,Hummer,audi)
SUV cars are used for family time and spacious.

The output is:

Car(skoda,benz,bmw,audi,Rangerover,Hummer,audi,ferrari,lamborghini,porsche)
The above mentioned cars are sedan type and gives long rides efficient
......
SUV cars are used for family time and spacious.

Here audi appears twice in the output. How can I remove the duplicates inside the parentheses?

1 answer:

Answer 0 (score: 0)

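You don't need a regular expression to remove the duplicates: make members a set in the Group class, that is, write self.members = set() instead of self.members = []. Duplicates are then dropped automatically. However, you will no longer be able to call groups[group_name].members.extend(group_members.split(',')), because sets have no extend method. Instead, either use the | operator to take the union of the sets, or use update to add the new items in place: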

groups[group_name].members |= set(group_members.split(','))

groups[group_name].members.update(group_members.split(','))
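
As a minimal sketch of the set-based approach, the snippet below runs the two Car(...) lines from the question through the same pattern and collects the members with update. The sample_lines list and the car_line variable are illustrative names that are not in the original code, and sorted() is applied only so that the displayed order is stable.

```python
import collections
import re


class Group:
    def __init__(self):
        self.members = set()   # a set keeps each member only once
        self.text = []


group_pattern = re.compile(r'^(\S+)\((.*)\)$')
groups = collections.defaultdict(Group)

# Illustrative input: the two group lines from the question's text file.
sample_lines = [
    "Car(skoda,benz,bmw,audi)",
    "Car(Rangerover,Hummer,audi)",
]

for line in sample_lines:
    m = group_pattern.match(line.strip())
    if m:
        group_name, group_members = m.groups()
        # update() adds every item; the repeated 'audi' is ignored.
        groups[group_name].members.update(group_members.split(','))

# sorted() only gives a stable display order; the set itself has none.
car_line = "%s(%s)" % ("Car", ','.join(sorted(groups["Car"].members)))
# car_line == "Car(Hummer,Rangerover,audi,benz,bmw,skoda)"
```
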

Alternatively, you can keep the list and call set right before output, so the duplicates are removed at that point:

print "%s(%s)" % (group_name, ','.join(set(group.members)))

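Note that a set is unordered, so neither approach works if you need to keep the members in the same order as the input. In that case you have to filter the duplicates out of the list manually: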

filtered_members = []
for x in groups[group_name].members:
    if x not in filtered_members:
        filtered_members.append(x)
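
The loop above can be wrapped in a small helper. The sketch below is one possible version, not part of the original answer: unique_in_order is a hypothetical name, and it tracks already seen items in a set so that the membership test stays fast for long lists. The last line shows collections.OrderedDict.fromkeys as a standard-library alternative that yields the same result.

```python
import collections


def unique_in_order(items):
    """Return items without duplicates, keeping the first occurrence of each."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Illustrative input: the merged members from the question's two Car lines.
members = ["skoda", "benz", "bmw", "audi", "Rangerover", "Hummer", "audi"]

filtered_members = unique_in_order(members)
# filtered_members == ['skoda', 'benz', 'bmw', 'audi', 'Rangerover', 'Hummer']

# Alternative: OrderedDict keys are unique and remember insertion order.
filtered_alt = list(collections.OrderedDict.fromkeys(members))
```

The result can then be passed to ','.join(...) in place of group.members when printing.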