I have working code that collects words into the parentheses, but I need to remove the duplicates among them.
My code:
import re
import collections

class Group:
    def __init__(self):
        self.members = []
        self.text = []

with open('text1.txt') as f:
    groups = collections.defaultdict(Group)
    group_pattern = re.compile(r'^(\S+)\((.*)\)$')
    current_group = None
    for line in f:
        line = line.strip()
        m = group_pattern.match(line)
        if m:  # this is a group definition line
            group_name, group_members = m.groups()
            groups[group_name].members.extend(group_members.split(','))
            current_group = group_name
        else:
            if (current_group is not None) and (len(line) > 0):
                groups[current_group].text.append(line)

for group_name, group in groups.items():
    print "%s(%s)" % (group_name, ','.join(group.members))
    print '\n'.join(group.text)
    print
My text file:
Car(skoda,benz,bmw,audi)
The above mentioned cars are sedan type and gives long rides efficient
......
Car(Rangerover,Hummer,audi)
SUV cars are used for family time and spacious.
The output is:
Car(skoda,benz,bmw,audi,Rangerover,Hummer,audi,ferrari,lamborghini,porsche)
The above mentioned cars are sedan type and gives long rides efficient
......
SUV cars are used for family time and spacious.
Here audi appears twice in the output. How can I remove the duplicates inside the parentheses?
Answer 0 (score: 0)
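You don't need a regular expression to remove the duplicates: make members in Group a set by writing self.members = set() instead of self.members = []. Duplicates are then dropped automatically. However, you will no longer be able to use groups[group_name].members.extend(group_members.split(',')), because a set has no extend method. Instead, either union the sets with the | operator or add the new items with update: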
groups[group_name].members |= set(group_members.split(','))
or
groups[group_name].members.update(group_members.split(','))
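As a minimal sketch of how the set-based variant could fit together, the block below puts the parsing loop from the question into a function and feeds it a few in-memory lines instead of text1.txt. The helper name parse_groups and the sample list are assumptions made for illustration; they are not part of the original code.

```python
import collections
import re


class Group(object):
    def __init__(self):
        self.members = set()  # a set silently ignores repeated names
        self.text = []


def parse_groups(lines):
    """Hypothetical helper: collect members and text lines per group name."""
    groups = collections.defaultdict(Group)
    group_pattern = re.compile(r'^(\S+)\((.*)\)$')
    current_group = None
    for line in lines:
        line = line.strip()
        m = group_pattern.match(line)
        if m:  # this is a group definition line
            group_name, group_members = m.groups()
            # update() adds every item of the iterable; repeats are dropped
            groups[group_name].members.update(group_members.split(','))
            current_group = group_name
        elif current_group is not None and line:
            groups[current_group].text.append(line)
    return groups


# Sample input (assumption): a shortened version of the file in the question.
sample = [
    "Car(skoda,benz,bmw,audi)",
    "The above mentioned cars are sedan type and gives long rides efficient",
    "Car(Rangerover,Hummer,audi)",
    "SUV cars are used for family time and spacious.",
]
groups = parse_groups(sample)
```

After this runs, groups["Car"].members holds audi only once, but the iteration order of the set is not guaranteed to match the input order.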
Alternatively, you can keep the list and call set just before printing, so the duplicates are removed there:
print "%s(%s)" % (group_name, ','.join(set(group.members)))
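Note that a set is unordered, so it cannot be used if you need to keep the members in the same order as in the input. In that case you have to filter the duplicates out of the list manually: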
filtered_members = []
for x in groups[group_name].members:
    if x not in filtered_members:
        filtered_members.append(x)
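The manual filter tests membership against a list, which is fine for a handful of names but rescans the list for every item. One possible variation, sketched here under the assumption that the members are hashable strings, tracks the names already seen in a separate set while still building the result in input order. The function name unique_in_order and the sample list are illustrative only.

```python
def unique_in_order(items):
    """Return items with repeats dropped, keeping each first occurrence in order."""
    seen = set()    # fast membership checks
    result = []     # preserves the input order
    for x in items:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result


# Sample data (assumption): the members from the two Car lines in the question.
members = ["skoda", "benz", "bmw", "audi", "Rangerover", "Hummer", "audi"]
filtered_members = unique_in_order(members)
# filtered_members == ['skoda', 'benz', 'bmw', 'audi', 'Rangerover', 'Hummer']
```

The result can then be passed to ','.join(...) in the print line exactly as the original list was.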