I have working code that adds these words inside the parentheses, but I need to remove the duplicates among them.
My code:
import re
import collections

class Group:
    def __init__(self):
        self.members = set()
        self.text = []

with open('text1.txt') as f:
    groups = collections.defaultdict(Group)
    group_pattern = re.compile(r'^(\S+)\((.*)\)$')
    current_group = None
    for line in f:
        line = line.strip()
        m = group_pattern.match(line)
        if m: # this is a group definition line
            group_name, group_members = m.groups()
            groups[group_name].members |= set(group_members.split(','))
            current_group = group_name
        else:
            if (current_group is not None) and (len(line) > 0):
                groups[current_group].text.append(line)

for group_name, group in groups.items():
    print "%s(%s)" % (group_name, ','.join(set(group.members)))
    print '\n'.join(group.text)
    print
My text file:
Car(skoda,audi,benz,bmw)
The above mentioned cars are sedan type and gives long rides efficient
......
Car(audi,Rangerover,Hummer)
SUV cars are used for family time and spacious.
The output is:
Car(skoda,benz,bmw,Rangerover,Hummer,audi)
The above mentioned cars are sedan type and gives long rides efficient
......
SUV cars are used for family time and spacious.
Expected output:
Car(skoda,audi,benz,bmw,Rangerover,Hummer)
The above mentioned cars are sedan type and gives long rides efficient
......
SUV cars are used for family time and spacious.
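Here the duplicate audi is removed from my output, but it is inserted at the end instead of in the second position. Please help! Any answer would be much appreciated!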
Answer 0 (score: 1)
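Sets are unordered, so putting the members into a set loses their order. If you need to keep the order, use sorted with a key that sorts by each element's position in the original list: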
members = ["skoda","audi","benz","bmw","audi","Rangerover","Hummer"]
print ','.join(sorted(set(members),key=lambda x: members.index(x)))
skoda,audi,benz,bmw,Rangerover,Hummer
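set(members) removes the duplicates, and sorted with a lambda builds a sorted list: key=lambda x: members.index(x) sorts each element by the index of its first occurrence in the members list. Because audi is placed according to its index in the original members list, it comes back as the second entry. Since you use a set from the very beginning, you lose the order, and you cannot get it back without sorting against some structure that kept the original order.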
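Note that sorting by members.index rescans the list for every element, so it does quadratic work. A minimal sketch of a linear-time alternative, assuming Python 3: walk the list once and remember which names have already been seen. The helper name unique_in_order is hypothetical. On Python 3.7+, list(dict.fromkeys(members)) gives the same result, because dicts keep insertion order.

```python
def unique_in_order(items):
    """Return items with duplicates removed, keeping each first occurrence in its original order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:  # average O(1) membership test on a set
            seen.add(item)
            result.append(item)
    return result


members = ["skoda", "audi", "benz", "bmw", "audi", "Rangerover", "Hummer"]
print(",".join(unique_in_order(members)))  # skoda,audi,benz,bmw,Rangerover,Hummer

# Equivalent one-liner on Python 3.7+ (dicts preserve insertion order):
print(",".join(dict.fromkeys(members)))
```

Both lines print the same order as the sorted/index version, with audi in second place.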
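If you want to keep the order and use a set to remove the duplicates only at the last step, you can change members from a set to a list (adding names with += or extend instead of |=, which a list does not support with a set), so that the last step becomes: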
','.join(sorted(set(self.members),key=lambda x: self.members.index(x)))
where self.members is now a list, and we use its order to put the items of the set back into their original order.
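A minimal Python 3 sketch of the whole program with that change, assuming the lines come from a string instead of text1.txt so it runs on its own. The names parse_groups, format_group and sample are hypothetical and not part of the question's code.

```python
import collections
import re


class Group:
    def __init__(self):
        self.members = []  # a list keeps insertion order, unlike a set
        self.text = []


def parse_groups(lines):
    """Hypothetical helper: parse 'Name(a,b,c)' header lines and the text lines that follow them."""
    groups = collections.defaultdict(Group)
    group_pattern = re.compile(r'^(\S+)\((.*)\)$')
    current_group = None
    for line in lines:
        line = line.strip()
        m = group_pattern.match(line)
        if m:  # this is a group definition line
            group_name, group_members = m.groups()
            # keep every name for now, duplicates included; dedupe at output time
            groups[group_name].members.extend(group_members.split(','))
            current_group = group_name
        elif current_group is not None and line:
            groups[current_group].text.append(line)
    return groups


def format_group(name, group):
    # set() drops duplicates; sorting by first index in the list restores the original order
    unique = sorted(set(group.members), key=group.members.index)
    return "%s(%s)\n%s" % (name, ",".join(unique), "\n".join(group.text))


sample = """Car(skoda,audi,benz,bmw)
The above mentioned cars are sedan type and gives long rides efficient
......
Car(audi,Rangerover,Hummer)
SUV cars are used for family time and spacious.
"""

for name, group in parse_groups(sample.splitlines()).items():
    print(format_group(name, group))
```

With the sample text, the first printed line is Car(skoda,audi,benz,bmw,Rangerover,Hummer), matching the expected output.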
There is no way to do this without using a container that preserves the original order of the elements.
import re
import collections

class Group:
    def __init__(self):
        self.members = []  # a list instead of a set, so insertion order is kept
        self.text = []

with open('text1.txt') as f:
    groups = collections.defaultdict(Group)
    group_pattern = re.compile(r'^(\S+)\((.*)\)$')
    current_group = None
    for line in f:
        line = line.strip()
        m = group_pattern.match(line)
        if m: # this is a group definition line
            group_name, group_members = m.groups()
            # only append names that are not already in the group's list
            groups[group_name].members += filter(lambda x: x not in groups[group_name].members, group_members.split(','))
            current_group = group_name
        else:
            if (current_group is not None) and (len(line) > 0):
                groups[current_group].text.append(line)

for group_name, group in groups.items():
    print "%s(%s)" % (group_name, ','.join(group.members))
    print '\n'.join(group.text)
    print
The filter code is equivalent to [x for x in group_members.split(',') if x not in groups[group_name].members]
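One caveat: the list comprehension is evaluated in full against the list as it was before the line was added, so a name repeated within a single header line (for example Car(audi,audi)) would be appended twice. A minimal sketch that checks each name against the list as it grows instead; the helper name add_new_members is hypothetical.

```python
def add_new_members(members, new_items):
    """Append each item of new_items to members unless it is already present (modifies members in place)."""
    for item in new_items:
        if item not in members:  # checked against the list as it grows
            members.append(item)
    return members


members = []
add_new_members(members, "skoda,audi,benz,bmw".split(","))
add_new_members(members, "audi,Rangerover,Hummer".split(","))
print(",".join(members))  # skoda,audi,benz,bmw,Rangerover,Hummer
```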