Python: preserving order in a set after removing duplicates

Posted: 2014-08-15 13:58:24

Tags: python regex set

I have working code that adds these words inside the parentheses, but I need to remove the duplicates among them.

My code:

import re
import collections

class Group:
    def __init__(self):
        self.members = set()
        self.text = []

with open('text1.txt') as f:
    groups = collections.defaultdict(Group)
    group_pattern = re.compile(r'^(\S+)\((.*)\)$')
    current_group = None
    for line in f:
        line = line.strip()
        m = group_pattern.match(line)
        if m:    # this is a group definition line
            group_name, group_members = m.groups()
            groups[group_name].members |= set(group_members.split(','))
            current_group = group_name
        else:
            if (current_group is not None) and (len(line) > 0):
                groups[current_group].text.append(line)

for group_name, group in groups.items():
    print "%s(%s)" % (group_name, ','.join(set(group.members)))
    print '\n'.join(group.text)
    print
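To make clear what each group definition line yields, here is a small Python 3 check of the same group_pattern used above, run on one line from the sample file instead of on text1.txt. It shows that the first group is the name before the parenthesis, the second is the comma-separated member string, and ordinary text lines do not match at all.

```python
import re

# Same pattern as group_pattern in the question: a name with no whitespace,
# followed by everything between the parentheses at the end of the line.
group_pattern = re.compile(r'^(\S+)\((.*)\)$')

m = group_pattern.match('Car(skoda,audi,benz,bmw)')
group_name, group_members = m.groups()
print(group_name)                # Car
print(group_members.split(','))  # ['skoda', 'audi', 'benz', 'bmw']

# A plain text line has no "name(" prefix, so match() returns None.
print(group_pattern.match('SUV cars are used for family time and spacious.'))  # None
```

Because the members are split into a list in their original order, the order is only lost later, when the list is turned into a set.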

My text file:

Car(skoda,audi,benz,bmw)
The above mentioned cars are sedan type and gives long rides efficient
......

Car(audi,Rangerover,Hummer)
SUV cars are used for family time and spacious.

The output is:

Car(skoda,benz,bmw,Rangerover,Hummer,audi)
The above mentioned cars are sedan type and gives long rides efficient
......
SUV cars are used for family time and spacious.

Expected output:

Car(skoda,audi,benz,bmw,Rangerover,Hummer)
The above mentioned cars are sedan type and gives long rides efficient
......
SUV cars are used for family time and spacious.

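Here audi is the duplicate. It is removed from my output, but it ends up in the last position instead of the second. Please help! Any answer would be greatly appreciated!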

1 Answer:

Answer 0 (score: 1)

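Sets are unordered, so once the members go into a set their order is lost. If you need to keep the order, use sorted to sort the set by the order of the original list: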

members = ["skoda","audi","benz","bmw","audi","Rangerover","Hummer"]

print ','.join(sorted(set(members),key=lambda x: members.index(x)))
skoda,audi,benz,bmw,Rangerover,Hummer
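  1. set(members) removes the duplicates.
  2. sorted then builds a new list from the set, using a lambda as the sort key.
  3. The key, key=lambda x: members.index(x), sorts each element by its index in the members list (index returns the position of the element's first occurrence).
  4. Because audi is sorted by the index of its first occurrence in the original members list, it comes back as the second entry.
  5. Since your code uses a set from the very start, the order is lost, and you cannot recover it without sorting against some structure that preserves the original order.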
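The steps above can be wrapped in a small helper. This is a Python 3 sketch; the name unique_in_order is hypothetical, and key=items.index is simply the bound method equivalent of the answer's lambda x: members.index(x).

```python
def unique_in_order(items):
    """Drop duplicates from items, keeping the order of first occurrence.

    set() removes the duplicates; sorted() then orders the survivors by
    items.index(), i.e. by where each element first appears in items.
    """
    return sorted(set(items), key=items.index)


members = ["skoda", "audi", "benz", "bmw", "audi", "Rangerover", "Hummer"]
print(','.join(unique_in_order(members)))  # skoda,audi,benz,bmw,Rangerover,Hummer
```

Each index() call scans the list from the start, so this is quadratic in the number of members. That is harmless for a handful of car names but worth keeping in mind for long lists.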

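If you want to keep the order and still use a set to remove the duplicates at the end, change your set to a list (and add new members with += instead of |= set(...)), so that the last step becomes: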

','.join(sorted(set(group.members), key=lambda x: group.members.index(x)))
    

where the members attribute is now a list, and its order is used to put the items from the set back in their original order.

There is no way to do this without a container that keeps the elements in their original order.
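One such container, not used in this answer but part of the standard library, is collections.OrderedDict: its keys keep insertion order, and fromkeys() keeps only the first occurrence of each key. A minimal Python 3 sketch on the same member list:

```python
from collections import OrderedDict

members = ["skoda", "audi", "benz", "bmw", "audi", "Rangerover", "Hummer"]

# Repeated keys do not move: the second "audi" leaves it in second place.
# OrderedDict is available on Python 2.7 and 3.
ordered = list(OrderedDict.fromkeys(members))

# On Python 3.7+, a plain dict is guaranteed to keep insertion order too.
ordered_py37 = list(dict.fromkeys(members))

print(','.join(ordered))  # skoda,audi,benz,bmw,Rangerover,Hummer
```

This runs in linear time, unlike sorting by list.index().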

import re
import collections

class Group:
    def __init__(self):
        self.members = []   # a list instead of a set, so the order is kept
        self.text = []

with open('text1.txt') as f:
    groups = collections.defaultdict(Group)
    group_pattern = re.compile(r'^(\S+)\((.*)\)$')
    current_group = None
    for line in f:
        line = line.strip()
        m = group_pattern.match(line)
        if m:    # this is a group definition line
            group_name, group_members = m.groups()
            # append only the members not already in this group, in their original order
            groups[group_name].members += filter(lambda x: x not in groups[group_name].members, group_members.split(','))
            current_group = group_name
        else:
            if (current_group is not None) and (len(line) > 0):
                groups[current_group].text.append(line)

for group_name, group in groups.items():
    print "%s(%s)" % (group_name, ','.join(group.members))
    print '\n'.join(group.text)
    print
    

The filter call is equivalent to [x for x in group_members.split(',') if x not in groups[group_name].members] (in Python 2, where filter returns a list).
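To check the list-based approach end to end without creating text1.txt, here is a Python 3 sketch that feeds the question's sample text through io.StringIO. The function name parse_groups is hypothetical, and an explicit loop replaces the filter call; it appends only names not already in the group, which also skips a name repeated within a single line.

```python
import collections
import io
import re


class Group:
    def __init__(self):
        self.members = []  # a list, so insertion order is kept
        self.text = []


def parse_groups(f):
    """Merge group definitions that share a name, keeping first-seen member order."""
    groups = collections.defaultdict(Group)
    group_pattern = re.compile(r'^(\S+)\((.*)\)$')
    current_group = None
    for line in f:
        line = line.strip()
        m = group_pattern.match(line)
        if m:  # a group definition line such as "Car(audi,Rangerover,Hummer)"
            group_name, group_members = m.groups()
            members = groups[group_name].members
            for name in group_members.split(','):
                if name not in members:  # skip names already in this group
                    members.append(name)
            current_group = group_name
        elif current_group is not None and line:
            groups[current_group].text.append(line)
    return groups


# Sample input from the question, read from a string instead of text1.txt.
sample = io.StringIO(
    "Car(skoda,audi,benz,bmw)\n"
    "The above mentioned cars are sedan type and gives long rides efficient\n"
    "......\n"
    "\n"
    "Car(audi,Rangerover,Hummer)\n"
    "SUV cars are used for family time and spacious.\n"
)

groups = parse_groups(sample)
for group_name, group in groups.items():
    print("%s(%s)" % (group_name, ','.join(group.members)))
    print('\n'.join(group.text))
    print()
```

The first printed line is Car(skoda,audi,benz,bmw,Rangerover,Hummer), matching the expected output in the question.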