Question

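I'm looking for a way to remove duplicate entries from a Python list, but with a twist: the final list must be case sensitive, with a preference for uppercase words.

For example, between cup and Cup, I need to keep only Cup and not cup. Unlike other common solutions that suggest using lower() first, I'd prefer to maintain the string's case here, and in particular I'd prefer to keep the version with an uppercase letter over the lowercase one.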

Again, I am trying to turn this list: [Hello, hello, world, world, poland, Poland]

into this:

[Hello, world, Poland]

How should I do that?

Thanks in advance.

Answer 1

This does not preserve the order of words, but it does produce a list of "unique" words with a preference for capitalized ones.

In [34]: words = ['Hello', 'hello', 'world', 'world', 'poland', 'Poland', ]

In [35]: wordset = set(words)

In [36]: [item for item in wordset if item.istitle() or item.title() not in wordset]
Out[36]: ['world', 'Poland', 'Hello']

If you wish to preserve the order in which the items appear in words, you could use collections.OrderedDict:

In [43]: import collections

In [44]: wordset = collections.OrderedDict.fromkeys(words)

In [46]: [item for item in wordset if item.istitle() or item.title() not in wordset]
Out[46]: ['Hello', 'world', 'Poland']
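
For reference, the ordered variant can be wrapped in a small function. This is only a sketch: the name unique_prefer_title is made up for illustration, and it assumes Python 3.7+, where a plain dict keeps insertion order and can stand in for OrderedDict.

```python
def unique_prefer_title(words):
    # Hypothetical helper name, not part of the original answer.
    # dict.fromkeys drops exact duplicates and keeps first-seen order (Python 3.7+).
    seen = dict.fromkeys(words)
    # Keep a word if it is already title-cased, or if no title-cased
    # variant of it exists anywhere in the input.
    return [w for w in seen if w.istitle() or w.title() not in seen]


print(unique_prefer_title(['Hello', 'hello', 'world', 'world', 'poland', 'Poland']))
# ['Hello', 'world', 'Poland']
```

Note that this only recognizes title-cased variants: given ['HELLO', 'hello'] it keeps both words, because neither is title-cased and 'Hello' is not in the input.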

Answer 2

Use a set to track the words already seen:

def uniq(words):
    seen = set()
    for word in words:
        l = word.lower()  # Use `word.casefold()` if possible. (3.3+)
        if l in seen:
            continue
        seen.add(l)
        yield word

Usage:

>>> list(uniq(['Hello', 'hello', 'world', 'world', 'Poland', 'poland']))
['Hello', 'world', 'Poland']

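UPDATE

The previous version did not take into account the preference for uppercase over lowercase. In the updated version I use min, as @TheSoundDefense did.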

import collections

def uniq(words):
    seen = collections.OrderedDict()  # Use {} if the order is not important.
    for word in words:
        l = word.lower()  # Use `word.casefold()` if possible (3.3+)
        seen[l] = min(word, seen.get(l, word))
    return seen.values()
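
To make the difference between the two versions concrete, here is a side-by-side sketch. The function names uniq_first_seen and uniq_prefer_upper are invented for this comparison, and a plain dict is used in place of OrderedDict, which assumes Python 3.7+.

```python
def uniq_first_seen(words):
    # First version: keeps whichever spelling appears first.
    seen = set()
    result = []
    for word in words:
        key = word.lower()
        if key not in seen:
            seen.add(key)
            result.append(word)
    return result


def uniq_prefer_upper(words):
    # Updated version: min() picks the spelling that sorts first,
    # and uppercase letters sort before lowercase ones.
    seen = {}
    for word in words:
        key = word.lower()
        seen[key] = min(word, seen.get(key, word))
    return list(seen.values())


sample = ['hello', 'Hello', 'world', 'world', 'poland', 'Poland']
print(uniq_first_seen(sample))    # ['hello', 'world', 'poland']
print(uniq_prefer_upper(sample))  # ['Hello', 'world', 'Poland']
```

The two versions only disagree when the lowercase spelling comes first in the input, which is why the usage example above passes with either one.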

Answer 3

Since uppercase letters are "smaller" than lowercase letters in a comparison, I think you can do this:

orig_list = ["Hello", "hello", "world", "world", "Poland", "poland"]
unique_list = []
for word in orig_list:
  for i in range(len(unique_list)):
    if unique_list[i].lower() == word.lower():
      unique_list[i] = min(word, unique_list[i])
      break
  else:
    unique_list.append(word)

min will give preference to the word whose uppercase letters come earlier in the string.
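
A quick check of how that comparison behaves, as a minimal sketch using only built-ins. Strings are compared character by character by code point, so the first position where two spellings differ decides the result.

```python
# Uppercase ASCII letters have lower code points than lowercase ones.
print(ord('H'), ord('h'))      # 72 104

# The capitalized spelling sorts first, so min() returns it.
print(min('Hello', 'hello'))   # Hello

# Caveat: an all-caps spelling beats a title-cased one, because
# 'E' (69) is smaller than 'e' (101) at the second character.
print(min('HELLO', 'Hello'))   # HELLO
```

So this approach prefers "the most uppercase" spelling, which may or may not be what you want if all-caps words can appear in the list.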

Answer 4

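There are some better answers here, but hopefully this is something simple, different and useful. This code satisfies the conditions of your test, sequential pairs of matching words, but would fail on anything more complicated, such as non-sequential pairs, non-pairs or non-strings. For anything more complicated, I'd take a different approach.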

p1 = ['Hello', 'hello', 'world', 'world', 'Poland', 'poland']
p2 = ['hello', 'Hello', 'world', 'world', 'Poland', 'Poland']

def pref_upper(p):
    q = []
    a = 0
    b = 1

    # Walk the list two items at a time; a and b index the current pair.
    for x in range(len(p) // 2):
        if p[a][0].isupper() and p[b][0].isupper():
            q.append(p[a])
        if p[a][0].isupper() and p[b][0].islower():
            q.append(p[a])
        if p[a][0].islower() and p[b][0].isupper():
            q.append(p[b])
        if p[a][0].islower() and p[b][0].islower():
            q.append(p[b])
        a += 2
        b += 2
    return q

print(pref_upper(p1))
print(pref_upper(p2))
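
Under the same assumption (the list consists of consecutive pairs of the same word, differing at most in case), the pairwise idea can be written more compactly. This is just a sketch, and pref_upper_pairs is a made-up name. It swaps the four isupper/islower checks for min, which picks the capitalized spelling of each pair.

```python
def pref_upper_pairs(p):
    # Hypothetical compact variant; assumes len(p) is even and that
    # p[0]/p[1], p[2]/p[3], ... are spellings of the same word.
    # zip pairs up the even-indexed and odd-indexed items.
    return [min(first, second) for first, second in zip(p[::2], p[1::2])]


p1 = ['Hello', 'hello', 'world', 'world', 'Poland', 'poland']
p2 = ['hello', 'Hello', 'world', 'world', 'Poland', 'Poland']
print(pref_upper_pairs(p1))  # ['Hello', 'world', 'Poland']
print(pref_upper_pairs(p2))  # ['Hello', 'world', 'Poland']
```

Like the original, it gives wrong results for non-sequential pairs or words that appear an odd number of times.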

How to eliminate duplicate list entries in Python while preserving case sensitivity?