Question

I need to filter out duplicate values inside a list comprehension:

with open(file_name, 'rU') as input_file:
    result = [unique for unique in [process(line) for line in input_file] if unique_not_in_generated_list]

What expression can I use in place of unique_not_in_generated_list?

I am using Python 2.7.

Answer 1

Here is a one-liner that removes duplicates while preserving the order of the lines. It uses the fromkeys method of collections.OrderedDict:

from collections import OrderedDict
result = list(OrderedDict.fromkeys(process(line) for line in input_file))
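To see the order-preserving behaviour in isolation, here is a small self-contained sketch. The `process` function and the `lines` list are hypothetical stand-ins for the question's own `process()` and `input_file`; they are assumptions made only for illustration.

```python
from collections import OrderedDict

def process(line):
    # Hypothetical stand-in for the question's process():
    # strip the newline and normalise case.
    return line.strip().lower()

# In-memory stand-in for the lines read from input_file.
lines = ["Apple\n", "banana\n", "apple\n", "Cherry\n", "BANANA\n"]

# fromkeys() keeps the first occurrence of each key, in insertion order.
result = list(OrderedDict.fromkeys(process(line) for line in lines))
print(result)  # ['apple', 'banana', 'cherry']
```

Note that the processed values are used as dictionary keys, so they must be hashable.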

Answer 2

You can use sorted and itertools.groupby like this:

from itertools import groupby
print [unique for unique, _ in groupby(sorted(process(line) for line in input_file))]

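Note: this does not preserve the original order of the data (the output is sorted), but it is guaranteed to produce unique items without using a set.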
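A minimal sketch of the sorted plus groupby approach on in-memory data, showing that the result comes out sorted rather than in input order. As before, `process` and `lines` are hypothetical stand-ins, not part of the original question.

```python
from itertools import groupby

def process(line):
    # Hypothetical stand-in for the question's process().
    return line.strip().lower()

# In-memory stand-in for the lines read from input_file.
lines = ["pear\n", "Apple\n", "pear\n", "apple\n", "Fig\n"]

# groupby() only merges adjacent equal items, so the data is sorted first.
processed = sorted(process(line) for line in lines)
unique_sorted = [key for key, _ in groupby(processed)]
print(unique_sorted)  # ['apple', 'fig', 'pear']
```

This needs the processed values to be sortable, and the sort makes it O(N log N).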

Answer 3

Just use a generator expression and set():

with open(file_name, 'rU') as input_file:          
    result = list(set(process(line) for line in input_file))

If you need to preserve order, you can still use a set. I have also given up the list comprehension in favor of readability:

result = list()
seen = set()

with open(file_name, 'rU') as input_file:
    for line in input_file:
        unique = process(line)
        if unique not in seen:
            seen.add(unique)
            result.append(unique)
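If this pattern is needed in more than one place, the same seen-set loop can be wrapped in a generator. This is only a sketch; the name `unique_in_order` is made up for illustration and does not come from the original answer.

```python
def unique_in_order(items):
    """Yield each item the first time it appears, preserving input order."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item

# Works on any iterable of hashable values, for example a generator
# expression such as (process(line) for line in input_file).
result = list(unique_in_order([3, 1, 3, 2, 1]))
print(result)  # [3, 1, 2]
```

Membership tests on a set take constant time on average, so the whole pass stays O(N).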

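If you insist on using a list comprehension and preserving order, you can do it by building a throwaway list of Nones, but this approach defeats the purpose and is O(N^2):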

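result = list()
with open(file_name, 'rU') as input_file:
    # The comprehension itself only produces a throwaway list of Nones;
    # the useful output accumulates in result as a side effect.
    [result.append(x) for x in (process(line) for line in input_file) if x not in result]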
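For comparison, a different idiom (not the one in this answer) keeps a single list comprehension, preserves order, and avoids the quadratic list lookup by checking a set instead. It relies on `set.add` returning None, which is a side-effect trick that many readers find obscure, so treat this as a sketch rather than a recommendation. The `processed` list is a hypothetical stand-in for the already processed lines.

```python
# Hypothetical stand-in for [process(line) for line in input_file].
processed = ["b", "a", "b", "c", "a"]

seen = set()
# "x in seen" is False for a new value, so seen.add(x) runs and returns None;
# "not None" is True and the value is kept. For a repeated value, "x in seen"
# is True, the add is skipped, and the value is filtered out.
result = [x for x in processed if not (x in seen or seen.add(x))]
print(result)  # ['b', 'a', 'c']
```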

How to filter duplicate values in a list comprehension in Python without using set
