Question

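I want to get the unique elements from a list of dictionaries based on the value of one field, while keeping the other fields.

Below is the format of the data I have.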

[ {id:"1000", text: "abc", time_stamp: "10:30"},
  {id:"1001", text: "abc", time_stamp: "10:31"},
  {id:"1002", text: "bcd", time_stamp: "10:32"} ]

I want the output to be as follows (unique based on text, but with the other fields retained):

[ {id:"1000", text: "abc", time_stamp: "10:30"}, # earlier time stamp
  {id:"1002", text: "bcd", time_stamp: "10:32"} ]

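Note that uniqueness is based on text, and I also want to keep the id and time_stamp values. This question is different from the previously asked question "Python - List of unique dictionaries".

I tried:

Method 1: Collect only the text values from the dictionaries into a list, pass it to set, and get the unique text values. But this loses id and time_stamp.

Method 2: I also iterated over the list of dictionaries and checked whether each text value already exists in unique_list_of_text; if it does not, I append the dictionary to unique_list_of_elements. But this code took a long time, because I am working with a data set of 350,000 records. Is there a better way? Code for method 2: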

def find_unique_elements(list_of_elements):
    no_of_elements = len(list_of_elements)
    unique_list_of_text = []
    unique_list_of_elements = []
    for iterator in range(0, no_of_elements):
        # "in" on a list is a linear scan, so the whole loop is quadratic
        if list_of_elements[iterator]['text'] not in unique_list_of_text:
            unique_list_of_text.append(list_of_elements[iterator]['text'])
            unique_list_of_elements.append(list_of_elements[iterator])
    return unique_list_of_elements

Answer 1

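You can build a new list and just check whether each item is already present.

To make it a bit faster, I would use a better data structure for the membership check, such as a set: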

$ cat unique.py

id = 'id'
text = 'text'
time_stamp = 'time_stamp'

data = [ {id:"1000", text: "abc", time_stamp: "10:30"},
   {id:"1001", text: "abc", time_stamp: "10:31"},
   {id:"1002", text: "bcd", time_stamp: "10:32"} ]

keys = set()
unique_items = []
for item in data:
    if item['text'] not in keys:
        unique_items.append(item)
    keys.add(item['text'])

print(unique_items)

$ python unique.py
[{'text': 'abc', 'id': '1000', 'time_stamp': '10:30'}, {'text': 'bcd', 'id': '1002', 'time_stamp': '10:32'}]
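
The same loop can be wrapped as a reusable function. This is only a sketch: the function name `unique_by` and its `key` parameter are hypothetical, not part of the answer above. It keeps the first dictionary seen for each distinct value, so with input sorted by time the earlier time stamp is retained.

```python
def unique_by(items, key):
    """Return the first dict seen for each distinct value of item[key]."""
    seen = set()      # set membership is O(1) on average, unlike a list scan
    result = []
    for item in items:
        value = item[key]
        if value not in seen:
            seen.add(value)
            result.append(item)
    return result


data = [
    {"id": "1000", "text": "abc", "time_stamp": "10:30"},
    {"id": "1001", "text": "abc", "time_stamp": "10:31"},
    {"id": "1002", "text": "bcd", "time_stamp": "10:32"},
]

unique_items = unique_by(data, "text")
print([d["id"] for d in unique_items])  # ['1000', '1002']
```

Because each record is visited once and each lookup is a hash check, the running time should grow roughly linearly with the number of records, which matters at the 350,000-record scale mentioned in the question.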

Answer 2

You can build a dictionary from the reversed list and take its values:

id, text, time_stamp = 'id', 'text', 'timestamp'

l = [ {id:"1000", text: "abc", time_stamp: "10:30"},
  {id:"1001", text: "abc", time_stamp: "10:31"},
  {id:"1002", text: "bcd", time_stamp: "10:32"} ]

d = {i[text]: i for i in reversed(l)}
new_l = list(d.values())
print(new_l)
# [{'id': '1002', 'text': 'bcd', 'timestamp': '10:32'}, {'id': '1000', 'text': 'abc', 'timestamp': '10:30'}]

# if the order should be preserved
new_l.reverse()
print(new_l)
# [{'id': '1000', 'text': 'abc', 'timestamp': '10:30'}, {'id': '1002', 'text': 'bcd', 'timestamp': '10:32'}]

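If the order of the final list matters, use OrderedDict instead of dict on Python 3.6 and earlier; a plain dict is only guaranteed to preserve insertion order from Python 3.7 onward.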
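
A minimal sketch of the OrderedDict variant, assuming the same sample data; the names `records`, `by_text`, and `unique_records` are illustrative only.

```python
from collections import OrderedDict

records = [
    {"id": "1000", "text": "abc", "time_stamp": "10:30"},
    {"id": "1001", "text": "abc", "time_stamp": "10:31"},
    {"id": "1002", "text": "bcd", "time_stamp": "10:32"},
]

# Walk the list backwards so that, for a repeated text, the earliest
# record is assigned last and therefore overwrites the later ones.
by_text = OrderedDict()
for record in reversed(records):
    by_text[record["text"]] = record

# The keys were inserted back to front, so reverse to restore the order.
unique_records = list(by_text.values())
unique_records.reverse()

print([r["id"] for r in unique_records])  # ['1000', '1002']
```

One caveat of this approach: if a text reappears after other texts, the result is ordered by each text's last appearance in the input, whereas a forward pass that skips already-seen texts orders by first appearance.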
