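I need to define a function group_dictionary that takes a list of dictionaries and returns the list of dictionaries that have the same value for EACH key in a list of keys. "Lonely" dictionaries (ones that share those values with no other dictionary) are removed.
Here is an example: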
my_list=[
{'id':'id1', 'key1':value_x, 'key2': value_y, 'key3':value_z},
{'id':'id3', 'key2': value_u, 'key3': value_v},
{'id':'id2', 'key1':value_x, 'key3':value_z, 'key4': value_t},
{'id':'id4', 'key1':value_w, 'key2':value_s, 'key3':value_v}
]
group_dictionary(my_list, list_of_keys=['key1', 'key3'])
#result: the only dictionaries that have key1 AND key3 in common are:
[
{'id':'id1', 'key1':value_x, 'key2': value_y, 'key3':value_z, 'group':0},
{'id':'id2', 'key1':value_x, 'key3':value_z, 'key4': value_t, 'group':0}
]
group_dictionary(my_list, list_of_keys=['key3'])
# result: the dictionaries that have key3 in common are divided into two groups
# by value: group 0 has value_z and group 1 has value_v
[
{'id':'id1', 'key1':value_x, 'key2': value_y, 'key3':value_z, 'group':0},
{'id':'id2', 'key1':value_x, 'key3':value_z, 'key4': value_t, 'group':0},
{'id':'id3', 'key2': value_u, 'key3': value_v, 'group':1},
{'id':'id4', 'key1':value_w, 'key2':value_s, 'key3':value_v, 'group':1}
]
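As you can see, with ['key1', 'key3'] id3 is removed because it has no key1, and id4 is removed because no other dictionary shares its key1/key3 values.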
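I'm worried about the running time: the real list contains 80,000 dictionaries with 35 keys each on average. The complexity of the algorithm could be n² (80,000²). Any optimization in the code is welcome.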
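On the n² concern: if each dictionary is reduced to a hashable key (here a tuple of its values for list_of_keys), grouping becomes a single pass of dictionary lookups, so the work grows roughly linearly with the number of dictionaries instead of quadratically, assuming typical hash behavior. The following is a minimal sketch, not either answer's method; the name group_by_keys is hypothetical, it assumes all the values are hashable, and it returns copies with members of a group listed together rather than mutating the input.

```python
from collections import defaultdict


def group_by_keys(dicts, keys):
    """Hypothetical sketch: group dicts that share the same values for all `keys`.

    Assumes every value under `keys` is hashable. Dicts missing any key are
    skipped, and groups with a single member ("lonely" dicts) are dropped.
    """
    buckets = defaultdict(list)  # value tuple -> member dicts, in input order
    for d in dicts:
        if all(k in d for k in keys):
            # A tuple keeps values separate, so ('ab', 'c') != ('a', 'bc').
            buckets[tuple(d[k] for k in keys)].append(d)

    result = []
    group_id = 0
    for members in buckets.values():  # dicts keep insertion order (Python 3.7+)
        if len(members) < 2:
            continue  # lonely dictionary
        for d in members:
            # Copy so the caller's dicts are not mutated.
            result.append({**d, 'group': group_id})
        group_id += 1
    return result


my_list = [
    {'id': 'id1', 'key1': 'value_x', 'key2': 'value_y', 'key3': 'value_z'},
    {'id': 'id3', 'key2': 'value_u', 'key3': 'value_v'},
    {'id': 'id2', 'key1': 'value_x', 'key3': 'value_z', 'key4': 'value_t'},
    {'id': 'id4', 'key1': 'value_w', 'key2': 'value_s', 'key3': 'value_v'},
]

print([d['id'] for d in group_by_keys(my_list, ['key1', 'key3'])])
# ['id1', 'id2']
print([(d['id'], d['group']) for d in group_by_keys(my_list, ['key3'])])
# [('id1', 0), ('id2', 0), ('id3', 1), ('id4', 1)]
```

Group IDs are assigned in order of each group's first appearance, which matches the numbering in the examples above.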
Answer 0 (score: 1)
I believe this will work. It's written in Python 3; I haven't optimized it, but if it isn't fast enough it may be a good starting point.
list_of_dicts = [
{'id':'id1', 'key1':'value_x', 'key2': 'value_y', 'key3':'value_z'},
{'id':'id3', 'key2' :'value_u', 'key3': 'value_v'},
{'id':'id2', 'key1':'value_x', 'key3':'value_z', 'key4': 'value_t'},
{'id':'id4', 'key1':'value_w', 'key2':'value_s', 'key3':'value_v'}
]
# Combine the values we're looking for into a single string and use that as the group key.
def make_value_key(d, list_of_keys):
    res = ""
    for k in list_of_keys:
        res += str(d[k])
    return res
def group_dictionary(list_of_dicts, list_of_keys):
    group_vals = {}
    current_max_group = 0
    dicts_to_remove = []
    for i, d in enumerate(list_of_dicts):
        # If the dict doesn't have all keys, mark it for removal.
        if not all(k in d for k in list_of_keys):
            dicts_to_remove.append(i)
        else:
            value_key = make_value_key(d, list_of_keys)
            # If the value key exists, assign its group; otherwise make a new group.
            if value_key in group_vals:
                d['group'] = group_vals[value_key]
            else:
                group_vals[value_key] = current_max_group
                d['group'] = current_max_group
                current_max_group += 1
    list_of_dicts = [i for j, i in enumerate(list_of_dicts) if j not in dicts_to_remove]
    return list_of_dicts
list_of_keys=['key1','key3']
print(group_dictionary(list_of_dicts, list_of_keys))
print()
list_of_keys=['key3']
print(group_dictionary(list_of_dicts, list_of_keys))
Output:
[{'key3': 'value_z', 'key1': 'value_x', 'group': 0, 'key2': 'value_y', 'id': 'id1'},
{'key3': 'value_z', 'key1': 'value_x', 'key4': 'value_t', 'group': 0, 'id': 'id2'},
{'key3': 'value_v', 'key1': 'value_w', 'group': 1, 'key2': 'value_s', 'id': 'id4'}]
[{'key3': 'value_z', 'key1': 'value_x', 'group': 0, 'key2': 'value_y', 'id': 'id1'},
{'group': 1, 'key3': 'value_v', 'key2': 'value_u', 'id': 'id3'},
{'key3': 'value_z', 'key1': 'value_x', 'key4': 'value_t', 'group': 0, 'id': 'id2'},
{'key3': 'value_v', 'key1': 'value_w', 'group': 1, 'key2': 'value_s', 'id': 'id4'}]
Optimization 1:
Instead of iterating over all the keys to check whether they exist, return an empty string while building the value key; an empty key marks the dictionary for removal:
def make_value_key(d, list_of_keys):
    res = ""
    for k in list_of_keys:
        # A missing key means this dict can't be grouped.
        if k not in d:
            return ""
        res += str(d[k])
    return res
def group_dictionary(list_of_dicts, list_of_keys):
    group_vals = {}
    current_max_group = 0
    dicts_to_remove = []
    for i, d in enumerate(list_of_dicts):
        value_key = make_value_key(d, list_of_keys)
        if value_key == "":
            dicts_to_remove.append(i)
            continue
        if value_key in group_vals:
            d['group'] = group_vals[value_key]
        else:
            group_vals[value_key] = current_max_group
            d['group'] = current_max_group
            current_max_group += 1
    list_of_dicts = [i for j, i in enumerate(list_of_dicts) if j not in dicts_to_remove]
    return list_of_dicts
Groups must be larger than 1:
This uses a second dict to track groups that have been seen only once, then marks the members of those single-member groups for removal.
def make_value_key(d, list_of_keys):
    res = ""
    for k in list_of_keys:
        if k not in d:
            return ""
        res += str(d[k])
    return res
def group_dictionary(list_of_dicts, list_of_keys):
    group_vals = {}
    group_count = {}
    current_max_group = 0
    indices_to_remove = []
    for i, d in enumerate(list_of_dicts):
        value_key = make_value_key(d, list_of_keys)
        if value_key == "":
            indices_to_remove.append(i)
            continue
        if value_key in group_vals:
            d['group'] = group_vals[value_key]
            # Second group member seen, remove from count dict.
            group_count.pop(d['group'], None)
        else:
            group_vals[value_key] = current_max_group
            d['group'] = current_max_group
            # First time seen, add to count dict.
            group_count[current_max_group] = i
            current_max_group += 1
    indices_to_remove.extend(group_count.values())
    return [i for j, i in enumerate(list_of_dicts) if j not in indices_to_remove]
Output:
[{'key2': 'value_y', 'group': 0, 'id': 'id1', 'key1': 'value_x', 'key3': 'value_z'},
{'key4': 'value_t', 'group': 0, 'id': 'id2', 'key1': 'value_x', 'key3': 'value_z'}]
[{'key2': 'value_y', 'group': 0, 'id': 'id1', 'key1': 'value_x', 'key3': 'value_z'},
{'group': 1, 'id': 'id3', 'key2': 'value_u', 'key3': 'value_v'},
{'key4': 'value_t', 'group': 0, 'id': 'id2', 'key1': 'value_x', 'key3': 'value_z'},
{'key2': 'value_s', 'group': 1, 'id': 'id4', 'key1': 'value_w', 'key3': 'value_v'}]
Optimization 2:
You can go from O(n^2) (looping through the dict list once to compute and once to remove) to O(n*m log m) (looping through the dict list and through the sorted removed indices):
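The answer's own code for this step is not included here. As a sketch of the idea only: sort the removed indices once, then walk the dict list and the sorted indices together, so skipping an index no longer means scanning the whole indices_to_remove list. The helper name remove_indices is hypothetical; this version runs in O(n + m log m) for n dictionaries and m removed indices.

```python
def remove_indices(items, indices):
    """Hypothetical helper: return the items whose positions are not in `indices`.

    Sorts the indices once (O(m log m)), then walks both sequences together
    (O(n + m)), instead of testing `j not in indices` against a list.
    """
    to_skip = sorted(set(indices))  # set() also drops duplicate indices
    result = []
    pos = 0  # position of the next index to skip in to_skip
    for j, item in enumerate(items):
        if pos < len(to_skip) and to_skip[pos] == j:
            pos += 1  # j is marked for removal, so skip this item
            continue
        result.append(item)
    return result


print(remove_indices(['a', 'b', 'c', 'd', 'e'], [3, 0, 3]))  # ['b', 'c', 'e']
```

In group_dictionary, the last line would become `return remove_indices(list_of_dicts, indices_to_remove)`. Turning indices_to_remove into a set before the list comprehension would give a similar improvement with less code.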
Answer 1 (score: 1)
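This is pretty simple. First, you need a way to easily serialize the relevant data in a dict. I'll use this (very simple) method, but depending on the complexity of your data you may need to come up with something more robust: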
def serialize(d, keys):
    # str() so that non-string values don't make join() raise a TypeError.
    return ','.join([str(d[key]) for key in keys])
Then you just store all of these serialized values in a list. The index of a value in the list is the ID of its group.
def group_dictionary(dicts, keys):
    groups = []
    result = []
    for d in dicts:
        # skip over dictionaries that don't have all keys
        if any(key not in d for key in keys):
            continue
        # get the serialized data
        serialized_data = serialize(d, keys)
        # if we've encountered a new set of data, create a new group!
        if serialized_data not in groups:
            groups.append(serialized_data)
        # augment the dictionary with the group id
        d['group'] = groups.index(serialized_data)
        # and add it to the list of returned dictionaries
        result.append(d)
    return result
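One caveat with this approach: `serialized_data not in groups` and `groups.index(...)` both scan the list, so with many distinct groups the loop tends toward O(n²). A sketch of the same function with a dict from serialized value to group ID, which makes each lookup O(1) on average. The names serialize_tuple and group_dictionary_dict_index are hypothetical, and a tuple replaces the comma-joined string so values containing commas can't collide. Like the original, it mutates the input dicts and keeps lonely ones; dropping them would still need a per-group count such as group_count in the first answer.

```python
def serialize_tuple(d, keys):
    # Hypothetical helper: a tuple of the relevant values (hashable if the values are).
    return tuple(d[key] for key in keys)


def group_dictionary_dict_index(dicts, keys):
    """Hypothetical variant of group_dictionary above: group IDs are kept in a
    dict (serialized value -> ID) instead of a list, so lookups are O(1) on
    average. Like the original, it mutates the dicts and keeps lonely ones."""
    group_ids = {}
    result = []
    for d in dicts:
        # skip over dictionaries that don't have all keys
        if any(key not in d for key in keys):
            continue
        serialized_data = serialize_tuple(d, keys)
        # setdefault stores len(group_ids) as the new ID only for unseen values
        d['group'] = group_ids.setdefault(serialized_data, len(group_ids))
        result.append(d)
    return result


dicts = [
    {'id': 'id1', 'key1': 'value_x', 'key2': 'value_y', 'key3': 'value_z'},
    {'id': 'id3', 'key2': 'value_u', 'key3': 'value_v'},
    {'id': 'id2', 'key1': 'value_x', 'key3': 'value_z', 'key4': 'value_t'},
    {'id': 'id4', 'key1': 'value_w', 'key2': 'value_s', 'key3': 'value_v'},
]
print([(d['id'], d['group']) for d in group_dictionary_dict_index(dicts, ['key1', 'key3'])])
# [('id1', 0), ('id2', 0), ('id4', 1)]
```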