我有一个非常大的字典,其中的键包含一个项目列表。我想以有效的方式将列表中至少有一个元素的所有键组合在一起。例如:
dictionary = {'group1': ['a', 'b', 'c', 'd'],
'group2': ['a', 'b', 'c', 'd', 'e'],
'group3': ['f', 'g', 'h'],
'group4': ['g', 'z']}
group_dict(dictionary)
会回馈:
dictionary = {'group1': ['a', 'b', 'c', 'd', 'e'],
'group3': ['f', 'g', 'h', 'z']}
更新
字典的“真实”结构是:
dictionary = {'group1' :{'IDs': ['a','b','c','d'], 'oldest_node': 'node_30'}, 'group2' :{'IDs': ['c','d','e'], 'oldest_node': 'node_40'}, 'group3' :{'IDs': ['h','k'], 'oldest_node': 'node_2'}, 'group4' :{'IDs': ['z','w','x','j'], 'oldest_node': 'node_6'}, 'group3' :{'IDs': ['h','z'], 'oldest_node': 'node_9'}
我想生成最具包容性的组并保持节点变量的最低值:
dictionary = {'group1' :{'IDs': ['a','b','c','d','e'], 'oldest_node': 'node_30'}, 'group3' :{'IDs': ['h','k','z','w','x','j'], 'oldest_node': 'node_2'}}
答案 0 :(得分:1)
你真的想要改变用作输入的字典吗?或者如果你的函数输出另一个字典就可以了吗?
这是一个快速而又脏的函数,它对值进行分组:
def group_dict(d):
result = {}
for k1 in d:
for k2 in d:
if k1 != k2 and set(d.get(k1)).intersection(d.get(k2)):
result[k1] = list(set(d.get(k1)).union(d.get(k2)))
return result
它应该返回:
{'group1': ['a', 'c', 'b', 'e', 'd'],
'group2': ['a', 'c', 'b', 'e', 'd'],
'group3': ['h', 'z', 'g', 'f'],
'group4': ['h', 'z', 'g', 'f']}
展开功能以删除重复项。
我正在使用内置的set
及其方法intersection
和union
。这应该是您的核心要求的关键。
在双循环(非常难看)中,比较字典中的值,如果找到交集,则将值的并集转换为列表并分配给结果字典。这真的不是一个好的解决方案,但也许它可以给你一些想法。
答案 1 :(得分:1)
以下程序解决了原始问题。可能有一个更有效的算法,但我认为这个算法相当快。
修改此代码以应对更新版本问题中更复杂的dict应该不会太难。
(我使用的是Python 2.6,所以我没有dict理解,这就是我使用生成器表达式构建dicts的原因。)
<强> merge_lists.py 强>
#! /usr/bin/env python
''' Merge lists in a dict
Lists are merged if they have any element in common,
so that in the resulting dict no list element will be
associated with more than one key.
Written by PM 2Ring 2014.11.18
From http://stackoverflow.com/q/26972204/4014959
'''
#Some test data
groups = {
'g01': ['a', 'b', 'c', 'd'],
'g02': ['a', 'b', 'c', 'd', 'e'],
'g03': ['f', 'g', 'h'],
'g04': ['g', 'j'],
#'g05': ['g', 'a'],
'g06': ['k', 'l'],
'g07': ['l', 'm'],
'g08': ['m', 'n'],
'g09': ['o', 'p'],
'g10': ['p', 'q'],
'g11': ['q', 'o'],
#'g12': ['l', 'q'],
}
def merge_lists(d):
src = dict((k, set(v)) for k, v in d.iteritems())
while True:
dest = {}
count = 0
while src:
k1, temp = src.popitem()
if temp is None:
continue
for k2, v in src.iteritems():
if v is None:
continue
if temp & v:
temp |= v
src[k2] = None
count += 1
k1 = min(k1, k2)
dest[k1] = temp
if count > 0:
#print count
#print_dict(dest)
src = dest
else:
dest = dict((k, sorted(list(v))) for k, v in dest.iteritems())
return dest
def print_dict(d):
for k in sorted(d.keys()):
print "%s: %s" % (k, d[k])
print
def main():
print_dict(groups)
print 20*'-'
dest = merge_lists(groups)
print_dict(dest)
if __name__ == '__main__':
main()
<强>输出强>
g02: ['a', 'b', 'c', 'd', 'e']
g03: ['f', 'g', 'h']
g04: ['g', 'j']
g06: ['k', 'l']
g07: ['l', 'm']
g08: ['m', 'n']
g09: ['o', 'p']
g10: ['p', 'q']
g11: ['q', 'o']
--------------------
g01: ['a', 'b', 'c', 'd', 'e']
g03: ['f', 'g', 'h', 'j']
g06: ['k', 'l', 'm', 'n']
g09: ['o', 'p', 'q']
这是一个适用于更新的dict结构的版本。
#! /usr/bin/env python
''' Merge lists in a dict
Lists are merged if they have any element in common,
so that in the resulting dict no list element will be
associated with more than one key.
The key of the merged item is selected from the sub-dict with the lowest
value of oldest_node.
Written by PM 2Ring 2014.11.21
From http://stackoverflow.com/q/26972204/4014959
'''
#Some test data
groups = {
'group1': {'IDs': ['a','b','c','d'], 'oldest_node': 'node_30'},
'group2': {'IDs': ['c','d','e'], 'oldest_node': 'node_40'},
'group3': {'IDs': ['h','k'], 'oldest_node': 'node_2'},
'group4': {'IDs': ['z','w','x','j'], 'oldest_node': 'node_6'},
'group5': {'IDs': ['h','z'], 'oldest_node': 'node_9'},
}
def merge_lists(d):
#Convert IDs to a set and oldest_node to an int
src = {}
for k, v in d.iteritems():
src[k] = {
'IDs': set(v['IDs']),
'oldest_node': int(v['oldest_node'][5:])
}
#print_dict(src)
while True:
dest = {}
count = 0
while src:
k1, temp = src.popitem()
if temp is None:
continue
for k2, v in src.iteritems():
if v is None:
continue
if temp['IDs'] & v['IDs']:
#Merge IDs
temp['IDs'] |= v['IDs']
#Determine key of merge from oldest_node
if v['oldest_node'] < temp['oldest_node']:
k1 = k2
temp['oldest_node'] = v['oldest_node']
src[k2] = None
count += 1
dest[k1] = temp
src = dest
#Exit loop if no changes occured
if count == 0:
break
else:
#print count
#print_dict(src)
pass
#Convert dict back to original form
dest = {}
for k, v in src.iteritems():
dest[k] = {
'IDs': sorted(list(v['IDs'])),
'oldest_node': 'node_%d' % v['oldest_node']
}
return dest
def print_dict(d):
for k in sorted(d.keys()):
print "%s: %s" % (k, d[k])
print
def main():
print_dict(groups)
print 20*'-'
dest = merge_lists(groups)
print_dict(dest)
if __name__ == '__main__':
main()
<强>输出强>
group1: {'IDs': ['a', 'b', 'c', 'd'], 'oldest_node': 'node_30'}
group2: {'IDs': ['c', 'd', 'e'], 'oldest_node': 'node_40'}
group3: {'IDs': ['h', 'k'], 'oldest_node': 'node_2'}
group4: {'IDs': ['z', 'w', 'x', 'j'], 'oldest_node': 'node_6'}
group5: {'IDs': ['h', 'z'], 'oldest_node': 'node_9'}
--------------------
group1: {'IDs': ['a', 'b', 'c', 'd', 'e'], 'oldest_node': 'node_30'}
group3: {'IDs': ['h', 'j', 'k', 'w', 'x', 'z'], 'oldest_node': 'node_2'}