说我在Python中有一个列表字典。我想找到所有具有共同项目的密钥组,并为每个这样的组找到相应的项目。
例如,假设项是简单整数:
dct = dict()
dct['a'] = [0, 5, 7]
dct['b'] = [1, 2, 5]
dct['c'] = [3, 2]
dct['d'] = [3]
dct['e'] = [0, 5]
小组将是:
groups = dict()
groups[0] = ['a', 'e']
groups[1] = ['b', 'c']
groups[2] = ['c', 'd']
groups[3] = ['a', 'b', 'e']
这些群体的共同要素是:
common = dict()
common[0] = [0, 5]
common[1] = [2]
common[2] = [3]
common[3] = [5]
为了解决这个问题,我相信建立一个像下面这样的矩阵是有价值的,但我不知道如何从这一点开始。是否有任何Python库可以帮助解决这类问题?
| a b c d e |
|a| x x |
|b| x x x |
|c| x x x |
|d| x x |
|e| x x x |
我试图结束@NickBurns在函数中提供的解决方案,但是我在重现解决方案时遇到了问题:
dct = { 'a' : [0, 5, 7], 'b' : [1, 2, 5], 'c' : [3, 2], 'd' : [3], 'e' : [0, 5]}
groups, common_items = get_groups(dct)
print 'Groups', groups
print 'Common items', common_items
我明白了:
Groups: defaultdict(<type 'list'>, {0: ['a', 'e'], 2: ['c', 'b'], 3: ['c', 'd'], 5: ['a', 'b', 'e']})
Common items: {0: None, 2: None, 3: None, 5: None}
这是函数
from collections import defaultdict
def common(query_group, dct):
""" Recursively find the common elements within groups """
if len(query_group) <= 1:
return
# Extract the elements from groups,
# Pull their original values from dct
# Get the intersection of these
first, second = set(dct[query_group[0]]), set(dct[query_group[1]])
# print(first.intersection(second))
return common(query_group[2:], dct)
def get_groups (dct):
groups = defaultdict(list)
for key, values in dct.items():
for value in values:
groups[value].append(key)
# Clean up the groups:
for key in groups.keys():
# i.e. the value is common to more than 1 group
if len(groups[key]) <= 1:
del groups[key]
# Identify common elements:
common_items = dict()
for k,v in groups.iteritems():
if len(v) > 1:
common_items[k] = common(v, dct)
return groups, common_items
答案 0 :(得分:3)
我会尝试创建第二个字典(groups
),它代表原始dct
中每个列表的交集。例如,你可以使用defaultdict这样做:
from collections import defaultdict
groups = defaultdict(list)
dct = { 'a' : [0, 5, 7], 'b' : [1, 2, 5], 'c' : [3, 2], 'd' : [3], 'e' : [0, 5]}
for key, values in dct.items():
for value in values:
groups[value].append(key)
for key in groups.keys():
if len(groups[key]) > 1: # i.e. the value is common to more than 1 group
print(key, groups[key])
(0, ['a', 'e'])
(2, ['c', 'b'])
(3, ['c', 'd'])
(5, ['a', 'b', 'e'])
查找公共元素有点麻烦,您需要遍历每个组并找到原始dct
的交集。也许像这样的递归例程可行:
def common(query_group, dct, have_common=[]):
""" Recursively find the common elements within groups """
if len(query_group) <= 1:
return have_common
# extract the elements from groups, and pull their original values from dct
# then get the intersection of these
first, second = set(dct[query_group[0]]), set(dct[query_group[1]])
have_common.extend(first.intersection(second))
return common(query_group[2:], dct, have_common)
for query_group in groups.values():
if len(query_group) > 1:
print(query_group, '=>', common(query_group, dct, have_common=[]))
['e', 'a'] => [0, 5]
['b', 'c'] => [2]
['d', 'c'] => [3]
['e', 'b', 'a'] => [5}]
显然它需要一些更漂亮的格式,但我认为它完成了工作。希望这会有所帮助。
答案 1 :(得分:2)
这与你要求的非常接近 - 看看它,看看它是否足够接近。
from collections import defaultdict
dct = dict()
dct['a'] = [0, 5, 7]
dct['b'] = [1, 2, 5]
dct['c'] = [3, 2]
dct['d'] = [3]
dct['e'] = [0, 5]
inverseDict = defaultdict(list)
for key in dct:
for item in dct[key]:
inverseDict[item].append(key)
for item in inverseDict.keys():
if len(inverseDict[item]) < 2:
del inverseDict[item]
for item in inverseDict:
print item, ":", inverseDict[item]
输出:
0 : ['a', 'e']
2 : ['c', 'b']
3 : ['c', 'd']
5 : ['a', 'b', 'e']
答案 2 :(得分:2)
您可以使用NetworkX库来获取矩阵(邻接矩阵)表示:
import networkx as nx
dct = { 'a' : [0, 5, 7], 'b' : [1, 2, 5], 'c' : [3, 2], 'd' : [3], 'e' : [0, 5]}
nodes = sorted(dct)
G = nx.Graph()
for node in nodes:
attached_nodes = dct[node]
G.add_node(node)
for nod in attached_nodes:
if 0 <= nod < len(nodes):
G.add_edge(node, nodes[nod])
print G.nodes()
print G.edges()
print G.has_edge('a','b')
print G.has_edge('b','c')
<强>输出:强>
['a', 'c', 'b', 'e', 'd']
[('a', 'a'), ('a', 'e'), ('c', 'c'), ('c', 'b'), ('c', 'd'), ('b', 'b'), ('d', 'd')]
False
True
答案 3 :(得分:1)
这是一个很大的混乱,但它的确有效。它基本上是这样构建一个数组:
| 0 1 2 3 4 5 6 7 |
+-----------------+
|a| 1 0 0 0 1 0 0 1 |
|b| 0 1 1 0 0 1 0 0 |
|c| 0 0 1 1 0 0 0 0 |
|d| 0 0 0 1 0 0 0 0 |
|e| 1 0 0 0 0 1 0 0 |
这些组是具有多个1
的唯一列。要查找组的所有常用元素,您会在组定义具有1
的位置找到具有1
的列。并用Python编写它,使用scipy的稀疏矩阵来构建上面的数组,我得到了以下内容:
import numpy as np
import scipy.sparse as sps
dct = {'a' : [0, 5, 7], 'b' : [1, 2, 5], 'c' : [3, 2],
'd' : [3], 'e' : [0, 5]}
keys = []
lens = []
vals = []
for key, items in dct.items():
keys.append(key)
lens.append(len(items))
vals.extend(items)
keys = np.array(keys)
lens = np.array(lens)
vals = np.array(vals)
unique_values, val_idx = np.unique(vals, return_inverse=True)
data = np.ones_like(val_idx)
indices = val_idx
indptr = np.concatenate(([0], np.cumsum(lens)))
dct_array = sps.csr_matrix((data, indices, indptr))
dct_array = dct_array.T.toarray()
mask = dct_array.sum(axis=-1) >= 2
dct_array = dct_array[mask].astype(np.bool)
unique_values = unique_values[mask]
dct_array = np.ascontiguousarray(dct_array)
dct_array = dct_array.view((np.void,
(dct_array.dtype.itemsize *
len(keys)))).ravel()
groups, grp_idx = np.unique(dct_array,
return_index=True)
groups = groups.view(np.bool).reshape(-1, len(keys))
dct_array = dct_array.view(np.bool).reshape(-1, len(keys))
for group, idx in zip(groups, grp_idx) :
print 'group {0}'.format(keys[group])
common = unique_values[np.all(np.logical_and(dct_array[idx],
dct_array) ==
dct_array[idx], axis=-1)]
print 'common {0}'.format(common)
打印出来:
group ['c' 'd']
common [3]
group ['c' 'b']
common [2]
group ['a' 'e']
common [0 5]
group ['a' 'b' 'e']
common [5]