Suppose I have an array representing a network of nodes, where each connection is described by a "from-node" and a "to-node":
a = array([(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (6, 7), (7, 8), (2, 9),
(9, 10), (10, 11), (2, 12), (12, 13), (13, 14), (13, 15), (14, 16)],
dtype=[('fnode', '<i4'), ('tnode', '<i4')])
a['fnode']
array([ 1, 2, 3, 4, 2, 6, 7, 2, 9, 10, 2, 12, 13, 13, 14])
a['tnode']
array([ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
What is the best way to group the 'to-nodes' into lists according to the 'from-node' they share?
I'm after this kind of format:
#from-node to-nodes
1 [2]
2 [3,6,9,12]
3 [4]
4 [5]
5 []
6 [7]
7 [8]
8 []
9 [10]
10 [11]
11 []
12 [13]
13 [14,15]
14 [16]
15 []
16 []
EDIT
To be clear, I'd like 'from-nodes' that have no 'to-nodes' (e.g. node 8) to be associated with an empty list.
Answer 0 (score: 5)
Use collections.defaultdict:
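from collections import defaultdict

d = defaultdict(list)
# Each row of the structured array unpacks into (fnode, tnode)
for k, v in a:
    d[k].append(v)
print(d)
# Output as originally posted (Python 2):
>> Out[40]: defaultdict(<type 'list'>, {1: [2], 2: [3, 6, 9, 12], 3: [4], 4: [5], 6: [7], 7: [8], 9: [10], 10: [11], 12: [13], 13: [14, 15], 14: [16]})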
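The defaultdict result above only has keys for nodes that appear as a from-node, so nodes such as 5 or 8 are missing. The following is a minimal sketch of one way to also get the empty lists the question asks for. It works on a plain list of tuples named `edges` (an illustrative name, not from the thread); for the structured array in the question, `a.tolist()` is assumed to produce the same list of tuples.

```python
from collections import defaultdict

# Edges as plain (from_node, to_node) tuples. For the structured array in the
# question, a.tolist() is assumed to yield an equivalent list of tuples.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (6, 7), (7, 8), (2, 9),
         (9, 10), (10, 11), (2, 12), (12, 13), (13, 14), (13, 15), (14, 16)]

adjacency = defaultdict(list)
for fnode, tnode in edges:
    adjacency[fnode].append(tnode)
    # Register the target node too, so nodes without outgoing edges
    # still end up with an (empty) list.
    adjacency.setdefault(tnode, [])

# Print in the "from-node to-nodes" layout used in the question.
for node in sorted(adjacency):
    print(node, adjacency[node])
```

Registering the to-node inside the same loop keeps this a single pass over the edges, at the cost of one extra dictionary lookup per edge.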
Answer 1 (score: 3)
If you are already using NumPy rather than lists, I assume your goal is speed. In that case, I'd suggest using the Pandas library.
>>> import pandas as pd
>>> pd.DataFrame(a).groupby('fnode').apply(lambda x: x['tnode'].values)
fnode
1 [2]
2 [3, 6, 9, 12]
3 [4]
4 [5]
6 [7]
7 [8]
9 [10]
10 [11]
12 [13]
13 [14, 15]
14 [16]
dtype: object
Timings for a large array:
In [32]: a = array([(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (6, 7), (7, 8),
(2, 9), (9, 10), (10, 11), (2, 12), (12, 13), (13, 14),
(13, 15), (14, 16)] * 100000,
dtype=[('fnode', '<i4'), ('tnode', '<i4')])
In [33]: %%timeit
pd.DataFrame(a).groupby('fnode').apply(lambda x: x['tnode'].values)
10 loops, best of 3: 102 ms per loop
In [34]: %%timeit
d = defaultdict(list)
map( lambda (k,v) : d[k].append(v), a)
1 loops, best of 3: 5.76 s per loop
In [35]: %%timeit
[(k, list(v)) for k,v in groupby(a, lambda (x, y): x)]
1 loops, best of 3: 9.02 s per loop
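Like the defaultdict approach, the pandas groupby result only lists nodes that occur in 'fnode'. A hedged sketch of one way to add the empty lists afterwards is shown below; the names `grouped`, `all_nodes` and `adjacency` are illustrative, and the final dictionary comprehension is a plain Python loop, so it has not been timed against the numbers above.

```python
import numpy as np
import pandas as pd

a = np.array([(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (6, 7), (7, 8), (2, 9),
              (9, 10), (10, 11), (2, 12), (12, 13), (13, 14), (13, 15), (14, 16)],
             dtype=[('fnode', '<i4'), ('tnode', '<i4')])

df = pd.DataFrame(a)

# One list of to-nodes per from-node (only from-nodes that actually occur).
grouped = df.groupby('fnode')['tnode'].apply(list)

# Every node in the network, whether it appears as a source or a target.
all_nodes = np.union1d(df['fnode'], df['tnode'])

# Series.get falls back to the default for nodes with no outgoing edges.
adjacency = {int(n): grouped.get(n, []) for n in all_nodes}
```

If only the nodes with outgoing edges matter, `grouped` on its own is already the answer and the last two steps can be dropped.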
Answer 2 (score: 1)
You can use itertools.groupby.
Define the array:
A = np.array([(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (6, 7), (7, 8), (2, 9),
(9, 10), (10, 11), (2, 12), (12, 13), (13, 14), (13, 15), (14, 16)],
dtype=[('fnode', '<i4'), ('tnode', '<i4')])
Sort it:
A = sorted(A, key=lambda row: row[0])
Then group it (I've turned the generators into lists so that you can see the result):
In [18]: [(k, list(v)) for k,v in groupby(A, lambda row: row[0])]
Out[18]:
[(1, [(1, 2)]),
(2, [(2, 3), (2, 6), (2, 9), (2, 12)]),
(3, [(3, 4)]),
(4, [(4, 5)]),
(6, [(6, 7)]),
(7, [(7, 8)]),
(9, [(9, 10)]),
(10, [(10, 11)]),
(12, [(12, 13)]),
(13, [(13, 14), (13, 15)]),
(14, [(14, 16)])]
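You can then do whatever post-processing you need.
For example, in this case you would probably prefer something like [(k, [b for a, b in v]) for k,v ... .
(Note that sorting the array is important. groupby operates like POSIX uniq in that it only combines adjacent elements. To combine all matching elements, sort by the same key you group by.)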
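To illustrate the note about sorting, here is a small sketch that groups the same edges with and without sorting first. It uses a plain list of tuples called `edges` and `operator.itemgetter` as the key function; both are choices made for this example rather than part of the answer.

```python
from itertools import groupby
from operator import itemgetter

edges = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (6, 7), (7, 8), (2, 9),
         (9, 10), (10, 11), (2, 12), (12, 13), (13, 14), (13, 15), (14, 16)]

first = itemgetter(0)  # key function: the from-node of each edge

# Without sorting, groupby starts a new group whenever the key changes,
# so from-node 2 shows up in several separate groups.
unsorted_groups = [(k, [t for _, t in g]) for k, g in groupby(edges, key=first)]

# Sorting by the grouping key makes equal keys adjacent. sorted() is stable,
# so the to-nodes keep their original relative order.
sorted_edges = sorted(edges, key=first)
grouped = [(k, [t for _, t in g]) for k, g in groupby(sorted_edges, key=first)]

print(grouped)
```

As in the answer, nodes that never appear as a from-node are not in the result and would need to be added separately.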
Answer 3 (score: 0)
This is a bit long-winded, but it works (and it gets the empty lists):
np.array((np.unique(np.hstack((a['tnode'],a['fnode']))),np.array([a['tnode'][x].tolist() for x in [np.where(a['fnode']==y) for y in np.unique(np.hstack((a['tnode'],a['fnode'])))]]))).T
array([[1, [2]],
[2, [3, 6, 9, 12]],
[3, [4]],
[4, [5]],
[5, []],
[6, [7]],
[7, [8]],
[8, []],
[9, [10]],
[10, [11]],
[11, []],
[12, [13]],
[13, [14, 15]],
[14, [16]],
[15, []],
[16, []]], dtype=object)
In a (possibly) more readable form:
uniq_nodes = np.unique(np.hstack((a['tnode'],a['fnode']))) # all nodes in the network
to_nodes_loc = [np.where(a['fnode']==y) for y in uniq_nodes] # positions where each node appears in the fnode array
to_nodes = [a['tnode'][x].tolist() for x in to_nodes_loc] # to-nodes for each node (empty list if none)
np.array((uniq_nodes,np.array(to_nodes))).T # combine into array
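The approach above runs one `np.where` scan per node. As an alternative sketch under the same data, the edges can be sorted once and each node's slice located with `np.searchsorted`. This is a different technique from the answer's, the variable names are illustrative, and it returns a dictionary rather than a two-column object array, which also sidesteps building a ragged array (newer NumPy releases are stricter about those unless `dtype=object` is given explicitly).

```python
import numpy as np

a = np.array([(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (6, 7), (7, 8), (2, 9),
              (9, 10), (10, 11), (2, 12), (12, 13), (13, 14), (13, 15), (14, 16)],
             dtype=[('fnode', '<i4'), ('tnode', '<i4')])

# All nodes in the network, sources and targets alike.
nodes = np.unique(np.concatenate((a['fnode'], a['tnode'])))

# Sort edges by from-node; a stable sort keeps the to-nodes in input order.
order = np.argsort(a['fnode'], kind='stable')
sorted_from = a['fnode'][order]
sorted_to = a['tnode'][order]

# For each node, the half-open range [start, stop) of its edges in the
# sorted arrays; the range is empty when the node has no outgoing edges.
starts = np.searchsorted(sorted_from, nodes, side='left')
stops = np.searchsorted(sorted_from, nodes, side='right')

adjacency = {int(n): sorted_to[i:j].tolist()
             for n, i, j in zip(nodes, starts, stops)}
```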