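I have a list of tuples (shown below), and I need to join (group) them on the first item of each tuple. The result should be a list of (word, list(numbers)) tuples.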
In [351]: word_docid_pairs
Out[351]:
[('bear', 1),
('is', 1),
('in', 1),
('gugledarc', 1),
('the', 1),
('sdpij', 2),
('emdf', 2),
('sai', 2),
('sd', 3),
('fuggle', 4),
('in', 4),
('gugledarc', 4),
('df', 4)]
Answer 0 (score: 1)
Python 2.7.3 (default, Sep 26 2012, 21:51:14)
>>> ll = [('bear', 1),
... ('is', 1),
... ('in', 1),
... ('gugledarc', 1),
... ('the', 1),
... ('sdpij', 2),
... ('emdf', 2),
... ('sai', 2),
... ('sd', 3),
... ('fuggle', 4),
... ('in', 4),
... ('gugledarc', 4),
... ('df', 4)]
>>> dd = {}
>>> for key, value in ll:
...     dd.setdefault(key, []).append(value)
...
>>> dd.items()
[('sai', [2]), ('emdf', [2]), ('df', [4]), ('is', [1]), ('bear', [1]), ('gugledarc', [1, 4]), ('in', [1, 4]), ('the', [1]), ('sdpij', [2]), ('fuggle', [4]), ('sd', [3])]
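The output above comes back in arbitrary order because Python 2 dicts are unordered. As a minimal sketch, assuming Python 3.7 or later (where dicts preserve insertion order and `items()` returns a view rather than a list), the same `setdefault` approach could be wrapped like this. The function name `group_pairs` and the shortened sample data are only illustrative and not part of the original answer:

```python
def group_pairs(pairs):
    """Group (word, doc_id) pairs into a list of (word, [doc_ids]) tuples."""
    grouped = {}
    for word, doc_id in pairs:
        # setdefault returns the existing list for word, or stores and returns a new one
        grouped.setdefault(word, []).append(doc_id)
    # items() is a view in Python 3, so convert it to a real list
    return list(grouped.items())


pairs = [('bear', 1), ('in', 1), ('gugledarc', 1), ('in', 4), ('gugledarc', 4)]
print(group_pairs(pairs))
# [('bear', [1]), ('in', [1, 4]), ('gugledarc', [1, 4])]
```

Here the words appear in the order they were first seen in the input.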
As suggested, here is another implementation using defaultdict:
>>> from collections import defaultdict
>>> dd = defaultdict(list)
>>> for key, value in ll:
...     dd[key].append(value)
...
>>> dd.items()
[('sai', [2]), ('emdf', [2]), ('df', [4]), ('is', [1]), ('bear', [1]), ('gugledarc', [1, 4]), ('in', [1, 4]), ('the', [1]), ('sdpij', [2]), ('fuggle', [4]), ('sd', [3])]
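If a predictable order matters, one option is to sort the grouped items by word. The following is a rough sketch of the `defaultdict` variant as a standalone script that should run on Python 3. The helper name `group_pairs_sorted` and the small sample list are assumptions made for illustration:

```python
from collections import defaultdict


def group_pairs_sorted(pairs):
    """Group (word, doc_id) pairs and return (word, [doc_ids]) tuples sorted by word."""
    grouped = defaultdict(list)
    for word, doc_id in pairs:
        # A missing key is created automatically with an empty list
        grouped[word].append(doc_id)
    # Keys are unique, so sorting the items orders them by word alone
    return sorted(grouped.items())


pairs = [('in', 1), ('bear', 1), ('sd', 3), ('in', 4)]
print(group_pairs_sorted(pairs))
# [('bear', [1]), ('in', [1, 4]), ('sd', [3])]
```

Within each word, the document ids stay in the order they appeared in the input.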