Question

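Suppose I have a dictionary called word_counter_dictionary that counts the words in a document in the form {'word' : number}. For example, the word "secondly" appears once, so its key/value pair would be {'secondly' : 1}. I want to build an inverted dictionary in which the numbers become the keys and the words become the values of those keys, so that I can then plot the 25 most frequently used words. I have seen that setdefault() might come in handy here, but I can't use it, because so far in class we have only covered get().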

inverted_dictionary = {}
for key in word_counter_dictionary:
    new_key = word_counter_dictionary[key]
    inverted_dictionary[new_key] = word_counter_dictionary.get(new_key, '') + str(key)   
    inverted_dictionary

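With the approach above, it works fine until it reaches another word with the same value. For example, the word "saves" also appears once in the document, so Python adds the new key/value pair. But the new pair overwrites {1 : 'secondly'}, so only {1 : 'saves'} ends up in the dictionary.

So, bottom line, my goal is to get all the words and their respective counts into this new dictionary called inverted_dictionary.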

Answer 1

A defaultdict is a good fit for this:

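from collections import defaultdict

word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2}

# Missing keys start out as an empty list, so append() always works
d = defaultdict(list)
# items() works on both Python 2 and 3; iteritems() exists only on Python 2
for key, value in word_counter_dictionary.items():
    d[value].append(key)

print(d)

Output (Python 3):

defaultdict(<class 'list'>, {1: ['first'], 2: ['second', 'fourth'], 3: ['third']})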
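For comparison, here is a minimal sketch of the same grouping using dict.setdefault(), which the question mentions but cannot use in class. It assumes the same sample data as above and is meant only to show that setdefault() and defaultdict(list) end up with the same result.

```python
# Sample data copied from the answer above.
word_counter_dictionary = {'first': 1, 'second': 2, 'third': 3, 'fourth': 2}

inverted_dictionary = {}
for word, count in word_counter_dictionary.items():
    # setdefault returns the list already stored under `count`,
    # or stores and returns a new empty list if the key is missing.
    inverted_dictionary.setdefault(count, []).append(word)

print(inverted_dictionary)
```

Unlike defaultdict, this yields a plain dict, so looking up a count that never occurred still raises KeyError.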

Answer 2

What you can do is turn each value into a list of the words that share the same key:

word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2}

inverted_dictionary = {}
for key in word_counter_dictionary:
    new_key = word_counter_dictionary[key]
    if new_key in inverted_dictionary:
        inverted_dictionary[new_key].append(str(key))
    else:
        inverted_dictionary[new_key] = [str(key)]

print(inverted_dictionary)

>>> {1: ['first'], 2: ['second', 'fourth'], 3: ['third']}
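Since the question says only get() has been covered so far, here is a sketch of the same idea written with get(). Note that the code in the question calls get() on word_counter_dictionary rather than on inverted_dictionary, which appears to be why earlier words are lost. The sample data is the same as above.

```python
word_counter_dictionary = {'first': 1, 'second': 2, 'third': 3, 'fourth': 2}

inverted_dictionary = {}
for word in word_counter_dictionary:
    count = word_counter_dictionary[word]
    # Look the count up in the *inverted* dictionary and fall back to an
    # empty list the first time this count is seen.
    inverted_dictionary[count] = inverted_dictionary.get(count, []) + [word]

print(inverted_dictionary)
```

This builds a new list on every iteration, which is fine for small inputs; the append() version above avoids that copying.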

Answer 3

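Python dicts do not allow duplicate keys, so you can't use a plain dictionary to store several elements under the same key (1 in your case). For your example, I would rather use a list as the value in the inverted dictionary and store in that list all the words that share the same number of occurrences, e.g.: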

inverted_dictionary = {}
for key in word_counter_dictionary:
    new_key = word_counter_dictionary[key]
    if new_key in inverted_dictionary:
        inverted_dictionary[new_key].append(key)
    else:
        inverted_dictionary[new_key] = [key]

To get the 25 most repeated words, iterate over the keys of inverted_dictionary in descending order and collect the words:

common_words = []
for key in sorted(inverted_dictionary.keys(), reverse=True):
    if len(common_words) < 25:
        common_words.extend(inverted_dictionary[key])
    else: 
        break

common_words = common_words[:25] # In case there are more than 25 words
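The loop above can be wrapped in a small function so the limit is not hard-coded. This is only a sketch: the name most_common_words and its limit parameter are made up for illustration and are not part of the original answer.

```python
def most_common_words(inverted_dictionary, limit=25):
    """Return up to `limit` words, highest counts first (hypothetical helper)."""
    common_words = []
    # Visit the counts from largest to smallest.
    for count in sorted(inverted_dictionary, reverse=True):
        if len(common_words) >= limit:
            break
        common_words.extend(inverted_dictionary[count])
    # The last extend() may have overshot the limit, so trim.
    return common_words[:limit]


sample = {1: ['first'], 2: ['second', 'fourth'], 3: ['third']}
top_three = most_common_words(sample, limit=3)
print(top_three)
```

Words that share a count keep whatever order they have inside their list, so ties at the cut-off are broken arbitrarily.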

Answer 4

Here is a version that does not "invert" the dictionary:

>>> import operator
>>> A = {'a':10, 'b':843, 'c': 39, 'd': 10}
>>> B = sorted(A.items(), key=operator.itemgetter(1), reverse=True)
>>> B
[('b', 843), ('c', 39), ('a', 10), ('d', 10)]

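Instead, it creates a list sorted by value, from highest to lowest.

To get the top 25, simply slice it: B[:25].

Here is one way to separate the keys and the values (after they have been put into a list of tuples):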

>>> [x[0] for x in B]
['b', 'c', 'a', 'd']
>>> [x[1] for x in B]
[843, 39, 10, 10]

or

>>> C, D = zip(*B)
>>> C
('b', 'c', 'a', 'd')
>>> D
(843, 39, 10, 10)

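Note that if you only want to extract the keys or the values (and not both), you should do that earlier, before building the tuples. This is just an example of how to work with a list of tuples.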
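As a rough illustration of extracting only the keys or only the values earlier, the sketch below sorts the dictionary A from this answer directly, without building a list of tuples first. The variable names keys_by_count and counts_descending are made up for this example.

```python
A = {'a': 10, 'b': 843, 'c': 39, 'd': 10}

# Keys only, ordered by their counts (highest first); A.get supplies the sort key.
keys_by_count = sorted(A, key=A.get, reverse=True)

# Values only, highest first.
counts_descending = sorted(A.values(), reverse=True)

print(keys_by_count[:25])
print(counts_descending[:25])
```

Slicing with [:25] is harmless when there are fewer than 25 entries; it simply returns everything.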

Answer 5

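For getting the largest elements of a data set, an inverted dictionary may not be the best data structure.

Either put the items in a sorted list (this example assumes you want the two most frequent words):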

word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2}
counter_word_list = sorted((count, word) for word, count in word_counter_dictionary.items())

Result:

>>> print(counter_word_list[-2:])
[(2, 'second'), (3, 'third')]

Or use the batteries included with Python (heapq.nlargest in this case):

import heapq, operator
print(heapq.nlargest(2, word_counter_dictionary.items(), key=operator.itemgetter(1)))

Result:

[('third', 3), ('second', 2)]
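Another included battery that may be worth a look is collections.Counter, which is not used in the answers above. The sketch below assumes the counts already exist as a plain dict and wraps them in a Counter only to call most_common().

```python
from collections import Counter

word_counter_dictionary = {'first': 1, 'second': 2, 'third': 3, 'fourth': 2}

# Counter accepts an existing mapping of item -> count.
counts = Counter(word_counter_dictionary)

# most_common(n) returns the n (word, count) pairs with the highest counts.
top_two = counts.most_common(2)
print(top_two)
```

For the question's use case, counts.most_common(25) would give the 25 (word, count) pairs to plot.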

Inverting a dictionary when some of the original values are identical
