Question

Disclaimer: I've only just started learning Python.

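I have a function that counts how many times each word appears in a text file. It uses the word as the key and the count as the value, and stores the result in a dictionary, "book_index". Here is my code: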

alice = open('location of the file', 'r', encoding = "cp1252")

def book_index(alice):
    """alice is a file reference: the file has been opened, nothing else has been done."""
    worddict = {}
    line = 0

    for ln in alice:
        words = ln.split()
        for wd in words:
            if wd not in worddict:
                worddict[wd] = 1 #if wd is not in worddict yet, set the count for that word to 1
            else:
                worddict[wd] = worddict[wd] + 1 #if wd IS in worddict, increase the count for that word BY 1
        line = line + 1
    return(worddict)

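I need to turn that dictionary "inside out", with the count as the key and every word that appears x times as the value. For example: [2, 'hello', 'hi'], where 'hello' and 'hi' each appear twice in the text file.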

Do I need to loop over the existing dictionary, or go through the text file again?

Answer 1

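Because a dictionary maps keys to values, there is no efficient way to filter it by value. You have to loop over every item in the dictionary to find the keys whose value equals some particular value.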

This prints every key in dictionary d whose value equals searchValue:

for k, v in d.items():
    if v == searchValue:
        print(k)
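A minimal sketch of the same single pass, collecting the matching keys into a list instead of printing them. The helper name keys_with_value and the sample dictionary are hypothetical, chosen only for illustration:

```python
def keys_with_value(d, search_value):
    """Return every key in d whose value equals search_value (one full pass over d)."""
    return [k for k, v in d.items() if v == search_value]

# Hypothetical sample data standing in for the real word counts.
d = {"hello": 2, "hi": 2, "world": 4}
print(keys_with_value(d, 2))  # ['hello', 'hi']
```

Each call still walks the whole dictionary, which is exactly the cost described above.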

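About the book_index function: note that you can use Counter from the standard library's collections module to do the counting. A Counter is essentially a dictionary whose values are counts, and it handles missing keys automatically (a missing key counts as 0). With Counter, your code would look like this: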

from collections import Counter
def book_index(alice):
    worddict = Counter()
    for ln in alice:
        worddict.update(ln.split())
    return worddict

Or, as roippi suggested in a comment on another answer, simply worddict = Counter(word for line in alice for word in line.split()).
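A small runnable sketch of that one-liner. Here io.StringIO is an assumption standing in for the opened file, since any iterable of lines behaves the same way; the sample text is made up:

```python
import io
from collections import Counter

# io.StringIO stands in for the opened file; any iterable of lines works the same way.
alice = io.StringIO("hello world\nhi hello\nworld world\n")

worddict = Counter(word for line in alice for word in line.split())
print(worddict["world"])    # 3
print(worddict["missing"])  # 0: a missing key counts as zero instead of raising KeyError
```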

Answer 2

Personally, I'd recommend a Counter object here; it is designed specifically for this kind of task. For example:

from collections import Counter
counter = Counter()
for ln in alice:
    counter.update(ln.split())

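This gives you the dictionary you need, and if you read further in the Counter docs, you'll see that you can also retrieve the most common entries (with most_common()).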
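For example, most_common(n) returns the n highest counts as (word, count) pairs, ordered from most to least common. The sample sentence is hypothetical:

```python
from collections import Counter

# Hypothetical word list standing in for the words of the book.
counter = Counter("the cat and the hat and the bat".split())
print(counter.most_common(2))  # [('the', 3), ('and', 2)]
```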

This may not cover every case in your question, but it beats doing the initial counting by hand.

If you really want to "flip" the dictionary, you could do something along these lines:

matching_values = lambda value: (word for word, freq in worddict.items() if freq == value)
{value: matching_values(value) for value in set(worddict.values())}

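Compared with the other solutions, this one has an advantage in some cases: because each generator is evaluated lazily, building the result does not collect the words for any count. In very sparse situations, where you only look up a few counts, or where you only want to know which counts actually have entries, it can therefore be faster, since it never scans the dictionary for the counts you don't ask about.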

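That said, this solution is usually worse than the plain iterative one, because every time you ask for a new count it walks through the entire dictionary again.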
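A sketch of the corrected snippet with made-up sample counts, written with def instead of a lambda assignment (a stylistic choice; the behavior is the same). It shows where the lazy scan happens, plus one caveat: each generator can be consumed only once.

```python
worddict = {"hello": 2, "hi": 2, "world": 4}  # hypothetical sample counts

def matching_values(value):
    # Generator: scans worddict for words with this frequency only when it is consumed.
    return (word for word, freq in worddict.items() if freq == value)

lazy_index = {value: matching_values(value) for value in set(worddict.values())}

print(sorted(lazy_index))   # [2, 4]: the counts are known without collecting any words
print(list(lazy_index[2]))  # ['hello', 'hi']: worddict is scanned here
print(list(lazy_index[2]))  # []: the generator is already exhausted
```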

It isn't fundamentally different, but I didn't want to duplicate the other answers here.

Answer 3

Loop over the existing dictionary. Here is an example using dict.setdefault():

countdict = {}
for k, v in worddict.items():
    countdict.setdefault(v, []).append(k)

or collections.defaultdict:

import collections
countdict = collections.defaultdict(list)
for k, v in worddict.items():
    countdict[v].append(k)

Personally, I prefer the setdefault() approach, because the result is a regular dictionary.

Example:

>>> worddict = {"hello": 2, "hi": 2, "world": 4}
>>> countdict = {}
>>> for k, v in worddict.items():
...     countdict.setdefault(v, []).append(k)
...
>>> countdict
{2: ['hello', 'hi'], 4: ['world']}

As some of the other answers mention, you can shorten the book_index function considerably with collections.Counter.
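Putting the two pieces together, a minimal end-to-end sketch: a Counter-based book_index followed by the setdefault() inversion. The helper name invert_counts and the sample text are hypothetical, and io.StringIO stands in for the real file:

```python
import io
from collections import Counter

def book_index(lines):
    """Count how often each whitespace-separated word occurs in an iterable of lines."""
    return Counter(word for line in lines for word in line.split())

def invert_counts(worddict):
    """Map each count to the list of words that occur exactly that many times."""
    countdict = {}
    for word, count in worddict.items():
        countdict.setdefault(count, []).append(word)
    return countdict

# io.StringIO stands in for the real file object opened with open(...).
alice = io.StringIO("hello hi world\nworld hello hi\nworld world\n")
print(invert_counts(book_index(alice)))  # {2: ['hello', 'hi'], 4: ['world']}
```

The file only has to be read once; the inversion works entirely on the existing dictionary.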

Answer 4

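Without duplicates (only safe if no two words share the same count; otherwise later words overwrite earlier ones):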

word_by_count_dict = {value: key for key, value in worddict.items()}

See PEP 274 for Python's dictionary comprehensions: http://www.python.org/dev/peps/pep-0274/
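A quick check of what happens with duplicates, using made-up counts: when two words share a count, the comprehension keeps only the last word it sees for that count.

```python
worddict = {"hello": 2, "hi": 2, "world": 4}  # hypothetical sample counts

# Each count becomes a key; 'hi' overwrites 'hello' because both have count 2.
word_by_count_dict = {value: key for key, value in worddict.items()}
print(word_by_count_dict)  # {2: 'hi', 4: 'world'}
```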

With duplicates:

import collections

words_by_count_dict = collections.defaultdict(list)
for key, value in worddict.items():
    words_by_count_dict[value].append(key)

So that, for example:

words_by_count_dict[2] == ["hello", "hi"]
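If you prefer a regular dictionary at the end (the reason Answer 3 gives for favoring setdefault()), you can convert the defaultdict with dict(). A sketch with hypothetical sample counts:

```python
import collections

worddict = {"hello": 2, "hi": 2, "world": 4}  # hypothetical sample counts

words_by_count_dict = collections.defaultdict(list)
for key, value in worddict.items():
    words_by_count_dict[value].append(key)

# Convert to a plain dict so that looking up a missing count raises KeyError
# instead of silently inserting an empty list.
words_by_count = dict(words_by_count_dict)
print(words_by_count)             # {2: ['hello', 'hi'], 4: ['world']}
print(words_by_count.get(3, []))  # []
```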

How to turn a dictionary "inside out"
