Question

I'm using the following code to get the 10 least frequently used words in a set of files:

import os

data_directory = "/pubmed/"

file_list = os.listdir(data_directory)

string_freq = {}

for file in file_list:
    # Open each file in a with-block so it is closed right after reading
    with open(os.path.join(data_directory, file), 'r') as f:
        ftext = f.read()
    # split() with no argument splits on any whitespace, including newlines
    string_list = ftext.split()
    for word in string_list:
        if word in string_freq:
            string_freq[word] += 1
        else:
            string_freq[word] = 1

# Sort words by ascending count and print the first 10
for word in sorted(string_freq, key=string_freq.get)[:10]:
    print(word, string_freq[word])

Here's the thing: I get a list of 10 words, but their frequency counts are all 1. The result looks like this:

Evaluation 1
reviews 1
decision 1
ankle 1
knee 1
postreduction 1
shoulder 1
nursemaid's 1
elbows 1
Thermal 1

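How can I skip words that share the same frequency, so that the result looks like: Evaluation 1, some other word 2, a third word 3, a fourth word 4, and so on? I'd really rather not use any libraries beyond the standard library's os, string, or random.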

Answer 1

You can do it like this, for example:

container.Register(
    Component
      .For<IMyBaseInterface>()
      .ImplementedBy<JustAClass>());

container.Register(
    Component
      .For<IYetAnotherInterface >()
      .ImplementedBy<JustAClass>());

Output

IYetAnotherInterface

Answer 2

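You can invert the key-value association, so that each frequency maps to a word, and then sort the resulting dictionary's items. Because dictionary keys are unique, the inversion overwrites words with duplicate frequencies and leaves exactly one word per count: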

>>> sorted({v: k for k, v in string_freq.items()}.items())
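To make the effect of the one-liner above concrete, here is a minimal sketch of the same idea wrapped in a function. The helper name least_frequent_unique and the sample dictionary are hypothetical, made up for illustration. One difference from the one-liner: this version keeps the first word seen for each count, whereas the dict comprehension keeps the last one. It assumes Python 3.7+ so that dictionaries preserve insertion order.

```python
def least_frequent_unique(string_freq, n=10):
    """Return up to n (count, word) pairs, one word per distinct count,
    ordered from the lowest count upward."""
    by_count = {}
    for word, count in string_freq.items():
        # setdefault only stores a word if this count has not been seen yet,
        # so the first word encountered for each count is the one kept.
        by_count.setdefault(count, word)
    # Sorting the (count, word) pairs orders them by ascending count.
    return sorted(by_count.items())[:n]


# Hypothetical sample data standing in for the real string_freq dictionary.
sample = {"Evaluation": 1, "reviews": 1, "knee": 2, "ankle": 3, "elbows": 2, "the": 4}

for count, word in least_frequent_unique(sample):
    print(word, count)
```

With the sample data this prints one word per frequency: Evaluation 1, knee 2, ankle 3, the 4. It uses only built-ins, so it stays within the standard-library constraint from the question.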

Getting unique word frequencies
