Trim a dictionary to a fixed length, keeping the keys with the highest values

Time: 2018-12-01 17:33:46

Tags: python python-3.x

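Hi guys, I'm working on a script in which I need to update the dictionary words so that it contains only the most frequent words, with the number of entries capped at the value of limit.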

from typing import List, Dict, TextIO, Tuple

def most_frequent(words: Dict[str, int], limit: int) -> None:
    new_dict = {}
    new_list = []
    # I decided to create a list for easier sorting

    for w in words:
        new_list.append((w, words.get(w)))
    # sort by the count (second tuple element), from big to small
    new_list.sort(key=lambda pair: pair[1], reverse=True)

    for n_w in new_list:
        if len(new_dict) < limit:
            new_dict[n_w[0]] = n_w[1]
    # this part adds the words to a new dictionary, up to the value of limit

    words = new_dict
    print(words)
    # print just to check my result; I know it's supposed to return None

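Here is the problem. I need to satisfy the following test cases: len(words) <= limit must always hold; if adding the most frequent words would give len(words) > limit, those words are not added; and if the last word is not unique, that is, it has the same value as the next word, then none of the tied words are added.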

>>> most_frequent({'cat': 3, 'dog': 3, 'pig': 3, 'bee': 3, 'rat': 1}, 4)
{'cat': 3, 'dog': 3, 'pig': 3, 'bee': 3}
#This one passes

>>> most_frequent({'cat': 3, 'dog': 3, 'pig': 3, 'bee': 2, 'rat': 2}, 4)
{'cat': 3, 'dog': 3, 'pig': 3}
#what I get: {'cat': 3, 'dog': 3, 'pig': 3, 'bee': 2}; 'bee' should not be added because it is tied with 'rat'

>>> most_frequent({'cat': 3, 'dog': 3, 'pig': 3, 'bee': 3, 'rat': 1}, 3)  
{}
#what I get: {'cat': 3, 'dog': 3, 'pig': 3}; none of them should be added, because 4 words share the highest frequency and adding them all would make len(words) > limit, which is not allowed

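I feel that my current approach can't do what I need, and I'm stuck on the last two cases. I'm not allowed to use modules. Which approach should I use? Or at least, what can I improve here to get what I need?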

1 answer:

Answer 0: (score: 1)

I would do something like this:

def most_frequent(words, limit):
    frequencies = words.items()
    inverse = {}
    for word, frequency in frequencies:
        inverse.setdefault(frequency, []).append(word)

    result = {}
    remaining = limit
    for frequency in sorted(inverse.keys(), reverse=True):
        if len(inverse[frequency]) <= remaining:
            result.update({word: frequency for word in inverse[frequency]})
            remaining -= len(inverse[frequency])
        else:
            break

    return result


print(most_frequent({'cat': 3, 'dog': 3, 'pig': 3, 'bee': 3, 'rat': 1}, 4))
print(most_frequent({'cat': 3, 'dog': 3, 'pig': 3, 'bee': 2, 'rat': 2}, 4))
print(most_frequent({'cat': 3, 'dog': 3, 'pig': 3, 'bee': 3, 'rat': 1}, 3))

Output

{'bee': 3, 'dog': 3, 'pig': 3, 'cat': 3}
{'dog': 3, 'pig': 3, 'cat': 3}
{}

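The idea is to build an inverse dictionary (inverse) whose keys are the frequencies and whose values are the lists of words with that frequency. You can then iterate over the frequencies in non-ascending order and add each list of words to the final result only if the remaining budget allows it. As soon as one group does not fit as a whole, the loop stops, so a group of tied words is either added completely or not at all.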
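The question's signature returns None and is meant to update words itself, whereas the function above returns a new dictionary. A minimal sketch of an in-place variant follows. The name most_frequent_in_place is hypothetical, and the sketch assumes that "updating" means the caller's dictionary object should be mutated, which the question does not state outright. It uses only built-ins, so no modules are needed beyond the type hints.

```python
from typing import Dict, List


def most_frequent_in_place(words: Dict[str, int], limit: int) -> None:
    """Trim `words` in place to at most `limit` of its most frequent entries.

    Hypothetical helper: same grouping idea as the answer above, but it
    mutates the caller's dictionary instead of returning a new one.
    """
    # Group words by frequency: {frequency: [words with that frequency]}.
    inverse: Dict[int, List[str]] = {}
    for word, frequency in words.items():
        inverse.setdefault(frequency, []).append(word)

    kept: Dict[str, int] = {}
    remaining = limit
    # Visit frequencies from highest to lowest and stop at the first group
    # that does not fit entirely in the remaining budget.
    for frequency in sorted(inverse, reverse=True):
        group = inverse[frequency]
        if len(group) > remaining:
            break
        for word in group:
            kept[word] = frequency
        remaining -= len(group)

    # Mutate the caller's object; `words = kept` would only rebind the
    # local name and leave the caller's dictionary unchanged.
    words.clear()
    words.update(kept)


counts = {'cat': 3, 'dog': 3, 'pig': 3, 'bee': 2, 'rat': 2}
most_frequent_in_place(counts, 4)
print(counts)
```

The last two lines matter because assigning to words inside a function, as the question's code does with words = new_dict, never changes the dictionary the caller passed in.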