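So, I'm working on an assignment, but I'm stuck on this part. I have a dictionary whose keys are tuples of strings, each with a corresponding value. Now I have to filter the dictionary, using the paras method, by removing the keys that occur fewer than 8 times in the documents of the Brown corpus.
I have looked everywhere and could not find any pseudocode for it.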
[{('love', 'sex'): '6.77',
('tiger', 'cat'): '7.35',
('tiger', 'tiger'): '10.00',
('book', 'paper'): '7.46',
('computer', 'keyboard'): '7.62',
('computer', 'internet'): '7.58',
('plane', 'car'): '5.77',
('train', 'car'): '6.31',
('telephone', 'communication'): '7.50',
('television', 'radio'): '6.77',
('media', 'radio'): '7.42',
('drug', 'abuse'): '6.85',
.
.
.
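So what I have to do with this dictionary is remove the keys whose tokens (the word pair) are not in alphabetical order, as well as the word pairs (keys) in which at least one word has a document frequency of less than 8 in the Brown corpus.
Answer 0 (score: 0)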
I don't know what document means in this context, so this answer may be flawed.
Input:
mylist = [{('love', 'sex'): '6.77',
('tiger', 'cat'): '7.35',
('tiger', 'tiger'): '10.00',
('book', 'paper'): '7.46',
('computer', 'keyboard'): '7.62',
('computer', 'internet'): '7.58',
('computer', 'car'): '7.58',
('computer', 'plane'): '7.58',
('computer', 'train'): '7.58',
('computer', 'television'): '7.58',
('computer', 'radio'): '7.58',
('computer', 'tiger'): '7.58',
('computer', 'test1'): '7.58',
('computer', 'test2'): '7.58',
('tiger', 'tz1'): '7.58',
('tiger', 'tz2'): '7.58',
('tiger', 'tz3'): '7.58',
('tiger', 'tz4'): '7.58',
('tiger', 'tz5'): '7.58',
('tiger', 'tz6'): '7.58',
('tiger', 'tz7'): '7.58',
('tiger', 'tz8'): '7.58',
('plane', 'car'): '5.77',
('train', 'car'): '6.31',
('telephone', 'communication'): '7.50',
('television', 'radio'): '6.77',
('media', 'radio'): '7.42',
('drug', 'abuse'): '6.85'}]
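Solution: Note that the solution has to iterate over the dictionary twice (although the second loop will usually cover only part of the dictionary). It counts how often each word appears among the dictionary's own keys, not in the Brown corpus. I also treat each dictionary in the list as its own thing, so you may need to move some statements around.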
# This will be the keys we want to remove
removable_keys = set()
# This will be the number of times we see a key part (left or right)
occurences = dict()
# For each dictionary in our list
for dic in mylist:
    # For each key in that dictionary
    for key in dic:
        # If the key is not in alphabetical order
        if list(key) != sorted(list(key)):
            # We will remove that key
            removable_keys.add(key)
        # Else this is a valid key
        else:
            # Increment the number of times we have seen each word of this key
            left, right = key
            occurences[left] = 1 if left not in occurences else occurences[left] + 1
            occurences[right] = 1 if right not in occurences else occurences[right] + 1
    # Now we need to look for keys that had fewer than 8 occurrences.
    for key in dic.keys() - removable_keys:
        left, right = key
        if occurences[left] < 8 or occurences[right] < 8:
            removable_keys.add(key)
    # Finally remove all those keys from our dict
    for key in removable_keys:
        del dic[key]
    print(dic)
Output:
{('tiger', 'tiger'): '10.00', ('computer', 'tiger'): '7.58'}
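If "document frequency" in the question means the number of Brown corpus paragraphs that contain a word, the counting step could look roughly like the sketch below. This is only a guess at the assignment's intent: treating each paragraph as one document, lowercasing the words, and the names document_frequency, filter_pairs, toy_paras and toy_pairs are all assumptions made for illustration. The toy data stands in for the corpus so the example runs without NLTK.

```python
from collections import Counter


def document_frequency(paragraphs):
    """Count, for each word, how many paragraphs contain it at least once.

    `paragraphs` is assumed to have the shape returned by NLTK's
    brown.paras(): a list of paragraphs, each a list of sentences,
    each a list of word strings.
    """
    df = Counter()
    for para in paragraphs:
        # Use a set so a word repeated inside one paragraph counts once.
        # Lowercasing is an assumption, not something the question states.
        words = {word.lower() for sent in para for word in sent}
        df.update(words)
    return df


def filter_pairs(pairs, df, min_df=8):
    """Keep pairs that are in alphabetical order and whose two words
    each occur in at least `min_df` paragraphs."""
    return {
        (left, right): score
        for (left, right), score in pairs.items()
        if left <= right and df[left] >= min_df and df[right] >= min_df
    }


# Toy stand-in for brown.paras() (hypothetical data, three "documents").
toy_paras = [
    [["The", "computer", "has", "a", "keyboard"]],
    [["A", "computer", "and", "a", "keyboard", "and", "a", "tiger"]],
    [["The", "tiger", "saw", "a", "cat"], ["The", "cat", "ran"]],
]
toy_pairs = {
    ("computer", "keyboard"): "7.62",
    ("tiger", "cat"): "7.35",
    ("computer", "tiger"): "7.58",
    ("cat", "keyboard"): "1.00",
}

df = document_frequency(toy_paras)
# A threshold of 2 is used here only because the toy corpus is tiny.
filtered = filter_pairs(toy_pairs, df, min_df=2)
print(filtered)
```

With NLTK installed and the corpus downloaded (nltk.download('brown')), the same functions should work on the real data by passing brown.paras() from nltk.corpus to document_frequency and leaving min_df at 8. Unlike the loop above, this version builds a new dictionary instead of deleting keys in place.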