I'm new to Python! I've written code that successfully opens my text file and sorts my list of 100 words. I then put these into a list called stimuli_words, which has no duplicate words, is all lowercase, etc.
But now I want to turn this list into a dictionary where the keys are the 3-letter endings of the words in my list and the values are the words matching those endings.
For example 'ing': 'going', 'hiring', ... but I only want the endings that have more than 40 words sharing those final characters. So far I have this code:
from collections import defaultdict
fq = defaultdict(int)
for w in stimuli_list:
    fq[w] += 1
print(fq)
However, it just returns a dictionary of my words and how many times they occur, which is obviously once each. For example: 'going': 1, 'hiring': 1, 'driving': 1.
Would really appreciate some help!! Thanks!!
Answer 0 (score: 1)
You can do it like this:
dictionary = {}
words = ['going', 'hiring', 'driving', 'letter', 'better', ...]  # your list of words

# Creating words dictionary
for word in words:
    dictionary.setdefault(word[-3:], []).append(word)

# Removing lists that contain less than 40 words:
for key, value in dictionary.copy().items():
    if len(value) < 40:
        del dictionary[key]

print(dictionary)
Output:
{ # Only lists that contain at least 40 words
'ing': ['going', 'hiring', 'driving', ...],
'ter': ['letter', 'better', ...],
...
}
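Since the question already imports collections.defaultdict, the same grouping can be sketched with it as well (reusing the illustrative words list from above); grouping and filtering then become two short steps with no deletion pass:

from collections import defaultdict

groups = defaultdict(list)
for word in words:
    groups[word[-3:]].append(word)

# keep only the endings shared by at least 40 words
dictionary = {ending: wlist for ending, wlist in groups.items() if len(wlist) >= 40}
print(dictionary)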
Answer 1 (score: 0)
Since you are counting whole words (and your words are unique), each one only ever gets a count of 1.
You can instead build keys from the last 3 characters (and use a Counter):
import collections
wordlist = ["driving","hunting","fishing","drive","a"]
endings = collections.Counter(x[-3:] for x in wordlist)
print(endings)
Result:
Counter({'ing': 3, 'a': 1, 'ive': 1})
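The Counter above only yields counts; to get the words themselves, one sketch is to filter the endings by a threshold and collect the matching words in a second pass (a threshold of 2 here so the tiny demo list produces output; the question would use 40):

threshold = 2
frequent = {e for e, n in endings.items() if n >= threshold}
result = {e: [w for w in wordlist if w[-3:] == e] for e in frequent}
print(result)  # {'ing': ['driving', 'hunting', 'fishing']}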
Answer 2 (score: 0)
Create demo data:
import random
# seed so every run produces the same data
random.seed(10)
# base lists for demo data
prae = ["help", "read", "muck", "truck", "sleep"]
post = ["ing", "biothign", "press"]
# lots of data
parts = [x + str(y) + z for x in prae for z in post for y in range(100, 1000, 100)]
# shuffle and take every 120th element
random.shuffle(parts)
stimuli_list = parts[::120]
Build the dictionary from stimuli_list:
# create a key with an empty list for each 3-char ending
dic = {e[-3:]: [] for e in stimuli_list}
# process data and, if enough words fit, fill the list
for d in dic:
    fitting = [x for x in parts if x.endswith(d)]  # adapt to match only the last 2 chars
    if len(fitting) > 5:  # adapt this threshold (the question needs at least 40)
        dic[d] = fitting[:]
# remove keys whose lists stayed empty
for d in [x for x in dic if not dic[x]]:
    del dic[d]
print()
print(dic)
Output:
{'ess': ['help400press', 'sleep100press', 'sleep600press', 'help100press', 'muck400press', 'muck900press', 'muck500press', 'help800press', 'muck100press', 'read300press', 'sleep400press', 'muck800press', 'read600press', 'help200press', 'truck600press', 'truck300press', 'read700press', 'help900press', 'truck400press', 'sleep200press', 'read500press', 'help600press', 'truck900press', 'truck800press', 'muck200press', 'truck100press', 'sleep700press', 'sleep500press', 'sleep900press', 'truck200press', 'help700press', 'muck300press', 'sleep800press', 'muck700press', 'sleep300press', 'help500press', 'truck700press', 'read400press', 'read100press', 'muck600press', 'read900press', 'read200press', 'help300press', 'truck500press', 'read800press']
, 'ign': ['truck200biothign', 'muck500biothign', 'help800biothign', 'muck700biothign', 'help600biothign', 'truck300biothign', 'read200biothign', 'help500biothign', 'read900biothign', 'read700biothign', 'truck400biothign', 'help300biothign', 'read400biothign', 'truck500biothign', 'read800biothign', 'help700biothign', 'help400biothign', 'sleep600biothign', 'sleep500biothign', 'muck300biothign', 'truck700biothign', 'help200biothign', 'sleep300biothign', 'muck100biothign', 'sleep800biothign', 'muck200biothign', 'sleep400biothign', 'truck100biothign', 'muck800biothign', 'read500biothign', 'truck900biothign', 'muck600biothign', 'truck800biothign', 'sleep100biothign', 'read300biothign', 'read100biothign', 'help900biothign', 'truck600biothign', 'help100biothign', 'read600biothign', 'muck400biothign', 'muck900biothign', 'sleep900biothign', 'sleep200biothign', 'sleep700biothign']
}
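A quick, hypothetical sanity check (not part of the original answer) is to print only the group sizes instead of the full lists:

# how many words landed under each ending
print({ending: len(wlist) for ending, wlist in dic.items()})
# with the demo data above: {'ess': 45, 'ign': 45}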