I have a list of 4-grams that I want to use to populate a dictionary object / shelve object:
['I','go','to','work']
['I','go','there','often']
['it','is','nice','being']
['I','live','in','NY']
['I','go','to','work']
So we would end up with something like:
four_grams['I']['go']['to']['work']=1
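Any newly encountered 4-gram should be stored under its four keys with a value of 1. If the same 4-gram is encountered again, its value is incremented.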
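A minimal sketch of that behaviour using collections.defaultdict. The three outer levels create sub-dicts on first access and the leaves default to 0, so four_grams['I']['go']['to']['work'] += 1 works even for an unseen 4-gram. This is an illustrative assumption, not the asker's code:

```python
from collections import defaultdict

# Three levels of auto-created dicts, then int counters (default 0) at the leaves.
four_grams = defaultdict(
    lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
)

grams = [
    ['I', 'go', 'to', 'work'],
    ['I', 'go', 'there', 'often'],
    ['it', 'is', 'nice', 'being'],
    ['I', 'live', 'in', 'NY'],
    ['I', 'go', 'to', 'work'],
]

for a, b, c, d in grams:
    # First sighting creates the path with count 0, then += 1 makes it 1.
    four_grams[a][b][c][d] += 1

print(four_grams['I']['go']['to']['work'])  # 2
print(four_grams['I']['live']['in']['NY'])  # 1
```

Because the default factories are lambdas, this structure cannot be pickled as-is, so it will not go straight into a shelve. Plain dicts, or defaultdicts whose factory is int, do not have that problem.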
Answer 0 (score: 1)
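You can create a helper function that inserts the elements into the nested dictionary one at a time, checking at each level whether the required sub-dictionary already exists: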
dict = {}  # top-level nested dict (note: this name shadows the built-in dict type)

def insert(fourgram):
    d = dict  # reference to the current level
    for el in fourgram[0:-1]:  # elements 1-3 if fourgram has 4 elements
        if el not in d:
            d[el] = {}  # create new, empty dict
        d = d[el]  # move into next level dict
    if fourgram[-1] in d:
        d[fourgram[-1]] += 1  # increment existing, or...
    else:
        d[fourgram[-1]] = 1  # ...create as 1 first time
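The same insertion logic can be written more compactly with dict.setdefault and dict.get. This is a sketch; insert_compact is a hypothetical name. It takes the tree as a parameter instead of using a global, and it assumes every n-gram inserted into one tree has the same length, since a shorter gram would leave an int where a longer one expects a sub-dict:

```python
def insert_compact(tree, gram):
    """Hypothetical helper: add one n-gram to a nested dict of counts."""
    node = tree
    for word in gram[:-1]:
        # setdefault returns the existing sub-dict, or stores and returns a new {}
        node = node.setdefault(word, {})
    # get(..., 0) treats an unseen last word as count 0 before incrementing
    node[gram[-1]] = node.get(gram[-1], 0) + 1

counts = {}
for gram in [['I', 'go', 'to', 'work'], ['I', 'go', 'to', 'work'], ['I', 'live', 'in', 'NY']]:
    insert_compact(counts, gram)

print(counts)
# {'I': {'go': {'to': {'work': 2}}, 'live': {'in': {'NY': 1}}}}
```

Since the result contains only plain dicts and ints, it can be pickled, so it can also be stored in a shelve.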
You can populate it with your data like this:
insert(['I','go','to','work'])
insert(['I','go','there','often'])
insert(['it','is','nice','being'])
insert(['I','live','in','NY'])
insert(['I','go','to','work'])
Afterwards, you can index dict however you need:
print(dict['I']['go']['to']['work'])       # prints 2
print(dict['I']['go']['there']['often'])   # prints 1
print(dict['it']['is']['nice']['being'])   # prints 1
print(dict['I']['live']['in']['NY'])       # prints 1
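Indexing a 4-gram that was never inserted raises a KeyError. If a count of 0 is preferable, a small lookup helper can walk the path instead. This is a sketch; count_of is a hypothetical name, and the tree below is a hand-written example in the same shape that insert() builds:

```python
def count_of(tree, gram):
    """Hypothetical helper: return the stored count for gram, or 0 if unseen."""
    node = tree
    for word in gram:
        if not isinstance(node, dict) or word not in node:
            return 0  # path breaks off: this n-gram was never inserted
        node = node[word]
    # A prefix such as ['I', 'go'] ends on a dict, not a count
    return node if isinstance(node, int) else 0

tree = {'I': {'go': {'to': {'work': 2}, 'there': {'often': 1}}}}
print(count_of(tree, ['I', 'go', 'to', 'work']))    # 2
print(count_of(tree, ['I', 'go', 'to', 'school']))  # 0
```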
Answer 1 (score: 1)
You can do it like this, storing one nested defaultdict per first word in the shelve:
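import shelve
from collections import defaultdict
from functools import reduce  # reduce is no longer a built-in in Python 3

grams = [
    ['I', 'go', 'to', 'work'],
    ['I', 'go', 'there', 'often'],
    ['it', 'is', 'nice', 'being'],
    ['I', 'live', 'in', 'NY'],
    ['I', 'go', 'to', 'work'],
]

def f(path, word):
    # descend one level, creating the sub-dict if it does not exist yet
    if word not in path:
        path[word] = defaultdict(int)
    return path[word]

db = shelve.open('/tmp/db')
for gram in grams:
    # load (or start) the subtree stored under the first word
    path = db.get(gram[0], defaultdict(int))
    # walk the middle words, then bump the counter keyed by the last word
    reduce(f, gram[1:-1], path)[gram[-1]] += 1
    # write the subtree back; shelve does not persist in-place changes by default
    db[gram[0]] = path
print(dict(db))  # a Shelf has no readable repr in Python 3, so copy to a dict
db.close()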
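The reduce() call in this answer packs the whole descent into one line. A sketch of the same update written as an explicit loop might look like this. It assumes plain dicts instead of defaultdict and a throwaway file in a temporary directory instead of /tmp/db, and add_gram is a hypothetical name:

```python
import os
import shelve
import tempfile

grams = [
    ['I', 'go', 'to', 'work'],
    ['I', 'go', 'there', 'often'],
    ['it', 'is', 'nice', 'being'],
    ['I', 'live', 'in', 'NY'],
    ['I', 'go', 'to', 'work'],
]

def add_gram(db, gram):
    """Hypothetical helper: same effect as the reduce() line, as a plain loop."""
    head, middle, last = gram[0], gram[1:-1], gram[-1]
    subtree = db.get(head, {})  # unpickled copy of the stored subtree, or a new one
    node = subtree
    for word in middle:  # walk/create the intermediate levels
        node = node.setdefault(word, {})
    node[last] = node.get(last, 0) + 1
    db[head] = subtree  # reassign so shelve pickles the updated subtree

db_path = os.path.join(tempfile.mkdtemp(), 'fourgrams')
db = shelve.open(db_path)
for gram in grams:
    add_gram(db, gram)
db.close()

db = shelve.open(db_path)  # reopen to confirm the counts were persisted
print(db['I']['go']['to']['work'])  # 2
db.close()
```

Reading the subtree, modifying it, and assigning it back is needed because a shelve opened with the default writeback=False only stores what is assigned to its keys.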