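I have a Python list containing several hundred thousand words. The words appear in the same order as in the original text.
I would like to build a dictionary that maps each word to a string containing that word together with the 2 (for example) words that appear before and after it.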
For example, the list: "This", "is", "an", "example", "sentence"
should become the dictionary:
"This" = "This is an"
"is" = "This is an example"
"an" = "This is an example sentence"
"example" = "is an example sentence"
"sentence" = "an example sentence"
Something like:
WordsInContext = {}
ContextSize = 2
for wIndex, w in enumerate(Words):
    # Clamp the start at 0 so a negative index does not wrap around to the end of the list
    start = max(wIndex - ContextSize, 0)
    WordsInContext[w] = ' '.join(Words[start:wIndex + ContextSize + 1])
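This may contain some syntax errors, but even with those corrected, I suspect it would be a very inefficient way to do this.
Can anyone suggest a more efficient approach?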
Answer 0 (score: 4)
My suggestion:
words = ["This", "is", "an", "example", "sentence"]
context = {}  # named `context` rather than `dict` to avoid shadowing the built-in
# insert 2 items at front/back to avoid
# additional conditions in the for loop
words.insert(0, None)
words.insert(0, None)
words.append(None)
words.append(None)
for i in range(len(words) - 4):
    # window of 5 items centred on words[i+2]; drop the None padding
    context[words[i + 2]] = [w for w in words[i:i + 5] if w is not None]
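The padding approach above hard-codes a context of 2 (the two inserts on each side, the `-4` and the `5`), modifies `words` in place, and stores lists rather than the strings the question asks for. A minimal sketch of the same idea with the context size as a parameter might look like the following; the function name `words_in_context` and its `context_size` argument are made up for illustration.

```python
def words_in_context(words, context_size=2):
    """Map each word to a string of the words around it (hypothetical helper)."""
    pad = [None] * context_size
    # Pad a copy, so the caller's list is left untouched
    padded = pad + list(words) + pad
    width = 2 * context_size + 1
    result = {}
    for i, word in enumerate(words):
        # padded[i + context_size] is the current word, so this window is centred on it
        window = padded[i:i + width]
        result[word] = " ".join(w for w in window if w is not None)
    return result


words = ["This", "is", "an", "example", "sentence"]
contexts = words_in_context(words)
print(contexts["an"])
```

Building the padded copy costs one extra pass over the list, which should be negligible next to the loop itself for a few hundred thousand words.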
Answer 1 (score: 0)
>>> from itertools import count
>>> words = ["This", "is", "an", "example", "sentence" ]
>>> context_size = 2
>>> dict((word,words[max(i-context_size,0):j]) for word,i,j in zip(words,count(0),count(context_size+1)))
{'This': ['This', 'is', 'an'], 'is': ['This', 'is', 'an', 'example'], 'sentence': ['an', 'example', 'sentence'], 'example': ['is', 'an', 'example', 'sentence'], 'an': ['This', 'is', 'an', 'example', 'sentence']}
In Python 2.7+ or 3.x, the same thing can be written as a dict comprehension:
{word:words[max(i-context_size,0):j] for word,i,j in zip(words,count(0),count(context_size+1))}
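One thing to keep in mind with a list of several hundred thousand words: most words will occur more than once, and a plain dict keeps only the context of the last occurrence of each word. If every occurrence matters, one possible variation is to collect the contexts in a list per word. This is only a sketch; the name `all_contexts` is invented here, and it assumes the contexts are wanted as joined strings, as in the question.

```python
from collections import defaultdict


def all_contexts(words, context_size=2):
    """Map each word to the list of contexts of all its occurrences (hypothetical helper)."""
    contexts = defaultdict(list)
    for i, word in enumerate(words):
        # Clamp at 0 so a negative start does not wrap around to the end of the list
        start = max(i - context_size, 0)
        contexts[word].append(" ".join(words[start:i + context_size + 1]))
    return contexts


sample = ["the", "cat", "saw", "the", "dog"]
result = all_contexts(sample)
print(result["the"])
```

Here `result["the"]` holds two entries, one for each position where "the" appears, instead of the second overwriting the first.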