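What I want to do: build a dictionary that maps each word appearing in the file to a list of all the words that immediately follow that word in the file. The list of words can be in any order and should include duplicates. For example, the key "and" might have the list ["then", "best", "then", "after", ...], listing all the words that come after "and" in the text.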
f = open(filename,'r')
s = f.read().lower()
words = s.split()  # list of words in the file
dict = {}
l = []
i = 0
for word in words:
    if i < (len(words)-1) and word == words[i]:
        dict[word] = l.append(words[i+1])
print dict.items()
sys.exit(0)
Answer 0 (score: 0)
You can use defaultdict:
from collections import defaultdict
words = ["then", "best", "then", "after"]
words_dict = defaultdict(list)
for w1,w2 in zip(words, words[1:]):
    words_dict[w1].append(w2)
Result:
defaultdict(<class 'list'>, {'then': ['best', 'after'], 'best': ['then']})
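The same pairing idea can be applied to the text of a file, as the question asks. The following is a minimal sketch, not a definitive implementation: the function name `build_follow_map` is hypothetical, and the sample string stands in for the result of reading the file (for example `open(filename).read()`).

```python
from collections import defaultdict

def build_follow_map(text):
    """Map each word to the list of words that immediately follow it."""
    # Lowercase and split on any whitespace, as in the question
    words = text.lower().split()
    follow = defaultdict(list)
    # zip pairs every word with its successor; the last word has none
    for current, nxt in zip(words, words[1:]):
        follow[current].append(nxt)
    return dict(follow)

# Stand-in for file contents (assumption: real input would come from a file)
sample = "and then and best and then and after"
follow_map = build_follow_map(sample)
print(follow_map["and"])
```

Duplicates are kept because every occurrence appends to the list, which matches the requirement stated in the question.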
Answer 1 (score: 0)
collections.defaultdict is helpful for this kind of iteration. For simplicity, I made up a string instead of loading from a file.
from collections import defaultdict
import string
x = '''This is a random string with some
string elements repeated. This is so
that, with someluck, we can solve a problem.'''
translator = str.maketrans('', '', string.punctuation)
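# Lowercase, strip punctuation, then split on any whitespace (spaces and newlines)
y = x.lower().translate(translator).split()
result = defaultdict(list)
# Pair each word with the word that follows it
for i, j in zip(y, y[1:]):
    result[i].append(j)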
# result
# defaultdict(list,
#             {'a': ['random', 'problem'],
#              'can': ['solve'],
#              'elements': ['repeated'],
#              'is': ['a', 'so'],
#              'random': ['string'],
#              'repeated': ['this'],
#              'so': ['that'],
#              'solve': ['a'],
#              'some': ['string'],
#              'someluck': ['we'],
#              'string': ['with', 'elements'],
#              'that': ['with'],
#              'this': ['is', 'is'],
#              'we': ['can'],
#              'with': ['some', 'someluck']})
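For comparison with the loop in the question: `list.append` returns `None`, so assigning its result stores `None` in the dictionary; the counter `i` is never incremented; and a single list `l` is shared by every key. The name `dict` also shadows the built-in type. A possible index-based version that stays close to the original loop, using a plain dictionary with `setdefault`, might look like the sketch below. The variable name `followers` and the sample word list are assumptions made for illustration.

```python
# Sample words standing in for the contents of the file (assumption)
words = "and then and best and then and after".split()

followers = {}
# Iterate over every word except the last, which has no successor
for i, word in enumerate(words[:-1]):
    # setdefault creates a separate empty list the first time a word is seen
    followers.setdefault(word, []).append(words[i + 1])

print(followers)
```

Here `enumerate` keeps the index in step with the word, which replaces the manual counter.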