How do I map each word to a list of the words that follow it in Python?

Time: 2018-01-25 23:43:12

Tags: python file dictionary

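What I want to do: build a dict that maps each word that appears in the file to a list of all the words that immediately follow that word in the file. The list of words can be in any order and should include duplicates. For example, the key "and" might have the list ["then", "best", "then", "after", ...], listing all the words that came after "and" in the text.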

  f = open(filename,'r')
  s = f.read().lower()
  words = s.split()#list of words in the file
  dict = {}
  l = []
  i = 0
  for word in words:
      if i < (len(words)-1) and word == words[i]:
          dict[word] = l.append(words[i+1])
  print dict.items()
  sys.exit(0)

2 Answers:

Answer 0: (score: 0)

You can use defaultdict:

from collections import defaultdict

words = ["then", "best", "then", "after"]

words_dict = defaultdict(list)
for w1,w2 in zip(words, words[1:]):
    words_dict[w1].append(w2)

Result:

defaultdict(<class 'list'>, {'then': ['best', 'after'], 'best': ['then']})
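
The question reads its words from a file, so the same pairing idea can be wrapped in a small helper. This is only a sketch: the function names `follow_map_from_words` and `follow_map_from_file` are hypothetical, and it assumes the file is plain text in which words are separated by whitespace.

```python
from collections import defaultdict


def follow_map_from_words(words):
    """Map each word to the list of words that directly follow it."""
    follow_map = defaultdict(list)
    # zip pairs every word with its successor; the last word has none.
    for current_word, next_word in zip(words, words[1:]):
        follow_map[current_word].append(next_word)
    # Return a plain dict so that looking up a missing word raises KeyError
    # instead of silently creating an empty list.
    return dict(follow_map)


def follow_map_from_file(path):
    """Read a text file, lowercase it, and build the follow map."""
    with open(path, "r") as f:
        # split() with no argument splits on any run of whitespace,
        # so spaces and newlines are handled alike.
        words = f.read().lower().split()
    return follow_map_from_words(words)
```

Calling `follow_map_from_file("words.txt")` (a made-up file name) would then return the dictionary for that file. Duplicates are kept because every occurrence of a pair appends to the list.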

Answer 1: (score: 0)

collections.defaultdict is helpful for this kind of iteration. For simplicity, I made up a string instead of loading one from a file.

from collections import defaultdict
import string

x = '''This is a random string with some 
string elements repeated. This is so 
that, with someluck, we can solve a problem.'''

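# translation table that deletes every punctuation character
translator = str.maketrans('', '', string.punctuation)
# split() with no argument splits on any whitespace, including newlines
y = x.lower().translate(translator).split()

result = defaultdict(list)

# pair each word with the word that follows it
for i, j in zip(y, y[1:]):
    result[i].append(j)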

# result
# defaultdict(list,
#             {'a': ['random', 'problem'],
#              'can': ['solve'],
#              'elements': ['repeated'],
#              'is': ['a', 'so'],
#              'random': ['string'],
#              'repeated': ['this'],
#              'so': ['that'],
#              'solve': ['a'],
#              'some': ['string'],
#              'someluck': ['we'],
#              'string': ['with', 'elements'],
#              'that': ['with'],
#              'this': ['is', 'is'],
#              'we': ['can'],
#              'with': ['some', 'someluck']})
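
For comparison, the loop in the question can also be repaired without defaultdict. Three things stop it from working: `list.append` returns `None`, so `dict[word]` is always assigned `None`; `i` is never incremented, so `word == words[i]` only matches the first word; and a single list `l` is shared by every key. The following is a minimal sketch of one possible fix using `dict.setdefault`; the function name `follow_map` and the sample sentence are made up for illustration.

```python
def follow_map(words):
    """Plain-dict version of the loop from the question."""
    result = {}
    # enumerate supplies the index that the original loop never advanced;
    # words[:-1] skips the last word, which has no successor.
    for i, word in enumerate(words[:-1]):
        # setdefault returns the list stored under `word`, creating an empty
        # one on first sight, so every key gets its own list.
        result.setdefault(word, []).append(words[i + 1])
    return result


words = "and then and best and then and after".split()
print(follow_map(words))
# {'and': ['then', 'best', 'then', 'after'], 'then': ['and', 'and'], 'best': ['and']}
```

The last word ("after" here) only becomes a key if it also appears earlier in the text, since nothing follows its final occurrence.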