Question

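What I want to do: build a dictionary that maps each word appearing in a file to a list of all the words that immediately follow that word in the file. The list of words can be in any order and should include duplicates. For example, the key "and" might have the list ["then", "best", "then", "after", ...], listing every word that came after "and" in the text.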

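import sys

f = open(filename, 'r')
s = f.read().lower()
f.close()
words = s.split()  # list of words in the file

follows = {}  # named to avoid shadowing the built-in dict
# Stop one short of the end: the last word has no successor
for i in range(len(words) - 1):
    word = words[i]
    # Give each word its own list; list.append() returns None,
    # so its result cannot be stored in the dictionary directly
    if word not in follows:
        follows[word] = []
    follows[word].append(words[i + 1])

print(follows.items())

sys.exit(0)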

Answer 1

You can use a defaultdict:

from collections import defaultdict

words = ["then", "best", "then", "after"]

words_dict = defaultdict(list)
for w1,w2 in zip(words, words[1:]):
    words_dict[w1].append(w2)

print(words_dict)

Result:

defaultdict(<class 'list'>, {'then': ['best', 'after'], 'best': ['then']})
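Why defaultdict helps here: the first time a missing key is looked up, defaultdict(list) calls list() to create an empty list for that key, so the loop can append without first checking whether the key exists. A small sketch contrasting it with a plain dict and dict.setdefault, reusing the sample words from the example above:

```python
from collections import defaultdict

# defaultdict(list) calls list() to create [] the first time a key is missing
d = defaultdict(list)
d["then"].append("best")   # "then" was missing, so an empty list is created first
d["then"].append("after")  # "then" now exists, so the same list is reused
print(dict(d))             # {'then': ['best', 'after']}

# The same result with a plain dict, using dict.setdefault
plain = {}
plain.setdefault("then", []).append("best")
plain.setdefault("then", []).append("after")
print(plain)               # {'then': ['best', 'after']}

# Side effect: merely reading a missing key also inserts it
_ = d["missing"]
print("missing" in d)      # True
```

The last lines are worth keeping in mind when passing a defaultdict to other code: lookups of absent words silently add empty entries, so converting with dict(...) before returning it can be safer.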

Answer 2

collections.defaultdict is helpful for this kind of iteration. For simplicity, I made up a string instead of loading one from a file.

from collections import defaultdict
import string

x = '''This is a random string with some 
string elements repeated. This is so 
that, with someluck, we can solve a problem.'''

translator = str.maketrans('', '', string.punctuation)
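# split() with no argument splits on any whitespace, including newlines,
# so words at the end and start of adjacent lines are not glued together
y = x.lower().translate(translator).split()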

result = defaultdict(list)

for i, j in zip(y, y[1:]):
    result[i].append(j)

# result
# defaultdict(list,
#             {'a': ['random', 'problem'],
#              'can': ['solve'],
#              'elements': ['repeated'],
#              'is': ['a', 'so'],
#              'random': ['string'],
#              'repeated': ['this'],
#              'so': ['that'],
#              'solve': ['a'],
#              'some': ['string'],
#              'someluck': ['we'],
#              'string': ['with', 'elements'],
#              'that': ['with'],
#              'this': ['is', 'is'],
#              'we': ['can'],
#              'with': ['some', 'someluck']})
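The same zip-based pairing can be applied to a file, as the question asks. The sketch below assumes a UTF-8 text file; build_follow_map is a hypothetical helper name, not part of any library. It lowercases the text, strips ASCII punctuation with the same str.maketrans table as above, and splits on any whitespace:

```python
import string
from collections import defaultdict


def build_follow_map(path):
    """Hypothetical helper: map each word in the file at `path` to the
    list of words that immediately follow it (duplicates kept)."""
    with open(path, encoding="utf-8") as f:
        text = f.read().lower()
    # Remove ASCII punctuation, then split on spaces, tabs and newlines
    translator = str.maketrans("", "", string.punctuation)
    words = text.translate(translator).split()

    follows = defaultdict(list)
    # zip stops at the shorter input, so the final word gets no entry
    for current, nxt in zip(words, words[1:]):
        follows[current].append(nxt)
    # Return a plain dict so lookups of absent words do not add entries
    return dict(follows)
```

Usage, with a hypothetical file name: follows = build_follow_map("input.txt"), then follows.get("and", []) gives the words that followed "and". Returning a plain dict is a design choice; returning the defaultdict itself would also work if callers never look up absent words.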

How do I map each word to a list of the words that follow it in Python?
