Question

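I want to create a Markov chain in Python. Currently, when I have text like "Would you could" and "Would you like", the entry for my tuple key ('Would', 'you'), which maps to 'could', gets overwritten with ('Would', 'you'): 'like' as I iterate over my text file.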

I'm trying to add each new value for a key to the values already stored under that key. I.e. for the key ('Would', 'you') I want the value to be ('Would', 'you'): ['could', 'like'].

Here is my code:

def make_chains(corpus):
    """Takes an input text as a string and returns a dictionary of
    markov chains."""
    dict = {}
    for line in corpus:
        line = line.replace(',', "")
        words = line.split()
        words_copy = words
        for word in range(0, len(words_copy)):
            #print words[word], words[word + 1]
            if dict[(words[word], words[word + 1])] in dict:
                dict.update(words[word+2])
            dict[(words[word], words[word + 1])] = words[word + 2]
            #print dict
            if word == len(words_copy) - 3:
                break

    return dict

Answer 1

Simple solution

The simple solution is to use collections.defaultdict:

from collections import defaultdict


def make_chains(input_list):
    """
    Takes an input text as a list of strings and returns a dictionary of markov chains.
    """
    chain = defaultdict(list)
    for line in input_list:
        line = line.replace(',', "")
        words = line.split()
        for i in range(0, len(words) - 2):
            chain[words[i], words[i + 1]].append(words[i + 2])

    return chain

With this, you get:

>>> print make_chains(["Would you like", "Would you could"])
defaultdict(<type 'list'>, {('Would', 'you'): ['like', 'could']})
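The reason defaultdict removes the need for an explicit membership check is that looking up a missing key calls the factory (here list), stores the result under that key and returns it. A small sketch of that behavior, using made-up keys and written for Python 3:

```python
from collections import defaultdict

# The factory `list` is called for any key that is not present yet.
chain = defaultdict(list)

# First access to ('Would', 'you') creates an empty list, then appends to it.
chain['Would', 'you'].append('like')
# Second access finds the existing list and appends to that same list.
chain['Would', 'you'].append('could')
print(chain['Would', 'you'])     # ['like', 'could']

# Side effect: even a plain read of a missing key inserts it with [].
print(chain['no', 'such'])       # []
print(('no', 'such') in chain)   # True
```

If that side effect matters when reading the chain later, check with `in` first or use chain.get(key, []), which does not insert anything.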

Fixing the original

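To give you a better idea of what goes wrong in your code, we can also fix the original solution without using defaultdict. To do that, a few things in your original code need to change.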

First, let's look at this line:

words_copy = words

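This doesn't do what you think it does, and you don't need it anyway. It does not create a copy of words; it just creates a new name, words_copy, that points to the same list object as words. So if you modify words (for example by appending to it), words_copy changes too.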

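If you really wanted an independent copy, you would write words_copy = copy.deepcopy(words) (for a flat list of strings, list(words) or words[:] is enough). In this case, though, no copy is needed at all, because you never modify words while iterating over it.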
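A quick interpreter check makes the difference between an alias and a copy visible; the list contents are made up for illustration:

```python
import copy

words = ['Would', 'you', 'like']

# Plain assignment: both names refer to the same list object.
words_copy = words
words.append('tea')
print(words_copy)            # ['Would', 'you', 'like', 'tea']
print(words_copy is words)   # True

# A real copy is a separate list object, unaffected by later changes.
words = ['Would', 'you', 'like']
shallow = list(words)        # same effect as words[:]
deep = copy.deepcopy(words)
words.append('tea')
print(shallow)               # ['Would', 'you', 'like']
print(deep)                  # ['Would', 'you', 'like']
```

For a list of immutable strings the shallow and deep copies behave identically; deepcopy only makes a difference when the list holds mutable objects such as nested lists.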

Next, this line:

if dict[(words[word], words[word + 1])] in dict:
    dict.update(words[word+2])

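has a couple of flaws. First, dict[(words[word], words[word + 1])] looks the tuple up before the in test even runs, so if the tuple is not already a key in the dictionary, this raises a KeyError. That is guaranteed to happen on the very first iteration. (Even when the lookup succeeds, the test checks whether the stored value is a key, not whether the tuple is.) Second, dict's update method merges the key/value pairs of the mapping you pass into the dictionary you call it on; it does not add to the value stored under one key. What you want is to update the value stored under that particular key.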
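A small check of both points, using a plain dictionary named chain (renamed here only to avoid shadowing the built-in dict) and made-up words:

```python
chain = {}
key = ('Would', 'you')

# The original condition evaluates chain[key] before `in` runs,
# so a key that is not stored yet raises KeyError immediately.
try:
    if chain[key] in chain:
        pass
except KeyError as exc:
    print('KeyError:', exc)   # KeyError: ('Would', 'you')

# `key in chain` tests the dictionary's keys, which is what was intended.
print(key in chain)  # False

# update() merges another mapping into the dict; for an existing key
# the new value replaces the old one instead of being added to it.
chain.update({key: ['could']})
chain.update({key: ['like']})
print(chain)  # {('Would', 'you'): ['like']}

# Passing a bare string (as the original code did) raises ValueError,
# because update() expects a mapping or an iterable of key/value pairs.
try:
    chain.update('like')
except ValueError:
    print('ValueError: update() needs key/value pairs')
```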

So you want:

if (words[word], words[word + 1]) in dict:
    # Add to the existing list
    dict[(words[word], words[word + 1])].append(words[word + 2])
else:
    # Create a new list
    dict[(words[word], words[word + 1])] = [words[word + 2]]
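As an aside that is not part of the original answer, the same if/else can be collapsed into one call with the built-in dict.setdefault, which returns the list already stored under a key or inserts the default and returns it. A minimal sketch; add_follower is a hypothetical helper name:

```python
def add_follower(chain, key, word):
    """Append `word` to the list stored under `key`, creating the list if needed."""
    # setdefault inserts [] only when `key` is missing, then returns the stored list.
    chain.setdefault(key, []).append(word)


chain = {}
add_follower(chain, ('Would', 'you'), 'like')
add_follower(chain, ('Would', 'you'), 'could')
print(chain)  # {('Would', 'you'): ['like', 'could']}
```

One small cost of this version is that the default [] is built on every call, even when the key already exists; the explicit if/else avoids that.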

Finally, this block is unnecessary:

if word == len(words_copy) - 3:
    break

Instead, just let the loop stop at the third-to-last index, like this:

for word in range(0, len(words) - 2):
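To see why len(words) - 2 is the right bound: range stops before its end value, so the last index produced is len(words) - 3, the start of the last complete three-word window. Lines with fewer than three words produce no iterations at all, whereas the break-based loop would index past the end of a one- or two-word line. A small check, using a hypothetical helper named trigrams:

```python
def trigrams(line):
    """Yield (first, second, third) word windows from one line of text."""
    words = line.split()
    # range() excludes its end, so i stops at len(words) - 3.
    for i in range(len(words) - 2):
        yield words[i], words[i + 1], words[i + 2]


print(list(trigrams('Would you like some tea')))
# [('Would', 'you', 'like'), ('you', 'like', 'some'), ('like', 'some', 'tea')]
print(list(trigrams('Hi there')))  # []  (too short: no iterations, no IndexError)
```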

Putting it all together, you can fix the original version with these changes:

def make_chains(corpus):
    """Takes an iterable of lines (e.g. an open file or a list of strings)
    and returns a dictionary of markov chains."""
    dict = {}
    for line in corpus:
        line = line.replace(',', "")
        words = line.split()
        for word in range(0, len(words) - 2):
            if (words[word], words[word + 1]) in dict:
                # Add to the existing list
                dict[(words[word], words[word + 1])].append(words[word + 2])
            else:
                # Create a new list
                dict[(words[word], words[word + 1])] = [words[word + 2]]

    return dict
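A quick check of the fixed logic on a file-like corpus. The function is restated so the snippet runs on its own, with the dictionary renamed from dict to chain so the built-in is not shadowed; io.StringIO stands in for an open text file, and the sample text is made up:

```python
import io


def make_chains(corpus):
    """Build {(word1, word2): [following words]} from an iterable of lines."""
    chain = {}
    for line in corpus:
        words = line.replace(',', '').split()
        for i in range(len(words) - 2):
            key = (words[i], words[i + 1])
            if key in chain:
                chain[key].append(words[i + 2])   # key seen before: extend its list
            else:
                chain[key] = [words[i + 2]]       # first occurrence: start a new list
    return chain


# Iterating over a StringIO (like a real file) yields one line at a time.
corpus = io.StringIO('Would you like tea,\nWould you could\n')
chain = make_chains(corpus)
print(chain)  # {('Would', 'you'): ['like', 'could'], ('you', 'like'): ['tea']}
```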

Hope this helps!

Adding values to a key in a dictionary