Question

我目前正在尝试在Python中实现Markov Word Generator I found on the internet（因为我不了解Coffee），所以我可以为其添加更多功能。但是，我很难理解Coffee代码的确切功能，并且有些困惑。我尝试“翻译”的代码可以在here中找到。

到目前为止，我已经使用了generate()和ngrams()方法。理论上我也有continuation()（我的continue()版本，因为它在Python中是关键字），但是由于它取决于tree()中节点的实现，因此它可能不是最终的。

我最努力的部分是方法tree()。我知道它应该做什么，但是对于如何实现它一无所知。这是我到目前为止所得到的。很多事情可能是愚蠢的，一旦翻译完成后我会考虑重新查看所有内容，以查看是否可以改善实现。

import random
from functools import reduce

class MarkovNode:
    def __init__(self, name = "", count = 0, frequency = 0.0, continuations = dict()):
        self.continuations = continuations
        self.count = count
        self.frequency = frequency
        self.name = name

class Markov:
    # Creates a new Markov chain from the given array of sequences
    # (to collectively use as a corpus) and value for n (to use as the Markov order).
    # sequences may be empty. n must be an integer no lower than 0.
    # Feel free to directly access and modify an object's .sequences and .n.
    def __init__(self, corpus="", n=2, maxLength=20):
        self.sequences = corpus
        self.n = n
        self.maxLength = maxLength

    # Generates a new pseudorandom sequence generated by the Markov chain and
    # returns it as an array.  Its length will be truncated to @maxLength if necessary.
    def generate(self):
        result = ""
        def currentState():
            # Returns at most the last n elements of result.
            return result[max(0, len(result)-self.n):len(result)]
        def nextElement():
            element = self.continuation(currentState())
            print("Added {} to chain".format(element))
            return element
        #print(type(nextElement))
        while len(result) < self.maxLength and nextElement != None:
            result = "{}{}".format(result, nextElement)
        return result

    # Returns in a list the n-grams that went into making the Markov chain
    # Note that the size of the n-grams will always be one greater than the
    # Markov order - if a Markov chain was created with n=2, this method
    # will return an array of 3-grams.
    def ngrams(self):
        def ngramsFromSequence(word, n):
            if n < 1 or n > len(word):
                return []
            else:
                return [word[i:i+n] for i in range(len(word)-n)]

        ngrams = []
        for word in self.sequences:
            ngrams.append(ngramsFromSequence(word, self.n+1))
        return ngrams

    # Builds a probability tree and returns the node of the given sequence, or
    # the root node if no sequence is specified.  Returns null if the given
    # sequence is not represented in the tree.
    #
    # Each node has a "count", "frequency" and "continuations" property.
    # For example:
    #   node = myMarkov.tree("abc")
    #   c = node.continuations["d"].count
    #   f = node.continuations["d"].frequency
    # c would be the number of times that "d" came after "abc" in the original corpus.
    # f would be the probability that the letter to follow "abc" is "d."
    def tree(self, sequence = ""):
        n_grams = self.ngrams()
        root = MarkovNode(count = len(n_grams), frequency = 1.0)

        # Build the tree and supply each node with its count property.
        for n_gram in n_grams:
            node = root
            for element in n_gram:
                    # If we need to create a new node, do so.
                    if element not in node.continuations:
                        node.continuations[element] = MarkovNode(name = element)
                    node = node.continuations[element]
                    node.count += 1

        # Recursively descend through the tree we just built and give each node its
        # frequency property.
        def normalize(node):
            for child in node.continuations:
                child.frequency = child.count/node.count
                normalize(child)

        normalize(root)

        if type(sequence) is str:
            for sym in ",.!#;:":
                sequence.replace(sym, '')
            seq = sequence.split()

        # Navigate to the desired sequence.
        def find(root, sequence):
            print("Looking for: {}".format(sequence))
            if root is not None:
                if root.name is sequence:
                    print("Returned: {}".format(root.name))
                    return root
                for child in root.continuations:
                    print(child.name)
                    node = find(child, sequence)
                    if node is not None:
                        return node
            return None

        find(root, sequence)


    # Uses the Markov chain to pick the next element to come after sequence.
    # Returns null if there are no possible continuations.
    def continuation(self, sequence):
        node = self.tree(sequence)
        if node:
            target = random()
            sum = 0
            for child in node.continuations:
                sum += child.frequency
                if sum >= target:
                    print("Chose {}".format(child.name))
                    return child.name
        else:
            print(node)
        return None
        # Either the node was None or it had no continuations.

编辑：更新了代码 该实现当前无法正常运行。我可能在这里缺少明显的东西。

了解一些Coffee代码-将其转换为Python

0 个答案: