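I'm currently trying to implement in Python a Markov Word Generator I found on the internet (I don't know CoffeeScript), so that I can add more features to it. However, I'm having a hard time understanding exactly what the CoffeeScript code does, and I'm somewhat confused. The code I'm trying to "translate" can be found here.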
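So far I have the generate() and ngrams() methods done. In theory I also have continuation() (my version of continue(), since that is a keyword in Python), but because it depends on how the nodes in tree() are implemented, it may not be final.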
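The part I'm struggling with most is the tree() method. I know what it is supposed to do, but I have no idea how to implement it. Here is what I have so far. A lot of it is probably silly; once the translation is finished I plan to go back over everything and see whether the implementation can be improved.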
import random
from functools import reduce

class MarkovNode:
    def __init__(self, name = "", count = 0, frequency = 0.0, continuations = dict()):
        self.continuations = continuations
        self.count = count
        self.frequency = frequency
        self.name = name

class Markov:
    # Creates a new Markov chain from the given array of sequences
    # (to collectively use as a corpus) and value for n (to use as the Markov order).
    # sequences may be empty. n must be an integer no lower than 0.
    # Feel free to directly access and modify an object's .sequences and .n.
    def __init__(self, corpus="", n=2, maxLength=20):
        self.sequences = corpus
        self.n = n
        self.maxLength = maxLength

    # Generates a new pseudorandom sequence generated by the Markov chain and
    # returns it as an array. Its length will be truncated to @maxLength if necessary.
    def generate(self):
        result = ""
        def currentState():
            # Returns at most the last n elements of result.
            return result[max(0, len(result)-self.n):len(result)]
        def nextElement():
            element = self.continuation(currentState())
            print("Added {} to chain".format(element))
            return element
        #print(type(nextElement))
        while len(result) < self.maxLength and nextElement != None:
            result = "{}{}".format(result, nextElement)
        return result

    # Returns in a list the n-grams that went into making the Markov chain
    # Note that the size of the n-grams will always be one greater than the
    # Markov order - if a Markov chain was created with n=2, this method
    # will return an array of 3-grams.
    def ngrams(self):
        def ngramsFromSequence(word, n):
            if n < 1 or n > len(word):
                return []
            else:
                return [word[i:i+n] for i in range(len(word)-n)]
        ngrams = []
        for word in self.sequences:
            ngrams.append(ngramsFromSequence(word, self.n+1))
        return ngrams

    # Builds a probability tree and returns the node of the given sequence, or
    # the root node if no sequence is specified. Returns null if the given
    # sequence is not represented in the tree.
    #
    # Each node has a "count", "frequency" and "continuations" property.
    # For example:
    #   node = myMarkov.tree("abc")
    #   c = node.continuations["d"].count
    #   f = node.continuations["d"].frequency
    # c would be the number of times that "d" came after "abc" in the original corpus.
    # f would be the probability that the letter to follow "abc" is "d."
    def tree(self, sequence = ""):
        n_grams = self.ngrams()
        root = MarkovNode(count = len(n_grams), frequency = 1.0)
        # Build the tree and supply each node with its count property.
        for n_gram in n_grams:
            node = root
            for element in n_gram:
                # If we need to create a new node, do so.
                if element not in node.continuations:
                    node.continuations[element] = MarkovNode(name = element)
                node = node.continuations[element]
                node.count += 1
        # Recursively descend through the tree we just built and give each node its
        # frequency property.
        def normalize(node):
            for child in node.continuations:
                child.frequency = child.count/node.count
                normalize(child)
        normalize(root)
        if type(sequence) is str:
            for sym in ",.!#;:":
                sequence.replace(sym, '')
            seq = sequence.split()
        # Navigate to the desired sequence.
        def find(root, sequence):
            print("Looking for: {}".format(sequence))
            if root is not None:
                if root.name is sequence:
                    print("Returned: {}".format(root.name))
                    return root
                for child in root.continuations:
                    print(child.name)
                    node = find(child, sequence)
                    if node is not None:
                        return node
            return None
        find(root, sequence)

    # Uses the Markov chain to pick the next element to come after sequence.
    # Returns null if there are no possible continuations.
    def continuation(self, sequence):
        node = self.tree(sequence)
        if node:
            target = random()
            sum = 0
            for child in node.continuations:
                sum += child.frequency
                if sum >= target:
                    print("Chose {}".format(child.name))
                    return child.name
        else:
            print(node)
        return None
        # Either the node was None or it had no continuations.

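
For comparison, here is a minimal standalone sketch of what the comments on ngrams() and tree() describe: collect the (n+1)-grams, count them into a tree, turn counts into frequencies, then walk down to the requested sequence. It is not the original CoffeeScript logic. The names Node, ngrams_from, build_tree, normalize and lookup are placeholders made up for this sketch, and it assumes the corpus is a list of words. It differs from the listing above in four ways:

- Each node creates its own continuations dict instead of sharing one through a default argument.
- The range includes the last n-gram of each word.
- The n-grams are kept as one flat list rather than a list of lists.
- Loops over children use .values(), because iterating a dict directly yields its keys.

```python
class Node:
    """Hypothetical stand-in for MarkovNode; each instance gets its own dict."""
    def __init__(self, name=""):
        self.name = name
        self.count = 0
        self.frequency = 0.0
        self.continuations = {}  # created per instance, never shared


def ngrams_from(word, n):
    """Return every contiguous n-gram of word, including the last one."""
    if n < 1 or n > len(word):
        return []
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def normalize(node):
    """Set each child's frequency to its share of the parent's count."""
    for child in node.continuations.values():
        child.frequency = child.count / node.count
        normalize(child)


def build_tree(words, order):
    """Build a tree of (order + 1)-grams with counts and frequencies."""
    grams = [g for w in words for g in ngrams_from(w, order + 1)]
    root = Node()
    root.count = len(grams)
    root.frequency = 1.0
    for gram in grams:
        node = root
        for element in gram:
            if element not in node.continuations:
                node.continuations[element] = Node(element)
            node = node.continuations[element]
            node.count += 1
    normalize(root)
    return root


def lookup(root, sequence):
    """Walk down from root one element at a time; None if the path is missing."""
    node = root
    for element in sequence:
        node = node.continuations.get(element)
        if node is None:
            return None
    return node


root = build_tree(["abab", "abc"], order=2)
node = lookup(root, "ab")
print({k: (v.count, v.frequency) for k, v in node.continuations.items()})
```

In this toy corpus, "ab" is followed once by "a" and once by "c", so both continuations end up with a count of 1 and a frequency of 0.5.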
Edit: updated the code. The implementation currently doesn't work properly. I'm probably missing something obvious here.
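
A few things in the listing are worth checking:

- `import random` binds the module, so `random()` raises a TypeError; the function is `random.random()`.
- `nextElement` is compared and formatted but never called.
- `tree()` has no return statement, so it always returns None.
- `sequence.replace(...)` returns a new string rather than changing `sequence`.

Below is a minimal sketch of the weighted pick and the generation loop with those points addressed. It is a guess at the intended behaviour, not the original CoffeeScript logic. The functions pick and generate and the table mapping are hypothetical, and a plain dict of element to frequency stands in for a node's continuations.

```python
import random


def pick(frequencies, rng=random):
    """Pick a key with probability proportional to its frequency.

    frequencies is a hypothetical mapping of element -> probability,
    e.g. {"a": 0.25, "b": 0.75}. Returns None for an empty mapping.
    """
    target = rng.random()  # float in [0.0, 1.0)
    running = 0.0
    last = None
    for element, frequency in frequencies.items():
        running += frequency
        last = element
        if running >= target:
            return element
    return last  # covers rounding that leaves the sum just below target


def generate(next_element, n, max_length):
    """Grow a string by calling next_element(state) until it returns None."""
    result = ""
    while len(result) < max_length:
        state = result[-n:] if n > 0 else ""  # at most the last n elements
        element = next_element(state)         # note: the function is called
        if element is None:
            break
        result += element
    return result


# Toy transition table (an assumption for illustration): state -> continuations.
table = {"": {"a": 1.0}, "a": {"b": 1.0}, "ab": {"c": 1.0}}
word = generate(lambda state: pick(table.get(state, {})), n=2, max_length=10)
print(word)
```

With this deterministic table the loop stops as soon as the state "bc" has no continuations, which is the "returns null if there are no possible continuations" case from the comments.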