Question

I just created a bigram model in Python, as shown below:

import re
from collections import Counter

class BiGramModel:
    def __init__(self, trainfiles):
        wordlist = re.findall(r'\b\w+\b', trainfiles)
        wordlist = Counter(wordlist)
        # Replace every word that occurs only once with <unk>
        for word, count in wordlist.items():
            if count == 1:
                trainfiles = re.sub(r'\b{}\b'.format(re.escape(word)), '<unk>', trainfiles)
        print(trainfiles)
        # The lookahead lets overlapping bigrams be matched
        m = re.findall(r'(?=(\b\w+\b \S+))', trainfiles)
        print(m)

    def logprob(self, context, word):  # Need to write code for this
        pass

def main():
    bi = BiGramModel("STOP to be or not to be STOP")
    bi.logprob("STOP", "STOP")

main()

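Now I need to compute the log probability for each input context and word, logprob(self, context, word). I know the formula for the log probability is log((count(event, context) + 1) / (count(context) + V)).

For example: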

for the sentence: `STOP to be <unk> <unk> to be STOP`

bigrams printed out: ['STOP to', 'to be', 'be <unk>', 'to be', 'be STOP']

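Now, if I call logprob(STOP, STOP), then count(STOP, STOP) in the list above is 0, V = the number of distinct words in the sentence, which is 5, and count(STOP, STOP) = 0, so log prob = log((0 + 1) / (0 + 5)) = log(1/5).

I can't figure out how to write a separate function for this so that it gets the bigrams from the init function above. I'm stuck on this!!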

Answer 1

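"I can't figure out how to write a separate function for this so that it gets the bigrams from the init function above."

__init__ is the constructor for your class. Your class creates objects (it "instantiates" them), and __init__ defines what happens when those objects are created.

The idea of a class is that it sets up a blueprint for objects that contain some things (often called "state") and that also do some things ("behavior"). The usual way to make an object contain things is to set them as attributes when the object is created.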
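As a small illustration of state and behavior, here is a minimal sketch in Python 3. The class name WordTally and its methods are hypothetical, made up only for this example; they are not part of the question's code.

```python
class WordTally:
    """Hypothetical example: the state is a dict of counts, the behavior is add/count."""

    def __init__(self):
        # State: an attribute set when the object is created.
        self.counts = {}

    def add(self, word):
        # Behavior that changes the state.
        self.counts[word] = self.counts.get(word, 0) + 1

    def count(self, word):
        # Behavior that reads the state.
        return self.counts.get(word, 0)


tally = WordTally()
for w in "to be or not to be".split():
    tally.add(w)
print(tally.count("to"))
```

Each WordTally object carries its own counts, so two objects built from different texts would not interfere with each other.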

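One way to set attributes is through the variable self that you declare in the constructor definition. For example, you could save all of these things in __init__ so that your class's methods can access them: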

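def __init__(self, trainfiles):
    wordlist = re.findall(r'\b\w+\b', trainfiles)
    self.wordlist = Counter(wordlist)
    # Start from the original text so the attribute exists even if nothing is replaced
    self.trainfiles = trainfiles
    for word, count in self.wordlist.items():
        if count == 1:
            # Substitute into self.trainfiles so earlier replacements are kept
            self.trainfiles = re.sub(r'\b{}\b'.format(re.escape(word)), '<unk>', self.trainfiles)
    print(self.trainfiles)
    self.m = re.findall(r'(?=(\b\w+\b \S+))', self.trainfiles)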

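This means that any function called after __init__ (which, in practice, means every method you define for the class) will have access to these things. Then you can do, for example: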

def logprob(self, context, word):
    print(self.m)
    print(self.trainfiles)
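Going one step further, logprob could read counts that __init__ stored on self. The following is only a sketch in Python 3, under assumptions the question does not confirm: count(context, word) is taken to be how often the bigram appears in self.m, count(context) is how many bigrams start with that context, and V is the number of distinct whitespace-separated tokens. The attribute names bigram_counts, context_counts and V are made up for this example.

```python
import math
import re
from collections import Counter


class BiGramModel:
    def __init__(self, trainfiles):
        # Replace words seen only once with <unk>, as in the question.
        word_counts = Counter(re.findall(r'\b\w+\b', trainfiles))
        for word, count in word_counts.items():
            if count == 1:
                trainfiles = re.sub(r'\b{}\b'.format(re.escape(word)), '<unk>', trainfiles)
        self.trainfiles = trainfiles
        # Same bigram regex as the question; it skips bigrams that start with <unk>.
        self.m = re.findall(r'(?=(\b\w+\b \S+))', trainfiles)
        # Assumption: count(context, word) = occurrences of the bigram in self.m.
        self.bigram_counts = Counter(self.m)
        # Assumption: count(context) = number of bigrams whose first word is context.
        self.context_counts = Counter(b.split(' ')[0] for b in self.m)
        # Assumption: V = number of distinct tokens in the processed text.
        self.V = len(set(trainfiles.split()))

    def logprob(self, context, word):
        # Add-one smoothing: log((count(context, word) + 1) / (count(context) + V))
        numerator = self.bigram_counts[context + ' ' + word] + 1
        denominator = self.context_counts[context] + self.V
        return math.log(numerator / denominator)


bi = BiGramModel("STOP to be or not to be STOP")
print(bi.m)
print(bi.logprob("to", "be"))
```

A Counter returns 0 for keys it has never seen, so unseen bigrams and unseen contexts need no special case. If you define V or count(context) differently, only the corresponding lines in __init__ have to change.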

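However, the design you are after isn't entirely clear, so it is important to think of each object as a collection of state and behavior.

The real question is: what does a BiGramModel actually do, and what should every BiGramModel be able to do?

A follow-up question can also inform the answer to the first one: are you creating many of these objects, or just one?

Once you can answer that, you will know what to save as attributes of the object and which methods to write.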

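On the more specific question, you mention that you know the formula you want:

log((count(event, context) + 1) / (count(context) + V))

However, it isn't clear where these variables will come from. How do you get the counts? Are you passing them in? What is "event"?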
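To make those questions concrete, here is one possible reading, sketched in Python 3: the bigram list and V are passed in explicitly, "event" is taken to be the word that follows the context, and the counts are derived from the list. The function name add_one_logprob and its parameters are hypothetical, and V = 5 is simply the value stated in the question.

```python
import math
from collections import Counter


def add_one_logprob(bigrams, vocab_size, context, word):
    """Add-one smoothed log probability; `bigrams` is a list of 'w1 w2' strings."""
    pairs = [tuple(b.split(' ', 1)) for b in bigrams]
    # count(event, context): how often `word` follows `context`.
    pair_counts = Counter(pairs)
    # count(context): how many bigrams start with `context` (an assumption).
    context_counts = Counter(first for first, _ in pairs)
    return math.log((pair_counts[(context, word)] + 1) /
                    (context_counts[context] + vocab_size))


# The bigram list printed in the question.
bigrams = ['STOP to', 'to be', 'be <unk>', 'to be', 'be STOP']
print(add_one_logprob(bigrams, 5, 'to', 'be'))
```

Passing the counts in keeps the function independent of any class, while storing them on self keeps the call site shorter; which one fits depends on the answers to the questions above.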
