Question

Consider the following basis:

basis = "Each word of the text is converted as follows: move any consonant (or consonant cluster) that appears at the start of the word to the end, then append ay."

and the following words:

words = "word, text, bank, tree"

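How can I compute the PMI value of each word in "words" against each word in "basis", using a context window of size 5 (i.e., two positions before and two positions after the target word)?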

I know how to compute PMI, but I don't know how to deal with the context window.

I computed the "normal" PMI values as follows:

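from math import log

def PMI(ContingencyTable):
    # 2x2 table for a pair (x, y): a = count(x, y), b = count(x, not y),
    # c = count(not x, y), d = count(not x, not y), N = total count
    (a,b,c,d,N) = ContingencyTable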
    # avoid log(0)
    a += 1
    b += 1
    c += 1
    d += 1
    N += 4

    R_1 = a + b
    C_1 = a + c

    return log(float(a)/(float(R_1)*float(C_1))*float(N),2)
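To use the function above with a context window, the four cells have to come from window co-occurrences rather than from whole documents. A minimal sketch, assuming the counts are kept in a Counter keyed by (target, context) tuples with one count per pair that falls inside the window; the helper name contingency_table and the toy counts are hypothetical, for illustration only:

```python
from collections import Counter


def contingency_table(pair_counts, x, y):
    """Return (a, b, c, d, N) for target word x and context word y.

    pair_counts is assumed to be a Counter keyed by (target, context) tuples,
    with one count per co-occurrence inside the context window.
    """
    N = sum(pair_counts.values())                                      # all window pairs
    row_x = sum(n for (t, _), n in pair_counts.items() if t == x)      # x with any context
    col_y = sum(n for (_, ctx), n in pair_counts.items() if ctx == y)  # y with any target
    a = pair_counts[(x, y)]  # x and y in the same window
    b = row_x - a            # x with a context word other than y
    c = col_y - a            # y as the context of a target other than x
    d = N - a - b - c        # target is not x and context is not y
    return (a, b, c, d, N)


# Made-up toy counts, only to show the bookkeeping.
pairs = Counter({("word", "the"): 3, ("word", "of"): 2, ("text", "the"): 1,
                 ("text", "is"): 1, ("the", "word"): 3})
print(contingency_table(pairs, "word", "the"))  # (3, 2, 1, 4, 10)
```

Since R_1 = a + b and C_1 = a + c, passing this tuple to PMI gives log2(a * N / (count(x, any) * count(any, y))) with add-one smoothing, i.e. a window-based PMI.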

Answer 1

I did some searching on PMI, and it looks like there are heavyweight packages out there, "windowing" included.

In PMI, "mutual" seems to refer to the joint probability of two different words, so you need to pin that idea down in your problem statement.
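One common way to pin it down for a window of two words on each side (an assumption, not the only possible choice) is to treat every (target, context) pair inside the window as one observation. With #(x, y) the number of times y occurs within two positions of x, and N the total number of such pairs:

```latex
P(x, y) = \frac{\#(x, y)}{N}, \qquad
P(x) = \frac{\#(x, \cdot)}{N}, \qquad
P(y) = \frac{\#(\cdot, y)}{N}

\operatorname{PMI}(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}
  = \log_2 \frac{\#(x, y)\, N}{\#(x, \cdot)\,\#(\cdot, y)}
```

Here #(x, ·) sums #(x, y) over all context words and #(·, y) sums over all targets; in the question's contingency table these are a, a + b and a + c respectively.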

I took on a smaller problem, generating the lists of short windows from your problem statement, mostly as an exercise for myself:

def wndw(wrd_l, m_l, pre, post):
    """
    returns a list of all lists of sequential words in input wrd_l
    that are within range -pre and +post of any word in wrd_l that matches
    a word in m_l

    wrd_l      = list of words
    m_l        = list of words to match on
    pre, post  = ints giving range of indices to include in window size      
    """
    wndw_l = list()
    for i, w in enumerate(wrd_l):
        if w in m_l:
            wndw_l.append([wrd_l[i + k] for k in range(-pre, post + 1)
                           if 0 <= i + k < len(wrd_l)])
    return wndw_l

basis = """Each word of the text is converted as follows: move any
             consonant (or consonant cluster) that appears at the start
             of the word to the end, then append ay."""

words = "word, text, bank, tree"

print(*wndw(basis.split(), [x.strip() for x in words.split(',')], 2, 2),
      sep="\n")
['Each', 'word', 'of', 'the']
['of', 'the', 'text', 'is', 'converted']
['of', 'the', 'word', 'to', 'the']
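The windows above stop short of PMI itself. A minimal sketch of the remaining step, assuming P(x, y) is estimated from (target, context) pairs inside a window of two words on each side, and that case and punctuation should be ignored (basis.split() keeps tokens such as "follows:" and "end,"). The helper names tokenize, window_pair_counts and windowed_pmi are hypothetical, and no smoothing is applied:

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs, so "end," and "ay." become "end" and "ay".
    return re.findall(r"[a-z]+", text.lower())


def window_pair_counts(tokens, half_width=2):
    """Count (target, context) pairs whose positions differ by 1..half_width.

    half_width=2 is a window of size 5: two words before and two after the target.
    """
    pairs = Counter()
    for i, target in enumerate(tokens):
        lo = max(0, i - half_width)
        hi = min(len(tokens), i + half_width + 1)
        for j in range(lo, hi):
            if j != i:
                pairs[(target, tokens[j])] += 1
    return pairs


def windowed_pmi(pairs, x, y):
    """log2(P(x, y) / (P(x) P(y))) estimated from window pair counts.

    Returns None if x and y never share a window (unsmoothed log(0)).
    """
    joint = pairs[(x, y)]  # Counter returns 0 for unseen pairs
    if joint == 0:
        return None
    total = sum(pairs.values())
    count_x = sum(n for (t, _), n in pairs.items() if t == x)
    count_y = sum(n for (_, ctx), n in pairs.items() if ctx == y)
    return math.log2(joint * total / (count_x * count_y))


basis = ("Each word of the text is converted as follows: move any consonant "
         "(or consonant cluster) that appears at the start of the word to the "
         "end, then append ay.")
words = [w.strip() for w in "word, text, bank, tree".split(",")]

tokens = tokenize(basis)
pairs = window_pair_counts(tokens, half_width=2)
for w in words:
    for b in sorted(set(tokens)):
        score = windowed_pmi(pairs, w, b)
        if score is not None:
            print(w, b, round(score, 3))
```

With this text, "bank" and "tree" never occur, so nothing is printed for them; scoring such pairs needs smoothing, such as the add-one step in the question's PMI function.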

Computing PMI values with a given context window
