How does nltk's concordance work?

Time: 2016-02-18 22:25:51

Tags: python python-3.x nltk

While working on this problem, I noticed that concordance does not seem to want to show context at the beginning of a Text:

>>> from nltk.book import *
>>> text3.concordance("beginning",lines=1)
Displaying 1 of 5 matches:
                                   beginning God created the heaven and the ear

Notice that there is no "In the" in the output above. But concordance has no problem with the end of a Text.

>>> text3.concordance("coffin",lines=1)
Displaying 1 of 1 matches:
 embalmed him , and he was put in a coffin in Egypt .

Interestingly, things work better if you specify width (the default is width=79, I believe).

>>> text3.concordance("beginning",width=11, lines=1)
Displaying 1 of 5 matches:
In the beginning 

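Does anyone have an explanation for this? The documentation at nltk.org says:

Print a concordance for word with the specified context window. Word matching is not case-sensitive.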

1 Answer:

Answer 0: (score: 0)

Consider this version of the concordance function, which I modified from the original source code of class ConcordanceIndex() found HERE.

def print_concordance(self, word, width=35, lines=25):
    """
    Print a concordance for ``word`` with the specified context window.

    :param word: The target word
    :type word: str
    :param width: The width of each line, in characters (default=35)
    :type width: int
    :param lines: The number of lines to display (default=25)
    :type lines: int
    """
    #print ("inside:")
    #print (width)
    half_width = (width - len(word) - 2) // 2
    #print (half_width)
    context = width // 4 # approx number of words of context
    #print ("Context:"+str(context))
    offsets = self.offsets(word)
    if offsets:
        lines = min(lines, len(offsets))
        print("Displaying %s of %s matches:" % (lines, len(offsets)))
        for i in offsets:
            #print(i)
            if lines <= 0:
                break
            left = (' ' * half_width +
                    ' '.join(self._tokens[i-context:i])) # This is where you have to concentrate
            #print(i-context)
            #print(self._tokens[i-context:i])
            right = ' '.join(self._tokens[i+1:i+context])
            left = left[-half_width:]
            right = right[:half_width]
            print(left, self._tokens[i], right)
            lines -= 1
    else:
        print("No matches")

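If you enable the commented-out print statements, you can see that whenever the slice start value becomes negative ('-ve'), nothing is printed to the console.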

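A slice of the form ['+ve':'-ve'] returns elements, but ['-ve':'+ve'] does not here: a negative start counts from the end of the list, so it lands after the stop index and the slice is empty. Hence nothing is printed, or in other words, an empty string is printed.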
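The slicing behaviour can be checked without NLTK. The following is a minimal sketch that assumes a made-up token list (three real words plus filler tokens) in place of text3. The list must be longer than the context window: on a very short list Python would clamp the negative start to 0 and the effect would not show.

```python
# Hypothetical token list: three real words followed by filler tokens,
# so the list is long enough for a negative start to land past the stop.
tokens = ["In", "the", "beginning"] + ["tok%d" % n for n in range(40)]

i = 2  # position of "beginning" in this toy list

# Zero or positive start: the words before the match are returned.
small_context = tokens[i - 2:i]     # same as tokens[0:2]

# Negative start: -17 counts from the end of the list, which lies
# after the stop index 2, so the slice is empty.
large_context = tokens[i - 19:i]    # same as tokens[-17:2]

print(small_context)  # ['In', 'the']
print(large_context)  # []
```

The values 2 and 19 correspond to width // 4 for width=11 and width=79 respectively.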

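In self._tokens[i-context:i], the start index i-context is zero or positive when width is small. As width increases, context grows and i-context becomes negative for matches near the start of the text, so there is no left-hand output.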
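One possible workaround is to clamp the start index at zero before slicing. This is only a sketch under that assumption: `left_context` is a hypothetical helper written for illustration, not part of NLTK, and the token list is made up.

```python
def left_context(tokens, i, width):
    """Return the words to the left of the match at index i as one string."""
    context = width // 4           # same word-count estimate as print_concordance
    start = max(0, i - context)    # clamp so the start index never goes negative
    return " ".join(tokens[start:i])


# Hypothetical token list standing in for the beginning of a text.
tokens = ["In", "the", "beginning"] + ["tok%d" % n for n in range(40)]

# Unclamped slice, as in the function above: the start index is 2 - 19 = -17.
unclamped = " ".join(tokens[2 - 79 // 4:2])
clamped = left_context(tokens, 2, 79)

print(repr(unclamped))  # ''
print(repr(clamped))    # 'In the'
```

With the clamp in place, a match near the start of the text keeps whatever left context exists instead of losing it entirely.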