How do I get the words that are not in a CFG grammar's lexicon?

Time: 2017-09-27 18:10:37

Tags: python nltk

How can I make the program return the list of words that the grammar does not cover? For example, consider the following code:

    import nltk
    # Define the cfg grammar.
    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    VP -> V NP
    NP -> det N | N
    V -> "eats" | "drinks"
    N -> "President" | "apple"
    det -> "The" | "a" | "an"
    """)
    sentence = "The President Michel eats banana"

    # Load the grammar into the ChartParser.
    cp = nltk.ChartParser(grammar)

    # Generate and print the parse from the grammar given the sentence tokens.
    for tree in cp.parse(sentence.split()):
        print(tree)

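It only shows the error message: ValueError: Grammar does not cover some of the input words: "'Michel', 'banana'".

However, I would like to get the words that the grammar does not cover, so that I can use them elsewhere in the program.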

1 Answer:

Answer 0: (Score: 1)

You can use grammar.check_coverage(sentence.split()), but it raises the same exception with the list of missing words. However, looking at the source of the check_coverage method:

    def check_coverage(self, tokens):
        """
        Check whether the grammar rules cover the given list of tokens.
        If not, then raise an exception.

        :type tokens: list(str)
        """
        missing = [tok for tok in tokens
                   if not self._lexical_index.get(tok)]
        if missing:
            missing = ', '.join('%r' % (w,) for w in missing)
            raise ValueError("Grammar does not cover some of the "
                             "input words: %r." % missing)

you can write a new function based on that example, such as:

    def get_missing_words(grammar, tokens):
        """
        Find list of missing tokens not covered by grammar
        """
        missing = [tok for tok in tokens
                   if not grammar._lexical_index.get(tok)]
        return missing

and call it as get_missing_words(grammar, sentence.split()) to get the missing words in your example.
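Since _lexical_index is a private attribute and could change between NLTK versions, here is a minimal sketch of the same idea that relies only on the public productions() and rhs() methods. It assumes that terminals are plain Python strings, as they are for a grammar built with nltk.CFG.fromstring; the helper name covered_terminals is made up for this illustration and is not part of NLTK.

```python
def covered_terminals(grammar):
    """Collect every terminal that appears on the right-hand side of a rule.

    Assumption: terminals are plain str objects, while nonterminals are
    some other type (nltk.grammar.Nonterminal in NLTK).
    """
    return {
        symbol
        for production in grammar.productions()
        for symbol in production.rhs()
        if isinstance(symbol, str)
    }


def get_missing_words(grammar, tokens):
    """Return the tokens the grammar does not cover, in input order."""
    known = covered_terminals(grammar)
    return [tok for tok in tokens if tok not in known]
```

With the grammar and sentence from the question, get_missing_words(grammar, sentence.split()) should return ['Michel', 'banana']. This version rebuilds the terminal set on every call, so for repeated checks against a large grammar it may be worth computing covered_terminals(grammar) once and reusing it.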