How can I make a program return the list of words that a grammar does not cover? For example, consider the following code:
import nltk
# Define the cfg grammar.
grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP
NP -> det N | N
V -> "eats" | "drinks"
N -> "President" | "apple"
det -> "The" | "a" | "an"
""")
sentence = "The President Michel eats banana"
# Load the grammar into the ChartParser.
cp = nltk.ChartParser(grammar)
# Generate and print the parse from the grammar given the sentence tokens.
for tree in cp.parse(sentence.split()):
    print(tree)
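It only shows the error message: ValueError: Grammar does not cover some of the input words: "'Michel', 'banana'".
However, I would like to obtain the words that the grammar does not cover, so that I can use them elsewhere in the program.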
Answer 0: (score: 1)
You can use grammar.check_coverage(sentence.split()), but it raises the same exception with the list of missing words. However, looking at the source of the check_coverage method:
def check_coverage(self, tokens):
    """
    Check whether the grammar rules cover the given list of tokens.
    If not, then raise an exception.
    :type tokens: list(str)
    """
    missing = [tok for tok in tokens
               if not self._lexical_index.get(tok)]
    if missing:
        missing = ', '.join('%r' % (w,) for w in missing)
        raise ValueError("Grammar does not cover some of the "
                         "input words: %r." % missing)
you can write a new function based on that example, such as:
def get_missing_words(grammar, tokens):
    """
    Find list of missing tokens not covered by grammar
    """
    missing = [tok for tok in tokens
               if not grammar._lexical_index.get(tok)]
    return missing
and call it as get_missing_words(grammar, sentence.split()) to get the missing words in your example.
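The function above reads _lexical_index, which is a private attribute of the grammar object and could change between NLTK versions. A possible variant, sketched here under the assumption that every terminal in the grammar is a plain Python string (as is the case for grammars built with nltk.CFG.fromstring), collects the terminals through the public productions() and rhs() methods instead. The name get_missing_words_public is hypothetical and chosen only for this illustration.

```python
def get_missing_words_public(grammar, tokens):
    """
    Return the tokens that never appear as a terminal in the grammar.

    Assumption: terminals are plain strings, so any string found on the
    right-hand side of a production is treated as a covered word.
    """
    # Gather every string symbol from the right-hand side of each rule.
    covered = {
        symbol
        for production in grammar.productions()
        for symbol in production.rhs()
        if isinstance(symbol, str)
    }
    # Keep the input order so the result lines up with the sentence.
    return [tok for tok in tokens if tok not in covered]
```

With the grammar and sentence from the question, get_missing_words_public(grammar, sentence.split()) should return ['Michel', 'banana'], and the parser is never invoked, so no ValueError is raised.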