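How do I define the grammar that nltk.grammar.is_terminal() uses? No matter what object I pass to this method, I always get True back. What I actually want is to check whether a list called wordlist contains productions defined in a context-free grammar saved as grammar.cfg.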
Answer 0 (score: 1)
Looking at the code at https://github.com/nltk/nltk/blob/develop/nltk/grammar.py:

def is_nonterminal(item):
    """
    :return: True if the item is a ``Nonterminal``.
    :rtype: bool
    """
    return isinstance(item, Nonterminal)

def is_terminal(item):
    """
    Return True if the item is a terminal, which currently is
    if it is hashable and not a ``Nonterminal``.
    :rtype: bool
    """
    return hasattr(item, '__hash__') and not isinstance(item, Nonterminal)
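I'm not sure how these functions are meant to be used, but for any string input is_terminal() will always return True.

First, every string has a __hash__ attribute, which is the function that hashes the string; see https://docs.python.org/2/reference/datamodel.html#object.hash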
>>> astring = 'foo bar'
>>> astring.__hash__
<method-wrapper '__hash__' of str object at 0x7f06bb0cbcc0>
>>> astring.__hash__()
8194924035431162904
Second, a string is never a Nonterminal object in NLTK, because the Nonterminal class is:
class Nonterminal(object):
    """
    A non-terminal symbol for a context free grammar. ``Nonterminal``
    is a wrapper class for node values; it is used by ``Production``
    objects to distinguish node values from leaf values.
    The node value that is wrapped by a ``Nonterminal`` is known as its
    "symbol". Symbols are typically strings representing phrasal
    categories (such as ``"NP"`` or ``"VP"``). However, more complex
    symbol types are sometimes used (e.g., for lexicalized grammars).
    Since symbols are node values, they must be immutable and
    hashable. Two ``Nonterminals`` are considered equal if their
    symbols are equal.
    :see: ``CFG``, ``Production``
    :type _symbol: any
    :ivar _symbol: The node value corresponding to this
        ``Nonterminal``. This value must be immutable and hashable.
    """
So a string satisfies both criteria: (1) it has a __hash__ attribute and (2) it is not a Nonterminal object. That is why nltk.grammar.is_terminal() returns True for every string.
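
To see the two criteria in action without installing NLTK, here is a minimal sketch that copies the two functions quoted above and pairs them with a simplified stand-in for the Nonterminal class. The stand-in class is an assumption made for illustration only; it is not NLTK's real implementation.

```python
class Nonterminal:
    """Simplified stand-in for nltk.grammar.Nonterminal (illustration only)."""

    def __init__(self, symbol):
        self._symbol = symbol


def is_nonterminal(item):
    # Same check as the NLTK function quoted above.
    return isinstance(item, Nonterminal)


def is_terminal(item):
    # Same check as the NLTK function quoted above.
    return hasattr(item, '__hash__') and not isinstance(item, Nonterminal)


# Any plain string passes both criteria, whatever its content.
string_result = is_terminal('foo bar')

# Only an object wrapped in Nonterminal fails the second criterion.
wrapped_result = is_terminal(Nonterminal('NP'))

# Note: hasattr() only tests that the attribute exists. In Python 3 a list
# still has a __hash__ attribute (set to None), so it also passes the check
# even though it cannot actually be hashed.
list_result = is_terminal(['foo', 'bar'])

print(string_result, wrapped_result, list_result)
```

In other words, the function says nothing about whether a word belongs to a particular grammar; it only separates Nonterminal objects from everything else.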
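The only way I could get it to return False is to load a grammar and read the nonterminal objects out of it, or to pass an object that was explicitly created as, or cast to, a Nonterminal, as is done for example in http://www.nltk.org/_modules/nltk/parse/pchart.html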
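
For the original goal of checking a word list against a grammar, one possible approach is to collect the terminals that actually appear on the right-hand side of the grammar's productions and test membership against that set. The sketch below assumes NLTK is installed; the toy grammar, the example wordlist, and the helper name grammar_terminals are made up for illustration, and the grammar is defined inline rather than read from grammar.cfg.

```python
from nltk import CFG
from nltk.grammar import Nonterminal, is_terminal

# Toy grammar for illustration; a real one could be read from a file and
# passed to CFG.fromstring() as a string instead.
grammar = CFG.fromstring("""
S -> NP VP
NP -> 'the' N
VP -> V NP
N -> 'dog' | 'cat'
V -> 'sees'
""")


def grammar_terminals(cfg):
    """Hypothetical helper: every terminal on any production's right-hand side."""
    return {
        symbol
        for production in cfg.productions()
        for symbol in production.rhs()
        if is_terminal(symbol)
    }


wordlist = ['the', 'dog', 'barks']
terminals = grammar_terminals(grammar)

# Map each word to whether the grammar defines it as a terminal.
coverage = {word: word in terminals for word in wordlist}
print(coverage)

# An object explicitly created as a Nonterminal is the case where
# is_terminal() returns False.
print(is_terminal(Nonterminal('NP')))
```

Here is_terminal() is used only to filter the symbols inside the grammar, where Nonterminal objects and plain strings are mixed; the membership test against the resulting set is what answers the question about wordlist.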