Extending a class in nltk - Python

Time: 2014-11-29 14:38:16

Tags: python class inheritance nltk wordnet

The aim is to add extra functions to the wordnet class in nltk, e.g.:

from nltk.corpus import wordnet

class WN(wordnet):
    def foobar(self):
        print 'foobar'

x = WN
WN.foobar()

But it raises an error:

Traceback (most recent call last):
  File "/home/alvas/workspace/pybabel-fresh/babelnet/utils/pybabel_WordNet.py", line 5, in <module>
    class WN(wordnet):
  File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/util.py", line 44, in __init__
    assert issubclass(reader_cls, CorpusReader)
TypeError: Error when calling the metaclass bases
    issubclass() arg 1 must be a class

So I tried nltk.corpus.reader.WordNetCorpusReader (http://www.nltk.org/_modules/nltk/corpus/reader/wordnet.html#WordNetCorpusReader):

from nltk.corpus.reader import WordNetCorpusReader

class WN(WordNetCorpusReader):
    def __init__(self):
        self = WN.__init__()

    def foobar(self):
        return "foobar"

x = WN
x.foobar()

It seems that if I'm using WordNetCorpusReader I need to instantiate it, because I got:

Traceback (most recent call last):
  File "/home/alvas/workspace/pybabel-fresh/babelnet/utils/pybabel_WordNet.py", line 13, in <module>
    x.foobar()
TypeError: unbound method foobar() must be called with WN instance as first argument (got nothing instead)

Then I tried:

from nltk.corpus.reader import WordNetCorpusReader

class WN(WordNetCorpusReader):
    def foobar(self):
        return "foobar"

x = WN
for i in x.all_synsets():
    print i

[OUT]:

Traceback (most recent call last):
  File "/home/alvas/workspace/pybabel-fresh/babelnet/utils/pybabel_WordNet.py", line 10, in <module>
    for i in x.all_synsets():
TypeError: unbound method all_synsets() must be called with WN instance as first argument (got nothing instead)

How do I extend the nltk wordnet API with new functions? Note: the aim is to create a new class that carries the new functions.

1 Answer:

Answer 0 (score: 4)

Your second attempt seems closest. The problem there is with your constructor:

class WN(WordNetCorpusReader):
    def __init__(self):
        self = WN.__init__()  # needs an instance as the first argument, recursive, and no need to assign to self

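The __init__ method needs an instance as its first argument (here self), and in addition you are calling the __init__ method of the wrong class: WN.__init__ instead of the parent's WordNetCorpusReader.__init__. Once an instance is passed, WN.__init__ would keep calling itself, leading to a RuntimeError: maximum recursion depth exceeded error. Finally, you only want to call the method; there is no need to assign its result to self.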
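
The recursion point can be reproduced without nltk. The following is a minimal sketch, not nltk code: Reader is a hypothetical stand-in for WordNetCorpusReader that only stores its root, and Broken, Fixed and outcome are illustrative names. It should run on Python 2 and Python 3 alike.

```python
class Reader(object):
    """Hypothetical stand-in for WordNetCorpusReader; it only stores root."""

    def __init__(self, root):
        self.root = root


class Broken(Reader):
    def __init__(self, root):
        # Calls its own __init__ again instead of the parent's,
        # so it recurses until the interpreter gives up.
        Broken.__init__(self, root)


class Fixed(Reader):
    def __init__(self, root):
        # Delegate to the parent class, passing the instance explicitly.
        Reader.__init__(self, root)


def outcome(cls):
    """Return 'ok' if cls('some/root') succeeds, else 'RuntimeError'."""
    try:
        cls("some/root")
    except RuntimeError:  # on Python 3, RecursionError subclasses RuntimeError
        return "RuntimeError"
    return "ok"
```

Here outcome(Broken) reports the recursion failure, while outcome(Fixed) succeeds because the parent constructor is the one being called.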

I think you meant to do this:

from nltk.corpus.reader import WordNetCorpusReader
import nltk

class WN(WordNetCorpusReader):
    def __init__(self, *args):
        WordNetCorpusReader.__init__(self, *args)

    def foobar(self):
        return "foobar"

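The catch, however, is that you need to pass the arguments required by WordNetCorpusReader.__init__ to your new class. In my version of nltk, that means passing a root argument, as follows: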

>>> x = WN(nltk.data.find('corpora/wordnet'))
>>> x.foobar()
'foobar'
>>> x.synsets('run')
[Synset('run.n.01'), Synset('test.n.05'), ...]
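
As for the first attempt in the question: the traceback there points into nltk/corpus/util.py because nltk.corpus.wordnet is a lazily loaded corpus object, not a class. When an instance is used as a base, Python calls the instance's type with the new class name, the bases tuple and the class namespace, and that is what trips the issubclass assertion. The sketch below imitates this with hypothetical stand-ins (CorpusReader, LazyLoader and try_subclass are not nltk's real classes) and should behave the same on Python 2 and Python 3.

```python
class CorpusReader(object):
    """Hypothetical stand-in for nltk's CorpusReader base class."""


class LazyLoader(object):
    """Hypothetical stand-in for the loader object behind nltk.corpus.wordnet."""

    def __init__(self, name, reader_cls, *args):
        # Same kind of check as the one shown in the question's traceback.
        assert issubclass(reader_cls, CorpusReader)
        self.name = name


# An instance, not a class, just like nltk.corpus.wordnet.
wordnet = LazyLoader("wordnet", CorpusReader)


def try_subclass(base):
    """Try to derive a class from base and report what happened."""
    try:
        class WN(base):
            pass
    except TypeError as err:
        return "TypeError: %s" % err
    return "ok"
```

try_subclass(CorpusReader) succeeds, whereas try_subclass(wordnet) reports a TypeError, because the "class name" and the bases tuple end up as arguments to LazyLoader.__init__. That is why subclassing the reader class itself, as above, is the workable route.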

A more efficient approach

A more efficient way to do the same thing is as follows:

class WN(WordNetCorpusReader):
    root = nltk.data.find('corpora/wordnet')  # make root a class variable, so you only need to load it once
    def __init__(self, *args, **kwargs):
        WordNetCorpusReader.__init__(self, WN.root, *args, **kwargs)  # add root yourself here, so no arguments are required

    def foobar(self):
        return "foobar"

Now test it:

>>> x = WN()
>>> x.foobar()
'foobar'
>>> x.synsets('run')
[Synset('run.n.01'), Synset('test.n.05'), ...]
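
The class-variable version works because a class body is executed once, when the class is defined, so the lookup is not repeated for every instance. A small sketch of that behaviour, assuming a hypothetical find function in place of nltk.data.find and a hypothetical Reader in place of WordNetCorpusReader:

```python
find_calls = []


def find(path):
    """Hypothetical stand-in for nltk.data.find; records each lookup."""
    find_calls.append(path)
    return "/nltk_data/" + path


class Reader(object):
    """Hypothetical stand-in for WordNetCorpusReader; it only stores root."""

    def __init__(self, root):
        self.root = root


class WN(Reader):
    # Evaluated once, when the class statement runs.
    root = find("corpora/wordnet")

    def __init__(self, *args, **kwargs):
        # Supply root ourselves so callers need no arguments.
        Reader.__init__(self, WN.root, *args, **kwargs)

    def foobar(self):
        return "foobar"


first = WN()
second = WN()
```

After creating both instances, find_calls holds a single entry and both instances share the same root. Note that the lookup also runs at import time, so a missing corpus would surface when the module is loaded rather than when WN() is first called.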

By the way, I've enjoyed seeing your work on the nltk tag.