Where can I find the definitions of all the POS tags produced by NLTK's ClassifierBasedPOSTagger?

Time: 2016-09-22 06:39:55

Tags: python nltk

I trained a ClassifierBasedPOSTagger for POS tagging with the following code:

from nltk.classify import MaxentClassifier
from nltk.tag.sequential import ClassifierBasedPOSTagger

# train_sents: tagged training sentences, i.e. lists of (word, tag) pairs (defined elsewhere)
me_tagger = ClassifierBasedPOSTagger(
    train=train_sents,
    classifier_builder=lambda train_feats: MaxentClassifier.train(train_feats, max_iter=15),
)
print(me_tagger.tag('My new watch is awesome...'.split()))

This prints the following tags:

[('My', 'PP$'), ('new', 'JJ'), ('watch', 'NN'), ('is', 'BEZ'), ('awesome...', 'AT')]

Where can I find the tag definitions for this classifier? I am familiar with these tags, but I can't interpret BEZ and AT.

2 answers:

Answer 0 (score: 2)

You can look at the Brown Corpus tag set:

╔═════╦═════════════════════╦════════════════════╗
║ Tag ║ Description         ║ Examples           ║
╠═════╬═════════════════════╬════════════════════╣
║ AT  ║ article             ║ the an no a every  ║
║     ║                     ║ th' ever' ye       ║
╠═════╬═════════════════════╬════════════════════╣
║ BEZ ║ verb "to be",       ║ is                 ║
║     ║ present tense,      ║                    ║
║     ║ 3rd person singular ║                    ║
╠═════╬═════════════════════╬════════════════════╣
║ ... ║ ...                 ║ ...                ║
╚═════╩═════════════════════╩════════════════════╝

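Answer 1 (score: 1)

You should understand that the tag set has nothing to do with the classifier class you chose; the tag set comes from your training data. So your question should really be "where can I find the tag definitions for (this POS-tagged corpus)". You don't say where your train_sents come from, but indeed (as @RAVI has already pointed out) these tags appear to come from the Brown corpus. You can read its tag set documentation online, or get it from nltk as follows: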

>>> import nltk
>>> nltk.help.brown_tagset("BEZ")
BEZ: verb 'to be', present tense, 3rd person singular
    is
>>> nltk.help.brown_tagset()   # All tags
...
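
Because the tag set comes from the training data, one way to see which tags the tagger can produce is to collect them from the training sentences directly. This is a minimal sketch, assuming train_sents is a list of sentences, each a list of (word, tag) pairs; the helper name tag_counts and the sample sentences are made up for illustration and are not from the question.

```python
from collections import Counter


def tag_counts(tagged_sents):
    """Count how often each tag occurs in a list of tagged sentences."""
    return Counter(tag for sent in tagged_sents for _word, tag in sent)


# Hypothetical stand-in for train_sents (assumption: same (word, tag) format).
sample_sents = [
    [("The", "AT"), ("watch", "NN"), ("is", "BEZ"), ("new", "JJ")],
    [("My", "PP$"), ("watch", "NN"), ("is", "BEZ"), ("awesome", "JJ")],
]

counts = tag_counts(sample_sents)

# Print each distinct tag with its frequency, most common first.
for tag, n in counts.most_common():
    print(tag, n)
```

If the training data is the Brown corpus, each tag found this way can then be looked up with nltk.help.brown_tagset as shown above.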