How to tag part of a sentence (more than one word) with NLTK taggers

Time: 2015-11-12 12:04:28

Tags: python nltk pos-tagger

I am playing around with the NLTK taggers and built a tagger hierarchy based on two German sentences:

import nltk

traindata = [
    [('ich bin dabei', 'PV')],
    [('ich', 'test'), ('bin', 'None'), ('nicht', 'NV'), ('dabei', 'None')],
]

t0 = nltk.DefaultTagger('None')
t1 = nltk.UnigramTagger(traindata, backoff=t0)
t2 = nltk.BigramTagger(traindata, backoff=t1)
t3 = nltk.TrigramTagger(traindata, backoff=t2)

input: ich bin dabei

expected: [('ich bin dabei', 'PV')]

result: [('ich', 'test'), ('bin', 'None'), ('dabei', 'None')]

The word ich was tagged based on the second element in traindata, although I expected all three words ich bin dabei to be tagged as PV.

How can I tag more than one word as a single unit with the NLTK taggers? The same should work when the phrase is only part of a sentence:

ich bin dabei, hoffe ich

should result in

[('ich bin dabei', 'PV'), ('hoffe', 'None'), ('ich', 'None')]
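One direction that might work, since the taggers above assign one tag per token, is to merge known multi-word phrases into single tokens before tagging. The sketch below is plain Python and only illustrates the idea: `merge_phrases`, `tag_with_phrases` and the `phrase_tags` lexicon are hypothetical names, not part of NLTK, and the comma from the example sentence is assumed to have been dropped during tokenization. NLTK's `MWETokenizer` offers a comparable merging step.

```python
# Hypothetical helpers (not NLTK API): merge known multi-word phrases into
# single tokens first, then assign one tag per resulting token.

def merge_phrases(tokens, phrases):
    """Greedily join the longest known phrase found at each position."""
    # Longest phrases first so the longest match wins; skip empty phrases.
    ordered = sorted((tuple(p) for p in phrases if p), key=len, reverse=True)
    merged, i = [], 0
    while i < len(tokens):
        for phrase in ordered:
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                merged.append(" ".join(phrase))
                i += len(phrase)
                break
        else:
            # No phrase starts here, so keep the single token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


def tag_with_phrases(tokens, phrase_tags, default="None"):
    """Tag merged phrases from phrase_tags; all other tokens get the default tag."""
    phrases = [key.split() for key in phrase_tags]
    return [(tok, phrase_tags.get(tok, default))
            for tok in merge_phrases(tokens, phrases)]


phrase_tags = {"ich bin dabei": "PV"}  # assumed phrase lexicon for illustration
result = tag_with_phrases(["ich", "bin", "dabei", "hoffe", "ich"], phrase_tags)
print(result)
# [('ich bin dabei', 'PV'), ('hoffe', 'None'), ('ich', 'None')]
```

In a real setup the dictionary lookup could presumably be replaced by the backoff chain from the question, trained on sentences whose tokens have been merged the same way.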

0 answers:

No answers yet