I am playing around with the NLTK taggers and have built a tagger hierarchy (a backoff chain) trained on two German sentences:
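import nltk

# Two German training sentences; the first one consists of the single token 'ich bin dabei'
traindata = [
    [('ich bin dabei', 'PV')],
    [('ich', 'test'), ('bin', 'None'), ('nicht', 'NV'), ('dabei', 'None')],
]
# Backoff chain: trigram -> bigram -> unigram -> default tag 'None'
t0 = nltk.DefaultTagger('None')
t1 = nltk.UnigramTagger(traindata, backoff=t0)
t2 = nltk.BigramTagger(traindata, backoff=t1)
t3 = nltk.TrigramTagger(traindata, backoff=t2)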
Input: ich bin dabei (tagged as the tokens ['ich', 'bin', 'dabei'])
Expected: [('ich bin dabei', 'PV')]
Result: [('ich', 'test'), ('bin', 'None'), ('dabei', 'None')]
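A self-contained reproduction of the result above, assuming the input was split on whitespace into ['ich', 'bin', 'dabei'] and passed to t3.tag (the exact call is not shown in the question). NLTK's sequential taggers assign one tag per token, trying the trigram, bigram and unigram tables in turn before falling back to the default tag:

```python
import nltk

traindata = [
    [('ich bin dabei', 'PV')],
    [('ich', 'test'), ('bin', 'None'), ('nicht', 'NV'), ('dabei', 'None')],
]

t0 = nltk.DefaultTagger('None')
t1 = nltk.UnigramTagger(traindata, backoff=t0)
t2 = nltk.BigramTagger(traindata, backoff=t1)
t3 = nltk.TrigramTagger(traindata, backoff=t2)

# Assumption: the input sentence is split on whitespace before tagging
tokens = 'ich bin dabei'.split()
result = t3.tag(tokens)
print(result)  # [('ich', 'test'), ('bin', 'None'), ('dabei', 'None')]
```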
The word ich was tagged based on the second sentence in traindata, although I expected the three words ich bin dabei to be tagged together as PV.
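The mismatch shows up with the unigram tagger alone. The first training sentence contributes exactly one token, the whole string 'ich bin dabei', so PV is only returned when that exact string arrives as a single token. A small sketch reusing the question's training data:

```python
import nltk

traindata = [
    [('ich bin dabei', 'PV')],
    [('ich', 'test'), ('bin', 'None'), ('nicht', 'NV'), ('dabei', 'None')],
]
t0 = nltk.DefaultTagger('None')
t1 = nltk.UnigramTagger(traindata, backoff=t0)

# The whole phrase passed as ONE token matches the first training sentence
as_one_token = t1.tag(['ich bin dabei'])
# The same words passed as three tokens are looked up one by one
as_three_tokens = t1.tag(['ich', 'bin', 'dabei'])

print(as_one_token)     # [('ich bin dabei', 'PV')]
print(as_three_tokens)  # [('ich', 'test'), ('bin', 'None'), ('dabei', 'None')]
```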
How can I make the NLTK taggers tag several words as one unit? The same should also work when the phrase is only part of a sentence. For example,
ich bin dabei, hoffe ich
should result in
[('ich bin dabei', 'PV'), ('hoffe', 'None'), ('ich', 'None')]
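One possible workaround, sketched here rather than presented as the definitive NLTK approach: merge known multi-word expressions into single tokens before tagging, using nltk.tokenize.MWETokenizer with separator=' ' so that the merged token is exactly the training string 'ich bin dabei'. The regex tokenizer, the expression list and the helper name tag_with_mwes are hypothetical choices for illustration:

```python
import re

import nltk
from nltk.tokenize import MWETokenizer

traindata = [
    [('ich bin dabei', 'PV')],
    [('ich', 'test'), ('bin', 'None'), ('nicht', 'NV'), ('dabei', 'None')],
]
t0 = nltk.DefaultTagger('None')
t1 = nltk.UnigramTagger(traindata, backoff=t0)
t2 = nltk.BigramTagger(traindata, backoff=t1)
t3 = nltk.TrigramTagger(traindata, backoff=t2)

# Hypothetical list of multi-word expressions to keep together;
# separator=' ' makes the merged token identical to the training token
mwe_tokenizer = MWETokenizer([('ich', 'bin', 'dabei')], separator=' ')


def tag_with_mwes(text):
    """Hypothetical helper: tokenize, merge known phrases, then tag."""
    # Assumed simple tokenizer: runs of word characters or single punctuation marks
    words = re.findall(r"\w+|[^\w\s]", text)
    # 'ich', 'bin', 'dabei' become the single token 'ich bin dabei'
    tokens = mwe_tokenizer.tokenize(words)
    return t3.tag(tokens)


result = tag_with_mwes('ich bin dabei, hoffe ich')
print(result)
# [('ich bin dabei', 'PV'), (',', 'None'), ('hoffe', 'None'), ('ich', 'test')]
```

Two differences from the desired output remain in this sketch: the comma stays a separate token, and the trailing ich is still tagged test, because the unigram tagger learned that tag for ich from the second training sentence. Getting None there would likely require different training data rather than a different tokenizer.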