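I'm looking for a reputable Java library/package, preferably open source, that takes text as input and identifies and tags the parts of speech in it.
Components such as: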
Verbs + Tense + Passive/Active {Simple Present, Past Progressive, Past Passive, Present Perfect ... }
Prepositions of movement {from, to...}
Prepositions of time and place {in, at, on...}
Adverbs of manner {fast, here, outside ... }
Comparatives {more, less ... }
Superlatives {most, least ... }
Adverbs of quantity {many, all... }
Conditionals
Relative pronouns
Relative adverbs
Modal Verbs
I found this list online, but I'm sure there are better, standard tag sets for this.
Answer 0 (score: 1)
I think you should take a look at Stanford's rather influential NLP library.
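Stanford's English tagger models label words with Penn Treebank tags, a standard tag set that covers several of the categories in the question directly: `MD` for modal verbs, `JJR`/`RBR` for comparatives, `JJS`/`RBS` for superlatives, `WDT`/`WP`/`WP$`/`WRB` for wh-words. Other distinctions are not encoded in a single tag: `IN` does not separate prepositions of movement from prepositions of time, relative and interrogative wh-words share a tag, and tense/voice has to be read off verb sequences (e.g. `was/VBD` + `eaten/VBN` suggests past passive). Below is a minimal sketch of mapping tags to coarse categories. It assumes tagged text arrives as space-separated `word_TAG` tokens; the class name, the category labels and the sample sentence are hypothetical and written by hand, not real tagger output.

```java
import java.util.ArrayList;
import java.util.List;

public class TagCategories {

    // Hypothetical helper: maps a Penn Treebank tag to a coarse category
    // resembling the question's list. The labels are illustrative only.
    static String category(String tag) {
        switch (tag) {
            case "MD":
                return "modal verb";
            case "JJR": case "RBR":
                return "comparative";
            case "JJS": case "RBS":
                return "superlative";
            case "WDT": case "WP": case "WP$":
                return "wh-pronoun/determiner (relative or interrogative)";
            case "WRB":
                return "wh-adverb (relative or interrogative)";
            case "IN":
                return "preposition or subordinating conjunction";
            case "TO":
                return "'to' (preposition or infinitive marker)";
            case "VBD":
                return "verb, past tense";
            case "VBN":
                return "verb, past participle";
            case "VBG":
                return "verb, gerund/present participle";
            case "VBP": case "VBZ":
                return "verb, present tense";
            case "VB":
                return "verb, base form";
            case "RB":
                return "adverb";
            default:
                return "other";
        }
    }

    // Splits "word_TAG" tokens (assumed input format) and labels each word.
    // Uses the last '_' so a word containing an underscore still splits correctly.
    static List<String> categorize(String taggedSentence) {
        List<String> out = new ArrayList<>();
        for (String token : taggedSentence.trim().split("\\s+")) {
            int sep = token.lastIndexOf('_');
            if (sep <= 0) {
                continue; // skip tokens with no word part or no separator
            }
            String word = token.substring(0, sep);
            String tag = token.substring(sep + 1);
            out.add(word + ": " + category(tag));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hand-written example in word_TAG form, not actual tagger output.
        String tagged = "She_PRP could_MD have_VB run_VBN faster_RBR "
                + "than_IN the_DT man_NN who_WP won_VBD ._.";
        for (String line : categorize(tagged)) {
            System.out.println(line);
        }
    }
}
```

A plain `switch` over tag strings keeps the mapping explicit and easy to extend; finer categories such as "past passive" would need a second pass over neighbouring tags rather than a one-tag lookup.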
Answer 1 (score: 0)
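You can use the Wall Street Journal portion of the Penn Treebank
(fully hand-annotated) as the training corpus for a POS tagger.
It is available from the LDC for a fairly hefty fee of $1,500 or more: ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC99T42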
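To illustrate what "training data" means here: a hand-annotated corpus supplies word/tag pairs, and the simplest tagger you can build from them is a most-frequent-tag baseline, which gives each word the tag it carried most often in training and falls back to the overall most common tag for unseen words. Real taggers, Stanford's included, use much richer models that look at surrounding context. The sketch below is hypothetical: it assumes tokens in `word/TAG` form (the style used in tagged treebank text), and the class name and tiny training sample are made up, not real treebank data.

```java
import java.util.HashMap;
import java.util.Map;

public class MostFrequentTagBaseline {

    // word -> (tag -> count), accumulated from hand-annotated training text
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    // tag -> total count, used as the fallback for unseen words
    private final Map<String, Integer> tagTotals = new HashMap<>();

    // Reads whitespace-separated "word/TAG" tokens (assumed format).
    // Splits on the last '/' so a token like "1/2/CD" keeps "1/2" as the word.
    void train(String taggedText) {
        for (String token : taggedText.trim().split("\\s+")) {
            int sep = token.lastIndexOf('/');
            if (sep <= 0 || sep == token.length() - 1) {
                continue; // malformed token: missing word or missing tag
            }
            String word = token.substring(0, sep).toLowerCase();
            String tag = token.substring(sep + 1);
            counts.computeIfAbsent(word, k -> new HashMap<>()).merge(tag, 1, Integer::sum);
            tagTotals.merge(tag, 1, Integer::sum);
        }
    }

    // Returns the tag seen most often for this word; unseen words get the
    // most frequent tag overall.
    String tag(String word) {
        Map<String, Integer> seen = counts.get(word.toLowerCase());
        return argmax(seen != null ? seen : tagTotals);
    }

    private static String argmax(Map<String, Integer> m) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : m.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        MostFrequentTagBaseline tagger = new MostFrequentTagBaseline();
        // Tiny hand-made sample standing in for a real annotated corpus.
        tagger.train("The/DT old/JJ man/NN can/MD run/VB ./. "
                + "A/DT can/NN of/IN soup/NN is/VBZ on/IN the/DT table/NN ./. "
                + "She/PRP can/MD swim/VB");
        for (String w : "the man can swim".split(" ")) {
            System.out.println(w + "/" + tagger.tag(w));
        }
    }
}
```

This baseline ignores context entirely, so "can" is always tagged `MD` here even in "a can of soup"; resolving such ambiguity is exactly what the large annotated corpus and a context-aware model are for.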