Custom POS tagging with spaCy

Time: 2016-06-01 10:46:38

Tags: python nltk spacy

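I currently have code that does custom tagging with NLTK. I train a trigram tagger on my own sentences, which are annotated with custom tags, and use NLTK's POS tagger as its backoff. This works well, but I would like to be able to do the same thing with spaCy's POS tagger. Is there a way to do that?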
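
For readers unfamiliar with the setup described above, here is a minimal sketch of a trigram tagger with a backoff in NLTK. It is not the asker's code: the training sentences and the `TOOL` tag are hypothetical, and `DefaultTagger("NN")` is only a stand-in for NLTK's pretrained POS tagger so that the example runs without downloading any model data.

```python
from nltk.tag import DefaultTagger, TrigramTagger

# Hypothetical training data: sentences annotated with custom tags.
train_sents = [
    [("spaCy", "TOOL"), ("tags", "VBZ"), ("words", "NNS")],
    [("NLTK", "TOOL"), ("tags", "VBZ"), ("tokens", "NNS")],
]

# Stand-in backoff (assumption): tags everything "NN".
# The question uses NLTK's own POS tagger in this position instead.
backoff = DefaultTagger("NN")

# The trigram tagger answers when it has seen the context in training,
# and defers to the backoff tagger otherwise.
tagger = TrigramTagger(train_sents, backoff=backoff)

seen = tagger.tag(["spaCy", "tags", "words"])
unseen = tagger.tag(["spaCy", "runs"])
print(seen)
print(unseen)
```

Words seen in a known context receive the custom tag; anything else falls through to the backoff tagger.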


1 answer:

Answer 0: (score: 1)

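import spacy  # in spaCy 1.x this was: from spacy.en import English
oNlp = spacy.load("en_core_web_sm")  # assumes this model is installed

sUnicodeInputText = u"The quick brown fox jumps over the lazy dog."  # example input
oDoc = oNlp(sUnicodeInputText)

loTokens = [o for o in oDoc]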

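Here loTokens holds a list of all the tokens spaCy extracted. Each token has attributes you can use. To get the POS, use the .pos_ attribute. For example, to see each token's lemma together with its POS tag as a list of tuples: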

print([ (o.lemma_, o.pos_) for o in loTokens ])

The spaCy documentation is excellent. Take a look.
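
The answer above shows how to read spaCy's tags but not how to customize them. One possible approach, sketched here under the assumption of spaCy 3.x, is a small pipeline component that overwrites `token.tag_` for words in a lookup table. The component name `custom_tag_override`, the `CUSTOM_TAGS` table, and the `TOOL` tag are all hypothetical. The sketch uses `spacy.blank("en")` so that it runs without a downloaded model, which means tokens outside the table keep an empty tag.

```python
import spacy
from spacy.language import Language

# Hypothetical lookup table: lowercased word -> custom fine-grained tag.
CUSTOM_TAGS = {"spacy": "TOOL", "nltk": "TOOL"}


@Language.component("custom_tag_override")
def custom_tag_override(doc):
    """Overwrite the tag of any token found in CUSTOM_TAGS."""
    for token in doc:
        custom = CUSTOM_TAGS.get(token.lower_)
        if custom is not None:
            token.tag_ = custom
    return doc


# A blank pipeline has a tokenizer but no statistical tagger.
nlp = spacy.blank("en")
nlp.add_pipe("custom_tag_override", last=True)

doc = nlp("I compared spaCy and NLTK today.")
tagged = [(t.text, t.tag_) for t in doc]
print(tagged)
```

With a trained pipeline such as `spacy.load("en_core_web_sm")`, adding the same component after the `tagger` would leave spaCy's statistical tags in place for every word outside the table, which plays a role similar to the backoff tagger in the NLTK setup.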