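OK, so I currently have code that does custom tagging with NLTK. I train a trigram tagger on my own sentences annotated with custom tags, and I use NLTK's POS tagger as its backoff. This works well, but I would like to be able to do the same thing with spaCy's POS tagger. Is there a way to do that?
Here is my code: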
Answer 0 (score: 1)
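import spacy
# spacy.en was removed in spaCy 2.x; load a trained pipeline instead.
# Assumes the model is installed: python -m spacy download en_core_web_sm
oNlp = spacy.load("en_core_web_sm")
sUnicodeInputText = "The quick brown fox jumps over the lazy dog."  # sample input
oDoc = oNlp(sUnicodeInputText)
# A Doc is iterable; collect its Token objects into a list
loTokens = [o for o in oDoc]
loTokens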
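loTokens now holds a list of all the tokens spaCy extracted. Each token has attributes you can use. To get the POS, use the .pos_ attribute (the coarse-grained tag; .tag_ gives the fine-grained one). For example, to see each token's lemma together with its POS tag as tuples: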
print([ (o.lemma_, o.pos_) for o in loTokens ])
The spaCy documentation is excellent. Take a look.
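As for the backoff part of the question, here is a minimal sketch of one way it could be wired up, not a definitive implementation. It makes several assumptions: a plain dictionary lookup (`custom_tags`) stands in for the trained trigram tagger, the names `tag_with_backoff`, `dummy_backoff`, and `make_spacy_backoff` are hypothetical, and the spaCy helper assumes spaCy 3.x with a trained pipeline that includes a tagger. The helper builds a `Doc` from already-split words so that spaCy's tokens line up one-to-one with yours.

```python
from typing import Callable, Dict, List, Tuple

TaggedTokens = List[Tuple[str, str]]


def tag_with_backoff(
    tokens: List[str],
    custom_tags: Dict[str, str],
    backoff: Callable[[List[str]], TaggedTokens],
) -> TaggedTokens:
    """Use the custom tag when one is known, otherwise the backoff tagger's tag."""
    backoff_tags = backoff(tokens)
    if len(backoff_tags) != len(tokens):
        raise ValueError("backoff tagger must return one tag per input token")
    return [
        (token, custom_tags.get(token.lower(), fallback_tag))
        for token, (_, fallback_tag) in zip(tokens, backoff_tags)
    ]


def dummy_backoff(tokens: List[str]) -> TaggedTokens:
    """Stand-in backoff so the sketch runs without any model installed."""
    return [(token, "NOUN") for token in tokens]


def make_spacy_backoff(nlp) -> Callable[[List[str]], TaggedTokens]:
    """Wrap a loaded spaCy pipeline (assumed to contain a tagger) as a backoff."""
    from spacy.tokens import Doc  # imported lazily; only needed for this helper

    def backoff(tokens: List[str]) -> TaggedTokens:
        # Build the Doc from pre-split words to keep the caller's tokenization
        doc = Doc(nlp.vocab, words=tokens)
        for _name, component in nlp.pipeline:
            doc = component(doc)
        return [(token.text, token.pos_) for token in doc]

    return backoff


# Hypothetical custom tag set: "spacy" gets the made-up tag "LIB"
print(tag_with_backoff(["spaCy", "tags", "words"], {"spacy": "LIB"}, dummy_backoff))
```

To use spaCy instead of the stand-in, pass `make_spacy_backoff(spacy.load("en_core_web_sm"))` as the `backoff` argument. A tagger trained on your own sentences could replace the dictionary lookup in the same position, as long as it can signal "no tag" for tokens it does not know.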