NLTK POS tagging for Arabic

Time: 2018-03-02 17:24:09

Tags: python nltk arabic pos-tagger

I wrote some code in my Python 2.7 shell to find the POS tags of Arabic words, and the output was incorrect. I found this solution on Stack Overflow: Unknown symbol in nltk pos tagging for Arabic

I downloaded all the required files (stanford-postagger-full-2018-02-27); this package is the one used by the code in the question above.

This code is from the question above, and I typed it into my shell:

    # -*- coding: cp1256 -*-

    from nltk.tag import pos_tag
    from nltk.tag.stanford import POSTagger
    from nltk.data import load
    from nltk.tokenize import word_tokenize
    _POS_TAGGER = 'taggers/maxent_treebank_pos_tagger/english.pickle'
    def pos_tag(tokens):
        tagger = load(_POS_TAGGER)
        return tagger.tag(tokens)
    path_to_model = 'D:\StanfordParser\stanford-postagger-full-2018-02-27\models/arabic.tagger'
    path_to_jar = 'D:\StanfordParser\stanford-postagger-full-2018-02-27/stanford-postagger-3.9.1.jar'

    artagger = POSTagger(path_to_model, path_to_jar, encoding='utf8')
    artagger._SEPARATOR = '/'
    tagged_sent = artagger.tag(u"أنا تسلق شجرة")
    print(tagged_sent)

The output is:

    Traceback (most recent call last):
      File "C:/Python27/Lib/mo.py", line 4, in <module>
        from nltk.tag.stanford import POSTagger
    ImportError: cannot import name POSTagger

How can I fix this error?

1 Answer:

Answer 0: (score: 0)

This script runs fine on my PC, but the tagging results don't look very good!

    import os
    # Import the class that is actually instantiated below.
    from nltk.tag.stanford import StanfordPOSTagger

    # Tell NLTK where the local Java executable lives.
    java_path = "Put your local path in here/Java/javapath/java.exe"
    os.environ['JAVAHOME'] = java_path

    path_to_model = 'Put your local path in here/stanford-postagger-full-2017-06-09/models/arabic.tagger'
    path_to_jar = 'Put your local path in here/stanford-postagger-full-2017-06-09/stanford-postagger.jar'

    artagger = StanfordPOSTagger(path_to_model, path_to_jar, encoding='utf8')
    # The tagger output joins each word and its tag with "/".
    artagger._SEPARATOR = "/"

    # tag() takes a list of tokens, so split the sentence on whitespace first.
    tagged_sent = artagger.tag("أنا أتسلق شجرة".split())
    print(tagged_sent)

Result: [('أنا','VBD'),('أتسلق','NN'),('شجرة','NN')]
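
One difference from the question's code is worth spelling out: the question passes a raw string to `tag()`, while this script passes the result of `.split()`. As far as I can tell, NLTK's Stanford wrapper treats its argument as a sequence of tokens, so a raw string would end up being tagged character by character. Here is a minimal sketch in plain Python (no NLTK or Java required) of what each form hands to the tagger; the variable names are only illustrative:

```python
# -*- coding: utf-8 -*-
sentence = u"أنا أتسلق شجرة"

# Whitespace tokenization: one list item per word.
tokens = sentence.split()

# Iterating over the raw string instead yields single characters,
# including the spaces between the words.
chars = list(sentence)

print(len(tokens))               # number of word tokens
print(len(chars) > len(tokens))  # the raw string has far more "tokens"
```

Whitespace splitting is only a rough tokenizer, so it may also be part of why the tags above look poor, but that is a guess rather than something I have verified.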

Give it a try :-)
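
As for the `ImportError` itself: to my knowledge the class was renamed from `POSTagger` to `StanfordPOSTagger` in NLTK 3.x, which would explain why the old import fails. If the same script has to run against both old and new NLTK installs, a small fallback along these lines might help. The helper name `import_stanford_pos_tagger` is made up for this sketch and is not part of NLTK:

```python
def import_stanford_pos_tagger():
    """Return the Stanford POS tagger class under whichever name this
    NLTK install provides, or None if neither can be imported."""
    try:
        # Name used by NLTK 3.x.
        from nltk.tag.stanford import StanfordPOSTagger
        return StanfordPOSTagger
    except ImportError:
        pass
    try:
        # Older name, as used in the question's code.
        from nltk.tag.stanford import POSTagger
        return POSTagger
    except ImportError:
        # NLTK is missing, or it exposes neither name.
        return None


tagger_cls = import_stanford_pos_tagger()
print(tagger_cls.__name__ if tagger_cls else "Stanford POS tagger not available")
```

The returned class can then be instantiated exactly like `StanfordPOSTagger` in the script above, with the model path, the jar path, and `encoding='utf8'`.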