POS tagging Arabic text using NLTK in Python

Time: 2017-07-04 13:33:53

Tags: python python-3.x nltk arabic pos-tagger

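I am trying to POS-tag Arabic text with NLTK in Python 3.6. I found this program:

    import nltk

    text = """ و نشر العدل من خلال قضاء مستقل ."""
    # Split the text into sentences, then each sentence into word and punctuation tokens
    sentence = nltk.tokenize.sent_tokenize(text)
    # tokens = [nltk.tokenize.word_tokenize(s) for s in sentence]
    tokens = [nltk.tokenize.wordpunct_tokenize(s) for s in sentence]
    # Here pos tagging isn't right :'(
    # (nltk.pos_tag uses NLTK's default tagger, which defaults to lang='eng')
    PosTokens = [nltk.pos_tag(e) for e in tokens]
    # Named-entity chunking over the tagged sentences
    chunks = nltk.ne_chunk_sents(PosTokens)
    for tree in chunks:
        print(tree)

and got this result: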

(S
  و/JJ
  (ORGANIZATION نشر/NNP)
  العدل/NNP
  من/NNP
  خلال/NNP
  قضاء/NNP
  مستقل/NNP
  ./.)
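The tokenization step itself appears to behave as intended. NLTK's `wordpunct_tokenize` splits text on the regular expression `\w+|[^\w\s]+`, and in Python 3 `\w` is Unicode-aware, so Arabic letters count as word characters. The minimal stdlib-only sketch below re-implements that pattern (it is a stand-in written for illustration, not an import from NLTK) and reproduces the token list visible in the output above, which suggests the wrong tags come from the tagger rather than from tokenization.

```python
import re

# Assumption: this mirrors the pattern used by NLTK's WordPunctTokenizer.
# In Python 3, \w matches Unicode letters, including Arabic script.
WORDPUNCT = re.compile(r"\w+|[^\w\s]+")


def wordpunct_tokenize(text: str) -> list[str]:
    """Split text into runs of word characters and runs of punctuation."""
    return WORDPUNCT.findall(text)


text = " و نشر العدل من خلال قضاء مستقل ."
tokens = wordpunct_tokenize(text)
print(tokens)
# ['و', 'نشر', 'العدل', 'من', 'خلال', 'قضاء', 'مستقل', '.']
```

All eight tokens, including the final period, line up one-to-one with the leaves of the tree NLTK printed.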

This is a poor result. For example, 'نشر' is a verb, yet NLTK tagged it as a proper noun (NNP).

The correct result is:

  (S (CC و)
    (VP (VBD نشر)
      (NP (DTNN العدل))
      (PP (IN من)
        (NP (NN خلال)
          (NP (NN قضاء) (JJ مستقل)))))
    (PUNC .))
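To make the gap between the two outputs concrete, they can be lined up word by word. The sketch below is a minimal stdlib-only illustration, assuming both trees are copied verbatim from this question; the helper names `flat_tags`, `bracketed_tags`, and `compare` are hypothetical. It extracts `word/TAG` pairs from NLTK's flat output and `(TAG word)` preterminals from the correct parse with regular expressions, then lists the disagreements. Note that the correct parse uses a different tag set (for example `DTNN` and `PUNC`), so some mismatches reflect tag-set differences rather than only tagging errors.

```python
import re

# Copied from the question: NLTK's output and the correct parse.
NLTK_OUTPUT = """(S
  و/JJ
  (ORGANIZATION نشر/NNP)
  العدل/NNP
  من/NNP
  خلال/NNP
  قضاء/NNP
  مستقل/NNP
  ./.)"""

CORRECT_PARSE = """(S (CC و)
  (VP (VBD نشر)
    (NP (DTNN العدل))
    (PP (IN من)
      (NP (NN خلال)
        (NP (NN قضاء) (JJ مستقل)))))
  (PUNC .))"""


def flat_tags(tree: str) -> list[tuple[str, str]]:
    """Extract (word, tag) pairs written as word/TAG."""
    return re.findall(r"([^\s()/]+)/([^\s()]+)", tree)


def bracketed_tags(tree: str) -> list[tuple[str, str]]:
    """Extract (word, tag) pairs from preterminals written as (TAG word)."""
    pairs = re.findall(r"\(([^\s()]+) ([^\s()]+)\)", tree)
    return [(word, tag) for tag, word in pairs]


def compare(predicted, gold):
    """Align tokens by position; return (word, predicted_tag, gold_tag) mismatches."""
    if [w for w, _ in predicted] != [w for w, _ in gold]:
        raise ValueError("token sequences differ; cannot align by position")
    return [(w, p, g) for (w, p), (_, g) in zip(predicted, gold) if p != g]


predicted = flat_tags(NLTK_OUTPUT)
gold = bracketed_tags(CORRECT_PARSE)
mismatches = compare(predicted, gold)
print(f"{len(predicted) - len(mismatches)}/{len(predicted)} tags agree")
for word, p, g in mismatches:
    print(f"{word}: NLTK={p} expected={g}")
```

Run on the two trees from this question, it reports 0/8 agreement; the line for 'نشر' shows `NNP` where the correct parse has `VBD`.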


0 answers:

No answers