I am trying to do POS tagging on Arabic text with NLTK in Python 3.6. I found this program:
import nltk

# Needs the NLTK data for sentence splitting, POS tagging and NE chunking.
text = """ و نشر العدل من خلال قضاء مستقل ."""
sentences = nltk.tokenize.sent_tokenize(text)
# tokens = [nltk.tokenize.word_tokenize(s) for s in sentences]
tokens = [nltk.tokenize.wordpunct_tokenize(s) for s in sentences]
# Here pos tagging isn't right :'(
pos_tokens = [nltk.pos_tag(e) for e in tokens]
chunks = nltk.ne_chunk_sents(pos_tokens)
for tree in chunks:
    print(tree)
This program gives the following output:
(S
  و/JJ
  (ORGANIZATION نشر/NNP)
  العدل/NNP
  من/NNP
  خلال/NNP
  قضاء/NNP
  مستقل/NNP
  ./.)
This is a bad result. For example, 'نشر' is a verb, but it is tagged as a proper noun ('NNP').
The correct result would be:
(S (CC و)
   (VP (VBD نشر)
       (NP (DTNN العدل))
       (PP (IN من)
           (NP (NN خلال)
               (NP (NN قضاء) (JJ مستقل)))))
   (PUNC .))
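To make the mismatch concrete, here is a minimal sketch that compares the tags from the NLTK output with the leaf tags of the expected parse, token by token. Both lists are copied by hand from this post, and `tag_mismatches` is only a helper name made up for illustration. The expected tags appear to come from a different tagset (for example `DTNN` and `PUNC`), so an exact string comparison like this probably overstates the disagreement.

```python
# Tags produced by nltk.pos_tag (copied by hand from the output in this post).
actual = [("و", "JJ"), ("نشر", "NNP"), ("العدل", "NNP"), ("من", "NNP"),
          ("خلال", "NNP"), ("قضاء", "NNP"), ("مستقل", "NNP"), (".", ".")]

# Leaf tags of the expected parse tree (copied by hand from this post).
expected = [("و", "CC"), ("نشر", "VBD"), ("العدل", "DTNN"), ("من", "IN"),
            ("خلال", "NN"), ("قضاء", "NN"), ("مستقل", "JJ"), (".", "PUNC")]


def tag_mismatches(actual, expected):
    """Return (token, actual_tag, expected_tag) for every token whose tags differ.

    Hypothetical helper: assumes both lists cover the same tokens in the same order.
    """
    if [tok for tok, _ in actual] != [tok for tok, _ in expected]:
        raise ValueError("token sequences differ")
    return [(tok, a, e) for (tok, a), (_, e) in zip(actual, expected) if a != e]


mismatches = tag_mismatches(actual, expected)
for tok, got, wanted in mismatches:
    print(f"{tok}: got {got}, expected {wanted}")
print(f"{len(mismatches)} of {len(actual)} tokens mismatched")
```

Almost every token falls back to 'NNP', which suggests the tagger does not recognise the Arabic words at all rather than making isolated mistakes.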