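POS tagger in Python without NLTK

Time: 2018-12-11 22:48:57

Tags: python nlp pos-tagger

I am trying to build a POS tagger for determiners and prepositions in Sorani Kurdish. I am using the following code to put a tag after each preposition or determiner in my Kurdish text.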

import os
SOR = open("SOR-1.txt", "r+", encoding = 'utf-8')
old_text = SOR.read()
punkt = [".", "!", ",", ":", ";"]
text = ""
for i in old_text:
    if i in punkt:
        text+=" "+i
    else:
        text += i

d = {"DET":["ئێمە" , "ئێوە" , "ئەم" , "ئەو" , "ئەوان" , "ئەوەی", "چەند" ], "PREP":["بۆ","بێ","بێجگە","بە","بەبێ","بەدەم","بەردەم","بەرلە","بەرەوی","بەرەوە","بەلای","بەپێی","تۆ","تێ","جگە","دوای","دەگەڵ","سەر","لێ","لە","لەبابەت","لەباتی","لەبارەی","لەبرێتی","لەبن","لەبەینی","لەبەر","لەدەم","لەرێ","لەرێگا","لەرەوی","لەسەر","لەلایەن","لەناو","لەنێو","لەو","لەپێناوی","لەژێر","لەگەڵ","ناو","نێوان","وەک","وەک","پاش","پێش","" ], "punkt":[".", ",", "!"]}

text = text.split()
for w in text:
    for pos in d:
        if w in d[pos]:
            SOR.write(w+"/"+pos+" ")
SOR.close()

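What I want is to add the POS tag inside the text, right after each word that appears in the defined dictionary, but the result is a separate list of words and POS tags at the end of the file.

1 Answer:

Answer 0: (score: 0)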

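Keep in mind that old_text is a single string. So when you iterate over it like this

for i in old_text:
    if i in punkt:

you are iterating over characters. I think you meant to iterate over lines instead. In that case, you can open the file in read mode using a with statement.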
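A minimal sketch of that idea, not a definitive implementation: it reads the input line by line inside a with statement and writes the tagged text to a second file, so the tags land next to their words instead of being appended at the end. The helper names tag_line and tag_file, the separate output file, and the shortened TAGS dictionary are assumptions made for illustration; the full word lists are the ones in the question.

```python
import re

# Hypothetical, shortened tag dictionary; the full lists are in the question.
TAGS = {
    "DET": ["ئەم", "ئەو"],
    "PREP": ["بۆ", "لە", "لەسەر"],
}

# Reverse lookup (word -> tag) so each token needs a single dictionary access.
# Empty strings, like the stray "" in the question's PREP list, are skipped.
WORD_TO_TAG = {word: tag for tag, words in TAGS.items() for word in words if word}


def tag_line(line, word_to_tag=WORD_TO_TAG):
    """Return the line with "/TAG" appended to every word found in word_to_tag."""
    # Detach punctuation from the preceding word, as the question's loop does.
    spaced = re.sub(r"([.!,:;])", r" \1", line)
    tagged = []
    for token in spaced.split():
        tag = word_to_tag.get(token)
        # Untagged tokens are kept, so the original text is preserved.
        tagged.append(f"{token}/{tag}" if tag else token)
    return " ".join(tagged)


def tag_file(in_path, out_path):
    """Read in_path line by line and write the tagged lines to out_path."""
    with open(in_path, "r", encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(tag_line(line) + "\n")
```

Calling tag_file("SOR-1.txt", "SOR-1-tagged.txt") would leave the source file untouched (the output file name is made up here). Writing to a second file avoids the problem with "r+" mode, where the file position sits at the end after read(), so every write is appended after the existing text.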