How to replace Cyrillic words in a Unicode txt file using a dictionary

Asked: 2013-10-16 19:55:05

Tags: python replace data-dictionary cyrillic

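I am trying to replace Cyrillic words in a Unicode txt file using a dictionary. I would not expect replacing words to be difficult, but with Cyrillic text there is the added complication of whether the file uses a 16-bit or 8-bit encoding (UTF-16 or UTF-8). I have tried many different pieces of code, but none of them seem to work. I would really appreciate any help!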
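To make the encoding issue concrete, here is a minimal sketch of how the same Cyrillic word looks as bytes under UTF-8 and UTF-16, and how a leading byte-order mark (BOM) can be used to choose a codec before decoding. The helper name `guess_encoding` is hypothetical, and falling back to UTF-8 when no BOM is present is an assumption, not something the question states.

```python
# -*- coding: utf-8 -*-
import codecs


def guess_encoding(raw):
    """Hypothetical helper: pick a codec name from a leading BOM."""
    if raw.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'  # decodes UTF-8 and drops the BOM
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return 'utf-16'     # the 'utf-16' codec reads byte order from the BOM
    return 'utf-8'          # assumption: files without a BOM are UTF-8


word = u'первый'
utf8_bytes = word.encode('utf-8')    # two bytes per Cyrillic letter, no BOM
utf16_bytes = word.encode('utf-16')  # BOM plus two bytes per letter

# Decoding with the guessed codec gives back the original text.
decoded = utf16_bytes.decode(guess_encoding(utf16_bytes))
print(decoded == word)
```

Once the text is decoded correctly, the replacement itself works on Unicode strings and no longer depends on how the file was stored.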

My dictionary is in a file named 'chars', with the following contents:

cyrillic_ordinals = {
u'первый' : u'one',
u'второй' : u'two',
u'третий' : u'three',
u'четвёртый' : u'four'  }

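I am not sure why my code does not work. For context, the first part of the code defines the replacement function (this is where the error is), and the second half only handles the input and output files.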

import sys
import codecs
import os
import chars

def replaceordinals(text, cyrillic_ordinals):
    for i, j in cyrillic_ordinals.iteritems():
        text = text.replace(i, j)
        return text

def readAndWrite(input_file, output_file):
    try:
        w_f = codecs.open(output_file, encoding='utf-8', mode='w+')
    except IOError:
        print("Can't create or edit output file. Do you have rights to create file here?")
        print("For unix systems try to use \"sudo python\" instead of \"python\"")

    try:
        i_f = codecs.open(input_file, encoding='utf-8')
        for line in i_f:
            w_f.write(replaceordinals(line, chars.cyrillic_ordinals))
    except IOError:
        print("Can't read input file. Check your path to input file")
    except:
        try:
            i_f = codecs.open(input_file, encoding='utf-16')
            for line in i_f:
                w_f.write(replaceordinals(line, chars.cyrillic_ordinals))
        except IOError:
            print("Can't read input file. Check your path to input file")


def main(argv):
    #If user didn't provide path to input and/or output file - show an error, otherwise - try to run processing
    if len(argv) != 3:
        print("Missing file arguments.\nFormat: python " + argv[0] + " /home/user/Desktop/input_file.txt /home/user/Desktop/output_file.txt")
    else:
        readAndWrite(argv[1], argv[2])


if __name__ == "__main__":
    main(sys.argv)

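The output file is created but its contents are unchanged: the Cyrillic words are not replaced with one, two, and so on. Does anyone know how to fix this?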
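One visible problem in the code above, offered as a sketch rather than a confirmed diagnosis: `return text` is indented inside the `for` loop of `replaceordinals`, so the function returns after trying only the first dictionary entry. The version below moves the return out of the loop. The dictionary is inlined and the sample string is made up so that the snippet runs on its own, and `.items()` replaces `iteritems()` so that it runs on Python 3 as well as Python 2.

```python
# -*- coding: utf-8 -*-
# Sketch only: same dictionary as in the question, inlined here so the
# example is self-contained (the question keeps it in the chars module).
cyrillic_ordinals = {
    u'первый': u'one',
    u'второй': u'two',
    u'третий': u'three',
    u'четвёртый': u'four',
}


def replaceordinals(text, ordinals):
    # .items() exists on both Python 2 and 3; iteritems() is Python 2 only.
    for cyrillic, english in ordinals.items():
        text = text.replace(cyrillic, english)
    # Return only after every dictionary entry has been applied.
    return text


# Made-up sample line, not taken from the asker's input file.
sample = u'первый и второй, затем четвёртый'
result = replaceordinals(sample, cyrillic_ordinals)
print(result)
```

If the real output is still unchanged after this fix, two other places may be worth checking. The bare `except:` in `readAndWrite` hides whatever error triggers the UTF-16 retry, and on Python 2 the `chars` module needs a source encoding declaration such as `# -*- coding: utf-8 -*-` for its non-ASCII literals.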

0 answers:

No answers yet