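I'm trying to replace Cyrillic words in a Unicode text file using a dictionary. I don't expect replacing words to be difficult, but with Cyrillic text the encoding is an added complication: the file may be 16-bit or 8-bit (UTF-16 or UTF-8). I've tried many different versions of the code, but none of them seem to work. I'd really appreciate any help!
My dictionary is in a file called 'chars', with the following contents: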
cyrillic_ordinals = {
    u'первый': u'one',
    u'второй': u'two',
    u'третий': u'three',
    u'четвёртый': u'four',
}
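I'm not sure why my code doesn't work. For context, the first part of the code defines the replacement function (this is where the error is), and the second half only specifies the input and output files.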
import sys
import codecs
import os
import chars


def replaceordinals(text, cyrillic_ordinals):
    for i, j in cyrillic_ordinals.iteritems():
        text = text.replace(i, j)
    return text


def readAndWrite(input_file, output_file):
    try:
        w_f = codecs.open(output_file, encoding='utf-8', mode='w+')
    except IOError:
        print("Can't create or edit output file. Do you have rights to create file here?")
        print("For unix systems try to use \"sudo python\" instead of \"python\"")

    try:
        i_f = codecs.open(input_file, encoding='utf-8')
        for line in i_f:
            w_f.write(replaceordinals(line, chars.cyrillic_ordinals))
    except IOError:
        print("Can't read input file. Check your path to input file")
    except:
        try:
            i_f = codecs.open(input_file, encoding='utf-16')
            for line in i_f:
                w_f.write(replaceordinals(line, chars.cyrillic_ordinals))
        except IOError:
            print("Can't read input file. Check your path to input file")


def main(argv):
    # If user didn't provide path to input and/or output file - show an error, otherwise - try to run processing
    if len(argv) != 3:
        print("Missing file arguments.\nFormat: python " + argv[0] + " /home/user/Desktop/input_file.txt /home/user/Desktop/output_file.txt")
    else:
        readAndWrite(argv[1], argv[2])


if __name__ == "__main__":
    main(sys.argv)
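The output file that gets created is unchanged: the Cyrillic words are not replaced with 'one', 'two', and so on. Does anyone know how to fix this?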
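
To make the intended behavior concrete, here is a minimal sketch that works on in-memory data instead of files. It assumes Python 3, unlike the script above, which uses the Python 2 `iteritems()`. The helper names `decode_with_fallback` and `replace_ordinals` are hypothetical, and the NFC normalization step is only an assumption: it guards against 'ё' being stored as 'е' plus a combining diaeresis and is not a confirmed cause of the problem.

```python
import unicodedata

# Same mapping as in 'chars', written as plain Python 3 strings.
cyrillic_ordinals = {
    'первый': 'one',
    'второй': 'two',
    'третий': 'three',
    'четвёртый': 'four',
}


def decode_with_fallback(raw):
    """Decode bytes as UTF-8, falling back to UTF-16 (hypothetical helper)."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw.decode('utf-16')


def replace_ordinals(text, mapping):
    """Replace every key of mapping found in text with its value."""
    # Assumption: normalize to NFC so that decomposed and composed
    # spellings of the same word compare equal.
    text = unicodedata.normalize('NFC', text)
    for word, replacement in mapping.items():
        text = text.replace(unicodedata.normalize('NFC', word), replacement)
    return text


sample = 'первый, второй и четвёртый'
for encoding in ('utf-8', 'utf-16'):
    raw = sample.encode(encoding)
    print(encoding, '->', replace_ordinals(decode_with_fallback(raw), cyrillic_ordinals))
```

If this produces the expected replacements for both encodings while the full script still leaves the file unchanged, the difference is more likely in how the file is read or in the exact form of the words in it (capitalization, for instance) than in the replacement loop itself.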