Trying to convert output strings to UTF-8 when writing to a .txt file in Python

Asked: 2014-02-17 18:44:52

Tags: python string unicode utf-8 output

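I'm trying to submit lines from a .txt file to the Google Translate API and then write the results to a separate .txt file. Everything works, except that when I read the output file the text is shown as escaped unicode, so I end up with sequences like \xeda. I'd like to convert the results to UTF-8 before writing them to the file, but my attempts seem to have no effect. I get no errors, but I still get the garbage characters. This is my (relevant) code: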

read_array = []
write_array = []
write_file = 'write_file.txt'
read_file = open('metaphors1.txt','r')
s = codecs.open('write_file.txt', 'w', 'utf-8')

for line in read_file:
    #Reads sentences from the input file, converts them to a string with
    #all lowercase letters (to prevent garbage values), then puts the strings
    #in an array
    readstring = str(line)
    readstring = readstring.lower()
    read_array.append(readstring)

for item in read_array:
    #removes new line symbols to prevent translation errors then submits
    #sentences in the array to the translator, then writes the sentences
    #to a new array
    readitem = str(item)
    readitem = readitem.rstrip('\n')
    results1 = translator.translate(readitem)
    resultstring = str(results1)
    write_array.append(resultstring)

for item in write_array:
    #writes the results to an output file
    writeitem = str(item)
    writeitem = writeitem.encode('utf-8')
    s.write("%s\n" % writeitem)

s.close()

I'm sure whatever I'm doing wrong is simple and obvious, but I'm stumped. Any help would be appreciated. Thanks!

1 Answer:

Answer 0: (score: 0)

Check out http://docs.python.org/2/library/stdtypes.html#str.decode. If you don't care about errors, you can even tell it to ignore them:

line.decode('utf-8','ignore')
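To put that in context, here is a rough sketch of how the decode step might fit into the write loop. It assumes the translated strings arrive either as UTF-8 bytes or as already-decoded text. The helper names to_text and write_lines and the sample data are made up for illustration; they are not part of the question's code or of any translation library. A file opened with an encoding (io.open here, codecs.open in the question) encodes on write, so the sketch passes it text and never calls .encode() first.

```python
import io
import os
import tempfile


def to_text(raw, errors='ignore'):
    """Decode UTF-8 bytes to text; pass already-decoded text through."""
    if isinstance(raw, bytes):
        # 'ignore' drops undecodable bytes; 'replace' marks them with U+FFFD.
        return raw.decode('utf-8', errors)
    return raw


def write_lines(path, lines):
    """Write one item per line; the file object does the UTF-8 encoding."""
    with io.open(path, 'w', encoding='utf-8') as out:
        for line in lines:
            out.write(to_text(line).rstrip(u'\n') + u'\n')


# Hypothetical sample data standing in for the translator's output.
sample = [b'caf\xc3\xa9\n', u'ni\xf1o', b'bad\xffbyte']
out_path = os.path.join(tempfile.mkdtemp(), 'write_file.txt')
write_lines(out_path, sample)

# Read the file back as UTF-8 to check what was written.
with io.open(out_path, 'r', encoding='utf-8') as check:
    written = check.read()
```

Whether 'ignore' is the right choice depends on the data: it silently discards bytes that are not valid UTF-8, which hides problems, while 'replace' at least leaves a visible marker.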