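I'm trying to submit lines from a .txt file to the Google Translate API and then write the results to a separate .txt file. Everything works, except that when I read the output file it is Unicode, so I end up with characters like /xeda. I'd like to convert the results to UTF-8 before writing them to the file, but my attempts don't seem to have any effect. I get no errors, but I still get garbage characters. Here is my (relevant) code: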
read_array = []
write_array = []
write_file = 'write_file.txt'
read_file = open('metaphors1.txt','r')
s = codecs.open('write_file.txt', 'w', 'utf-8')
for line in read_file:
    #Reads sentences from the input file, converts them to a string with
    #all lowercase letters (to prevent garbage values then puts the strings
    #in an array
    readstring = str(line)
    readstring = readstring.lower()
    read_array.append(readstring)
for item in read_array:
    #removes new line symbols to prevent translation errors then submits
    #sentences in the array to the translator, then writes the sentences
    #to a new array
    readitem = str(item)
    readitem.rstrip('\n')
    results1 = translator.translate(readitem)
    resultstring = str(results1)
    write_array.append(resultstring)
for item in write_array:
    #writes the results to an output file
    writeitem = str(item)
    writeitem = writeitem.encode('utf-8')
    s.write("%s\n" % writeitem)
s.close()
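I'm sure whatever I'm doing wrong is simple and obvious, but I'm stumped. Any help would be appreciated. Thanks!
Answer 0 (score: 0)
Check out http://docs.python.org/2/library/stdtypes.html#str.decode. If you don't care about errors, you can even tell it to ignore them.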
line.decode('utf-8','ignore')
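To show where that decode call might fit, here is a minimal sketch of the whole read/translate/write loop. It assumes the input file is UTF-8 and that the translator returns text (unicode) rather than bytes. The names decode_line and translate_file are made up for illustration, and the translate argument is a hypothetical stand-in for translator.translate from the question. The idea is to decode bytes once when reading, work with text in between, and let the file object do the encoding on write. Note that a file opened with an encoding (codecs.open or io.open) already encodes for you, so calling .encode('utf-8') yourself before s.write should not be needed.

```python
# -*- coding: utf-8 -*-
import io


def decode_line(raw_bytes):
    """Decode UTF-8 bytes to text, silently dropping bytes that are not valid UTF-8."""
    return raw_bytes.decode('utf-8', 'ignore')


def translate_file(in_path, out_path, translate):
    """Translate in_path line by line and write the results to out_path as UTF-8.

    `translate` is a hypothetical callable (text -> text) standing in for
    translator.translate; it is assumed to return text, not encoded bytes.
    """
    # Read raw bytes so the decoding step is explicit; the output file is
    # opened in text mode and encodes to UTF-8 on every write.
    with io.open(in_path, 'rb') as src, \
            io.open(out_path, 'w', encoding='utf-8') as dst:
        for raw in src:
            # rstrip returns a new string, so the result has to be kept.
            text = decode_line(raw).rstrip(u'\r\n').lower()
            dst.write(translate(text) + u'\n')
```

Something like translate_file('metaphors1.txt', 'write_file.txt', translator.translate) would be the call for the files in the question, if the translator does return unicode. If it returns UTF-8 bytes instead, its result would need the same decode step before being written.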