Converting a string read from an input file

Asked: 2014-01-25 11:07:03

Tags: python utf8-decode

I'm new to Python and need a hand with this code.

The following code works fine and converts the string as required:

# -*- coding: utf-8 -*-
import sys
import arabic_reshaper
from bidi.algorithm import get_display

reshaped_text = arabic_reshaper.reshape(u' الحركات')
bidi_text = get_display(reshaped_text)
print >>open('out', 'w'), reshaped_text.encode('utf-8') # This is ok

When I try to read the string from a file instead, I get the error shown below:

# -*- coding: utf-8 -*-
import sys
import arabic_reshaper
from bidi.algorithm import get_display

with open("/home/nemo/Downloads/mpcabd-python-arabic-reshaper-552f3f4/data.txt", "r") as myfile:
    data = myfile.read().replace('\n', '')
reshaped_text = arabic_reshaper.reshape(data)
bidi_text = get_display(reshaped_text)
print >>open('out', 'w'), reshaped_text.encode('utf-8')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd8 in position 0: ordinal not in range(128)

Any help would be appreciated.

Thanks

2 Answers:

Answer 0 (score: 2)

  

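The method decode() decodes the string using the codec registered for encoding. It defaults to the default string encoding.

When you read a UTF-8 encoded file, read() returns raw bytes, so you need to call string.decode('utf8') on the result to get a unicode string.

Write: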
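To see where the error in the question comes from, here is a minimal sketch. It assumes the file starts with an Arabic letter such as alef (U+0627), whose UTF-8 encoding begins with the byte 0xd8; the variable names are illustrative and not from the original post. Decoding those bytes as ASCII fails in the same way as the implicit conversion in Python 2, while decoding them as UTF-8 succeeds.

```python
# -*- coding: utf-8 -*-
# Raw bytes as they would come from a UTF-8 encoded file
# (assumed content: the single letter alef, U+0627).
raw = b'\xd8\xa7'

try:
    raw.decode('ascii')      # what an implicit ASCII conversion attempts
    ascii_failed = False
    message = ''
except UnicodeDecodeError as exc:
    ascii_failed = True
    message = str(exc)       # same wording as the error in the question

text = raw.decode('utf-8')   # explicit decode yields a one-character text string
```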

data = u'my data'  # unicode string; encode it to bytes before writing
with open("file.txt" , "w") as f:
    f.write(data.encode('utf-8'))

Read:

with open("file.txt" , "r") as f:
    data = f.read().decode('utf-8')
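
Applied to the question, a hedged sketch might look like the following. It reads the file in binary mode, decodes it explicitly, and strips newlines before the text would be handed to arabic_reshaper.reshape. The helper name read_utf8 and the temporary demo file are assumptions made for illustration; the reshaper call itself is left out so the snippet runs without third-party packages.

```python
# -*- coding: utf-8 -*-
import os
import tempfile


def read_utf8(path):
    """Return the file's contents as text with newlines removed (hypothetical helper)."""
    with open(path, 'rb') as f:      # binary mode: read() returns raw bytes
        raw = f.read()
    return raw.decode('utf-8').replace(u'\n', u'')


# Demo with a throwaway file standing in for data.txt.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with open(demo_path, 'wb') as f:
    f.write(u'الحركات\n'.encode('utf-8'))

data = read_utf8(demo_path)
os.remove(demo_path)
```

The returned value is text (unicode in Python 2, str in Python 3), so passing it to arabic_reshaper.reshape(data) should no longer trigger an implicit ASCII decode.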

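Answer 1 (score: 2)

You can also use the optional encoding parameter of the built-in open function (Python 3; in Python 2, io.open accepts the same parameter):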

with open("/home/nemo/Downloads/mpcabd-python-arabic-reshaper-552f3f4/data.txt",
          'rt',
          encoding='utf8') as f: