Reading a byte string from a file in Python 3

Time: 2017-04-11 05:39:02

Tags: python string python-3.x byte

The file contains lines like the following, and is encoded as UTF-8:

cd232704-a46f-3d9d-97f6-67edb897d65f    b'this Friday, Gerda Scheuers will be excited \xe2\x80\x94 but she\xe2\x80\x99s most excited about the merchandise the movie will bring.'

This is my code:

with open(file, 'r') as f_in:
    for line in f_in:
        tokens = line.split('\t')
        print(tokens[1])

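I want to get the correctly decoded text: "this Friday, Gerda Scheuers will be excited — but she’s most excited about the merchandise the movie will bring."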

print(b'\xe2\x80\x94'.decode('utf-8'))  # decodes the UTF-8 bytes to a str (U+2014), not to ASCII

But I can't read the bytes from the file. If I open the file in binary mode, I have to decode each line before I can split it.

1 answer:

Answer 0: (score: 2)

You can use ast.literal_eval to convert the bytes literal to a bytes object:

>>> import ast
>>> ast.literal_eval(r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'")
b'excited \xe2\x80\x94 but she\xe2\x80\x99s'

Then decode it to get a string object:

>>> ast.literal_eval(r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'").decode('utf-8')
'excited — but she’s'

Applied to the code from the question:

import ast

with open(file, 'r') as f_in:
    for line in f_in:
        tokens = line.split('\t')
        # if len(tokens) < 2:
        #    continue
        bytes_part = ast.literal_eval(tokens[1])  # Parse the b'...' literal into a bytes object
        s = bytes_part.decode('utf-8')  # Decode the bytes to convert to a string
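
Opening the file in binary mode would not help here. The second column holds the text of a Python bytes literal (the characters b, ', \, x, e, 2 and so on), not the raw bytes themselves. Below is a minimal, self-contained sketch of the same approach that you can run without the original file. The helper name decode_column and its parameters are hypothetical, and an io.StringIO object stands in for the real file, assuming the columns are tab-separated as in the question's code.

```python
import ast
import io


def decode_column(lines, column=1, sep='\t', encoding='utf-8'):
    """Yield the decoded text of a b'...' literal stored in one column.

    decode_column and its parameters are hypothetical names for this sketch.
    """
    for line in lines:
        tokens = line.rstrip('\n').split(sep)
        if len(tokens) <= column:
            continue  # skip lines that lack the expected column
        value = ast.literal_eval(tokens[column])
        if not isinstance(value, bytes):
            continue  # literal_eval also accepts other literals; keep only bytes
        yield value.decode(encoding)


# Stand-in for the real file: one tab-separated line holding a bytes literal.
sample = io.StringIO(
    "cd232704-a46f-3d9d-97f6-67edb897d65f\t"
    r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'" "\n"
)
decoded = list(decode_column(sample))
print(decoded)
```

ast.literal_eval only evaluates Python literals, so unlike eval it does not run arbitrary code found in the file. It does raise ValueError or SyntaxError on a malformed field, which you may want to catch if the data is not clean.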