Converting a UTF-8 encoded string to unicode: Python

Time: 2015-09-03 10:33:31

Tags: python unicode

I have a text file that contains the following line:

str = '0|Crazy Taxi\xe2\x84\xa2 City Rush^Truck Racing Super Gear^Candy Crush Soda Saga^Car Parking^BMX Kid^Hill Climb Racing^UNLimited Kareena Kapoor^3D Car Parking^Find My Android Phone!^Christmas Trains^Top Free Games^Telegram^Door Screen Lock^Adventure of Ted 2 - Free^Sonic Jump^'

I want to remove "\xe2\x84\xa2", which I can do with the following line of code:

print unicode(str,errors="ignore")

output = '0|Crazy Taxi City Rush^Truck Racing Super Gear^Candy Crush Soda Saga^Car Parking^BMX Kid^Hill Climb Racing^UNLimited Kareena Kapoor^3D Car Parking^Find My Android Phone!^Christmas Trains^Top Free Games^Telegram^Door Screen Lock^Adventure of Ted 2 - Free^Sonic Jump^'

But when I run the same logic over the whole file, using the code below:

with open('train_data_dump.txt', mode='r') as document:
    for line in document:
        print unicode(line,errors='ignore')

it prints the lines as they were before.

Feel free to ask if my question is not clear enough. Please help.

1 answer:

Answer 0 (score: 3)

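When you assign the variable from a file, it is as if you had assigned a raw string: the backslashes are treated as ordinary characters. You need to decode the escaped characters first.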

unicode(i.decode("string_escape"), errors="ignore")

Python Specific Encodings
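
For readers on Python 3, where neither `unicode()` nor the `string_escape` codec exists, the same idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `decode_escaped_line` and its `keep_non_ascii` flag are made up for this example, and it assumes every line in the file is plain ASCII text in which `\xe2` is stored as four separate characters.

```python
def decode_escaped_line(line, keep_non_ascii=False):
    """Interpret literal backslash-x escapes in `line`, then decode the bytes.

    Hypothetical helper for illustration; assumes `line` is pure ASCII text.
    """
    # "unicode_escape" turns the 4-character text \xe2 into one code point
    # (U+00E2). Encoding that as latin-1 maps each code point 0-255 back to
    # the single byte it stands for, giving the real UTF-8 byte sequence.
    raw_bytes = line.encode("ascii").decode("unicode_escape").encode("latin-1")
    if keep_non_ascii:
        # Decode the UTF-8 bytes properly instead of discarding them.
        return raw_bytes.decode("utf-8")
    # Same effect as unicode(s, errors="ignore") in Python 2:
    # decode as ASCII and silently drop every non-ASCII byte.
    return raw_bytes.decode("ascii", errors="ignore")


# A raw string mimics what is read from the file: backslashes are literal.
from_file = r"0|Crazy Taxi\xe2\x84\xa2 City Rush^Truck Racing Super Gear^"

print(decode_escaped_line(from_file))
print(decode_escaped_line(from_file, keep_non_ascii=True))
```

Note that `unicode_escape` interprets every backslash sequence it finds, so a literal `\n` or `\t` in the file would also be converted; whether that is acceptable depends on the data.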