Question

Possible duplicate:
Converting a latin string to unicode in python

After saving to a file, I have a list with the following format:

list_example = [
         u"\u00cdndia, Tail\u00e2ndia & Cingapura",
         u"Lines through the days 1 (Arabic) \u0633\u0637\u0648\u0631 \u0639\u0628\u0631 \u0627\u0644\u0623\u064a\u0627\u0645 1",
]

But the actual format of the strings in the list is:

actual_format = [
         "Índia, Tailândia & Cingapura ",
         "Lines through the days 1 (Arabic) سطور عبر الأيام 1 | شمس الدين خ "
]

How can I convert the strings in list_example into the strings in the actual_format list?

Answer 1

Your question is a bit unclear. In any case, the following guidelines should help you solve your problem.

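If you define these strings in Python source code, then you should

know the character encoding in which your editor saves the source file (e.g. utf-8)
declare that encoding on the first line of the source file with: # -*- coding: utf-8 -*-
define these strings as unicode objects: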

strings = [u"Índia, Tailândia & Cingapura ", u"Lines through the days 1 (Arabic) سطور عبر الأيام 1 | شمس الدين خ "]

(Note: in Python 3, string literals are unicode objects by default, i.e. you don't need the u. In Python 2, unicode strings are of type unicode; in Python 3, unicode strings are of type str.)
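To make the note above concrete, here is a small sketch of how text and bytes relate in Python 3. It assumes Python 3.3 or later (where the u prefix is accepted again); the variable names are only illustrative.

```python
# Python 3 sketch: every string literal is already unicode (type str).
text = "Índia"
legacy = u"Índia"  # the u prefix is allowed in Python 3.3+ but changes nothing

# Both literals produce the same str object type and the same value.
same_type = type(text) is str and type(legacy) is str

# Encoding turns text into bytes; decoding turns bytes back into text.
encoded = text.encode('utf-8')      # bytes, "Í" takes two bytes in UTF-8
decoded = encoded.decode('utf-8')   # str again
```

The distinction that matters is between text (unicode or str) and encoded bytes, whichever Python version you are on.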

If you want to save these strings to a file, you should define the character encoding explicitly:

with open('filename', 'w') as f:
    s = '\n'.join(strings)
    f.write(s.encode('utf-8'))

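When you want to read these strings back from that file, you again have to define the character encoding explicitly so that the file content is decoded correctly: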

with open('filename') as f:
    # decode each line from UTF-8 bytes (the loop variable is `line`)
    strings = [line.decode('utf-8') for line in f]
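As a rough sketch of the same write and read round trip, io.open lets you name the encoding once when opening the file, so no manual encode or decode calls are needed. This variant is written for Python 3 (io.open is also available in Python 2.6+); the temporary file path and the name restored are assumptions made for illustration.

```python
import io
import os
import tempfile

strings = [u"Índia, Tailândia & Cingapura",
           u"Lines through the days 1 (Arabic) سطور عبر الأيام 1"]

# Hypothetical location for the example file.
path = os.path.join(tempfile.mkdtemp(), 'strings.txt')

# Write: the file object encodes unicode text to UTF-8 for us.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'\n'.join(strings))

# Read: the file object decodes UTF-8 bytes back to unicode text.
with io.open(path, encoding='utf-8') as f:
    restored = [line.rstrip(u'\n') for line in f]
```

Stripping the trailing newline on read keeps the restored list equal to the list that was written.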

Answer 2

actual_format = [x.decode('unicode-escape') for x in list_example]

This converts the unicode-escaped strings back to their original format.
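The one-liner above relies on Python 2, where byte strings have a decode method. A possible Python 3 equivalent is sketched below, assuming the strings read from the file contain literal \uXXXX escape sequences and are otherwise pure ASCII. The helper name unescape is hypothetical.

```python
def unescape(text):
    """Replace literal \\uXXXX sequences in an ASCII-only str with the characters they name."""
    # str has no decode() in Python 3, so go through bytes first.
    return text.encode('ascii').decode('unicode_escape')

# Raw strings keep the backslashes literal, as they would be in a file.
list_example = [
    r"\u00cdndia, Tail\u00e2ndia & Cingapura",
    r"(Arabic) \u0633\u0637\u0648\u0631",
]

actual_format = [unescape(x) for x in list_example]
```

If the input may already contain non-ASCII characters, the encode('ascii') step raises UnicodeEncodeError, so this sketch only fits fully escaped input.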
