Question

The following code behaves very strangely.

When I do the following, the UTF-8 is converted to Unicode just fine.

print u'\xE1\x80\x96\xE1\x80\xBB\xE1\x80\xB1\xE1\x80\xAC\xE1\x80\xBA\xE1\x80\x9B\xE1\x80\x8A\xE1\x80\xBA'.encode('raw_unicode_escape')

This works fine. However, when I get the UTF-8 string from sys.argv, it does not work.

import sys    

if __name__ == "__main__":
    args = sys.argv

    input_string = args[1]

    if type(input_string) is not unicode:
        input_string = unicode(input_string, "utf-8")

    print type(input_string)
    print input_string

When I run it like this,

python test_print.py "\xE1\x80\x96\xE1\x80\xBB\xE1\x80\xB1\xE1\x80\xAC\xE1\x80\xBA\xE1\x80\x9B\xE1\x80\x8A\xE1\x80\xBA"

I get the same string back, as shown below; it has not been converted to Unicode.

<type 'unicode'>
\xE1\x80\x96\xE1\x80\xBB\xE1\x80\xB1\xE1\x80\xAC\xE1\x80\xBA\xE1\x80\x9B\xE1\x80\x8A\xE1\x80\xBA

I need to convert the input from sys.argv to Unicode characters.

Please help.

Thanks.

Answer 1

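Actual Python-level string literals (for both str and unicode) are the only place where Python automatically parses character escapes. If you want to convert an external string that uses literal escapes like this, you can explicitly invoke the literal escape interpretation machinery: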

# Converts from str to str interpreting escapes, then decodes those bytes
# using the UTF-8 encoding
input_string = args[1].decode('string_escape').decode('utf-8')
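
To see why the original script printed the escapes unchanged, it may help to compare a string literal with the same characters arriving at run time. This is only an illustrative sketch: the names from_literal and from_argv are made up here, and the raw literal merely stands in for what sys.argv would hold.

```python
# In source code the parser turns the escape \x41 into the single character "A".
from_literal = "\x41"

# A command-line argument is never parsed as source code. The raw literal
# below reproduces what sys.argv would contain: backslash, 'x', '4', '1'.
from_argv = r"\x41"

print(len(from_literal))          # 1
print(len(from_argv))             # 4
print(from_literal == from_argv)  # False
```

So unicode(input_string, "utf-8") in the question succeeds, but it only decodes the ASCII backslash sequences as they are; nothing ever interprets them as escapes.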

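The exact steps are a little different in Python 3 (you have to use unicode_escape and the codecs module, and add an extra step to convert the literal-decoded str to bytes via latin-1 before decoding as UTF-8, because text-to-text encoding and decoding is not supported), but it is a similar process.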
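
A minimal sketch of how those Python 3 steps might look, assuming the argument contains only ASCII characters and \xNN escapes. The helper name decode_escaped_utf8 is hypothetical, not part of any library.

```python
import codecs


def decode_escaped_utf8(arg):
    """Turn text such as r'\xE1\x80\x96' into the characters it encodes.

    Hypothetical helper; assumes `arg` is ASCII text with \\xNN escapes.
    """
    # Step 1: interpret the escapes. The result is still a str, in which
    # each \xNN has become the single code point U+00NN.
    unescaped = codecs.decode(arg, "unicode_escape")
    # Step 2: latin-1 maps code points 0-255 to the identical byte values,
    # so this recovers the raw bytes the escapes described.
    raw = unescaped.encode("latin-1")
    # Step 3: decode those bytes as UTF-8 to get the intended text.
    return raw.decode("utf-8")


if __name__ == "__main__":
    # Stand-in for sys.argv[1] from the question.
    sample = (r"\xE1\x80\x96\xE1\x80\xBB\xE1\x80\xB1\xE1\x80\xAC"
              r"\xE1\x80\xBA\xE1\x80\x9B\xE1\x80\x8A\xE1\x80\xBA")
    result = decode_escaped_utf8(sample)
    # ascii() keeps the output printable on any terminal encoding.
    print(ascii(result))
    print(len(result))
```

With the string from the question, the 24 escaped bytes decode to eight characters in the Myanmar block (U+1000 to U+109F). An argument that already contains non-ASCII characters would need different handling, since step 2 fails for code points above U+00FF.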

