Question

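I have just started learning Python with LPTHW, and it has been going really well. I am only a few days in and have reached exercise 16, which looks like this: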

# -*- coding: utf-8 -*-

from sys import argv

script, filename = argv

print "We're going to erase %r." % filename
print "If you don't want that, hit CTRL-C (^C)."
print "If you do want that, hit RETURN."

raw_input("?")

print "Opening the file..."
target = open(filename, 'w')

print "Truncating the file.  Goodbye!"
target.truncate()

print "Now I'm going to ask you for three lines."

line1 = raw_input("line 1: ")
line2 = raw_input("line 2: ")
line3 = raw_input("line 3: ")

print "I'm going to write these to the file."

target.write("%r\n%r\n%r\n" % (line1, line2, line3))

print "And finally, we close it."
target.close()

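The problem is that I come from a country whose alphabet contains the letters "Å", "Ä" and "Ö", and when I use those letters the output in the file (test.txt) looks like this: u'hej' u'\xc5je' u'l\xe4get'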

When I decode a string literal myself, I can do something like this: "<text with Swedish letters>".decode("utf-8")

and it prints just fine.

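But I also want the user input to come out right, even when it contains unusual characters. I have tried different things that either do not work or give me errors at runtime, such as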

line1 = raw_input("line 1: ").decode("utf-8")

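I tried googling my problem, but I feel the answers I found are either not very direct or are written for more experienced users.

If someone could take some time to explain the encoding/decoding of Unicode characters in a beginner-friendly way and give me an example of how to get it working, I would really appreciate it.

If it helps, I am running Python 2.7.10 on Windows 10, and my system locale is set to Swedish.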

Answer 1

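Here is one way to decode stdin. It generally works in a console, but IDEs sometimes replace the stdin object and do not always support its encoding attribute. I have also modernized the code a bit by using with and io.open to handle the encoding. Note that the file will be written in UTF-8, so open it with Notepad to view it correctly. Using type <filename> from the console will try to display the file using the console's stdout encoding.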

#!python2
import sys
import io

script, filename = sys.argv

print "We're going to erase %s." % filename
print "If you don't want that, hit CTRL-C (^C)."
print "If you do want that, hit RETURN."

raw_input("?")

print "Now I'm going to ask you for three lines."

line1 = raw_input("line 1: ").decode(sys.stdin.encoding)
line2 = raw_input("line 2: ").decode(sys.stdin.encoding)
line3 = raw_input("line 3: ").decode(sys.stdin.encoding)

print "I'm going to write these to the file."

with io.open(filename, 'wt', encoding='utf8') as target:
    target.write(u"%s\n%s\n%s\n" % (line1, line2, line3))
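
To see why the choice of viewer matters, here is a small sketch of what happens when UTF-8 bytes are displayed with a different encoding. The code page cp850 is only an assumption standing in for a console encoding; the real value on a given machine may differ. The snippet is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

text = "l\xe4get"                    # the Unicode text "läget"
utf8_bytes = text.encode("utf-8")    # what the script stores in the file

# Decoding the bytes as UTF-8, as a UTF-8 aware editor would, restores the text.
seen_as_utf8 = utf8_bytes.decode("utf-8")

# Decoding the same bytes with another code page (cp850 is an assumed
# stand-in for a console encoding) turns "ä" into two unrelated characters.
seen_as_cp850 = utf8_bytes.decode("cp850")
```

The file is the same in both cases; only the second view picks the wrong decoder.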

Answer 2

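Your output suggests that raw_input() already accepts Å and ä in your environment.

Either your code does not correspond to the output, or your IDE is being too helpful. raw_input() should return the str type (bytes), but the output shows that you are saving the text representation of unicode objects: u'hej' u'\xc5je' u'l\xe4get'.

The minimal code change that produces the desired result is to use %s (save the string as is) instead of %r (save its ASCII-printable representation as returned by the repr() function) in the format string, as suggested in @chepner's answer.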
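
A minimal sketch of the difference between the two format codes, written to run on both Python 2.7 and Python 3. The variable names are illustrative only and do not come from the exercise.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

line = "l\xe4get"      # the Unicode text "läget"

as_repr = "%r" % line  # what repr() returns: quotes included, escapes possible
as_text = "%s" % line  # the text itself, nothing added

# %r wraps the text in quotes (and, on Python 2, adds a u prefix and
# escapes the non-ASCII letter), so it never equals the original text.
same_with_s = (as_text == line)
same_with_r = (as_repr == line)
```

Here same_with_s is True and same_with_r is False, which is why the file written with %r contains quotes and escape sequences.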

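"If someone could take some time to explain the encoding/decoding of Unicode characters in a beginner-friendly way and give me an example of how to get it working, I would really appreciate it."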

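Unicode handling on Python 2 requires knowing which APIs return text and which return binary data. Some APIs use a mix of both, such as ASCII-based network protocols.

Python 2 allows the str type to represent both human-readable text and binary data, which can be confusing. I would recommend starting with Python 3, which is stricter about Unicode-related issues.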
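
As a small illustration of the text versus bytes distinction, the sketch below encodes one of the words from the question and compares lengths. It assumes UTF-8 as the encoding and runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

text = "\xc5je"               # Unicode text: the three characters of "Åje"
data = text.encode("utf-8")   # bytes: the same word as it is stored on disk

# "Å" needs two bytes in UTF-8, so the byte string is one item longer.
text_length = len(text)       # counts characters
data_length = len(data)       # counts bytes

# Decoding with the same encoding gives the original text back.
round_trip = data.decode("utf-8")
```

Text is what you reason about; bytes are what files, consoles and networks carry.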

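In general, when working with Unicode, you should convert encoded text to Unicode as early as possible (for example, using .decode()) and convert Unicode text to output bytes as late as possible. @Mark Tolonen's answer demonstrates this approach:

It uses .decode(sys.stdin.encoding) to decode the bytes returned by raw_input() into Unicode text. If raw_input() already returns Unicode in your environment (to check, run print type(raw_input('input something'))), you can omit the .decode() call.
io.open(..., encoding='utf-8').write(u'some text') converts Unicode text to bytes (it encodes the text using the utf-8 encoding).

This general approach is called the Unicode sandwich.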
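
The sandwich can be sketched without a console by simulating the input bytes. Everything here is an assumption made for illustration: save_lines is a hypothetical helper, cp1252 stands in for whatever sys.stdin.encoding reports, and the file goes to a temporary directory. The code runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import io
import os
import tempfile


def save_lines(path, raw_lines, input_encoding):
    """Hypothetical helper: decode on the way in, encode on the way out."""
    # Bottom slice of bread: bytes -> Unicode, as early as possible.
    lines = [raw.decode(input_encoding) for raw in raw_lines]
    # Filling: work with Unicode text only.
    body = "".join(line + "\n" for line in lines)
    # Top slice of bread: Unicode -> bytes, done by io.open while writing.
    with io.open(path, "w", encoding="utf-8") as target:
        target.write(body)
    return body


# Simulated console input: "hej", "Åje" and "läget" as cp1252 bytes.
simulated_input = [b"hej", b"\xc5je", b"l\xe4get"]
path = os.path.join(tempfile.mkdtemp(), "test.txt")
written = save_lines(path, simulated_input, "cp1252")

# Read the file back as UTF-8 to check what was stored.
with io.open(path, encoding="utf-8") as source:
    content = source.read()
```

Only the two edges of the program know about encodings; the code in between handles plain Unicode text.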

.decode(sys.stdin.encoding) may fail. To support arbitrary Unicode input in the Windows console, install the win-unicode-console Python package.
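
One way such a failure could be softened is sketched below. decode_console_input is a hypothetical helper, not part of any library, and the UTF-8 fallback is an assumption; it does not replace win-unicode-console, it only avoids a crash when the encoding is missing or the bytes do not match it.

```python
import sys


def decode_console_input(raw, encoding=None, fallback="utf-8"):
    """Hypothetical helper: turn console input into Unicode text."""
    if not isinstance(raw, bytes):
        return raw  # already Unicode text, nothing to decode
    # sys.stdin.encoding can be None, for example when input is piped.
    encoding = encoding or getattr(sys.stdin, "encoding", None) or fallback
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Keep going, marking undecodable bytes with U+FFFD.
        return raw.decode(encoding, "replace")


good = decode_console_input(b"l\xc3\xa4get", "utf-8")  # valid UTF-8 bytes
bad = decode_console_input(b"\xc5je", "utf-8")         # cp1252 bytes, wrong codec
```

Replacing bad bytes loses information, so this is a last resort rather than a fix for a wrong encoding.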

Answer 3

You are writing the representation of the strings to your file, not the actual encoded Unicode strings. Use

target.write("%s\n%s\n%s\n" % (line1, line2, line3))

instead.

Answer 4

You can use the following form:

f = open('file.txt', 'w')
s = u'\u221A'
f.write(s.encode('utf-8'))

In your case: line1 = raw_input("> ").encode('utf-8'), and likewise for line2 and line3.

Unicode input with raw_input() in Python
