Question

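I have read a lot of links and advice, but I only end up more confused whenever I need to work with strings in Python that contain non-ASCII characters.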

I am using Python 2.7 on Ubuntu:

#!/usr/bin/python
# -*- coding: utf-8 -*-

for i, j in enumerate('Сон'): print '%d: %s' % (i+1, j)

Output:

1: Ð
2: ¡
3: Ð
4: ¾
5: Ð
6: ½

What is the simplest way to enumerate the 3 UTF-8 encoded characters, rather than the 6 individual bytes?

Answer 1

The simple answer: don't. Work with a unicode string instead of a UTF-8 encoded byte string:

>>> len(u'Сон')
3
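
To make the difference concrete, here is a minimal sketch (the variable names are my own, and it is written so that it should run on both Python 2.7 and Python 3). The UTF-8 encoded form of the word has 6 bytes, and decoding it gives back the 3 characters:

```python
# -*- coding: utf-8 -*-
text = u'Сон'                    # unicode string: 3 characters
encoded = text.encode('utf-8')   # byte string: each Cyrillic letter takes 2 bytes

byte_count = len(encoded)                  # counts bytes
char_count = len(encoded.decode('utf-8'))  # counts characters after decoding
```

Here byte_count is 6 and char_count is 3, which matches the 6 lines of output in the question versus the 3 that were expected.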

Answer 2

If you want to output UTF-8 characters, you also need to make sure Python knows which encoding to use for its output:

$ export PYTHONIOENCODING=ascii
$ python
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.encoding
'ascii'
>>> for i, j in enumerate(u'Сон'): print '%d: %s' % (i+1, j)
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0421' in position 3: ordinal not in range(128)

$ export PYTHONIOENCODING=utf-8
$ python
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.encoding
'utf-8'
>>> for i, j in enumerate(u'Сон'): print '%d: %s' % (i+1, j)
... 
1: С
2: о
3: н
>>>
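
The two sessions above fail or succeed depending on whether the output codec can represent the characters. As a rough illustration of the same effect without touching the environment, the sketch below tries each codec directly. The helper name can_encode is hypothetical, not part of any library:

```python
# -*- coding: utf-8 -*-
text = u'Сон'

def can_encode(s, encoding):
    """Hypothetical helper: True if s can be encoded with the given codec."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        # Same exception type as in the first session above
        return False

ascii_ok = can_encode(text, 'ascii')   # ASCII has no Cyrillic letters
utf8_ok = can_encode(text, 'utf-8')    # UTF-8 can represent any unicode character
```

ascii_ok ends up False and utf8_ok ends up True, which is why setting PYTHONIOENCODING=utf-8 makes the loop print correctly.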

Answer 3

# -*- coding: utf-8 -*-
for i, j in enumerate(u'Сон'):
    print '%d: %s' % (i+1, j)

On source code encoding in Python: http://www.python.org/dev/peps/pep-0263/

The 'u' prefix in front of the string means that a unicode string is used.
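
The 'u' prefix only helps with literals. If the text arrives as a UTF-8 byte string instead (say, from a file), it has to be decoded before enumerating. A minimal sketch under that assumption follows. The function name numbered_chars and the default encoding argument are my own choices, and it should behave the same on Python 2.7 and Python 3:

```python
# -*- coding: utf-8 -*-
def numbered_chars(data, encoding='utf-8'):
    """Return a list of 'N: char' lines, decoding byte strings first."""
    if isinstance(data, bytes):   # on Python 2, bytes is an alias for str
        data = data.decode(encoding)
    # enumerate(..., 1) starts counting at 1, so no i+1 is needed
    return [u'%d: %s' % (i, ch) for i, ch in enumerate(data, 1)]

# The 6 bytes from the question, written out explicitly
lines = numbered_chars(b'\xd0\xa1\xd0\xbe\xd0\xbd')
```

lines then holds the three entries '1: С', '2: о' and '3: н', and passing u'Сон' directly gives the same result.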

Answer 4

Add a 'u' in front of the string to mark it as unicode:

for i, j in enumerate(u'Сон'): print '%d: %s' % (i+1, j)

Output:

1: С
2: о
3: н

Simplest way to enumerate a UTF-8 string
