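It seems that every time I think I have mastered encodings, I find something new that puzzles me :-)
I am trying to remove French accents from a UTF-8 string: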
>>> import unicodedata
>>> s = u"éèêàùçÇ"
>>> print(unicodedata.normalize('NFKD', s).encode('ascii','ignore'))
I expected eeeaucC as the output, but instead I get AA AaA A1AA. I am using Python 2.6.4 with iPython 0.10 on Ubuntu 9.10, with everything set to unicode.
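Answer 0 (score: 1)
After further testing, it works if you use the Python 3 or the plain Python 2.6 interpreter instead of iPython.
This is probably either a wrong user setting or a bug.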
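For reference, here is a minimal sketch of the accent-stripping step on its own. It is independent of how the shell decodes typed input, because the string is written with escapes. The helper name `strip_accents` is hypothetical (it is not from the question), and it filters combining marks instead of using `encode('ascii', 'ignore')`, so non-accent characters outside ASCII are kept.

```python
import unicodedata

def strip_accents(text):
    """Hypothetical helper: drop combining marks after NFKD decomposition."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFKD', text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    return u''.join(ch for ch in decomposed if not unicodedata.combining(ch))

# The question's string, written with escapes so that the terminal or
# source-file encoding cannot alter it.
s = u"\xe9\xe8\xea\xe0\xf9\xe7\xc7"
result = strip_accents(s)
print(result)
```

If this prints eeeaucC in a given shell, the normalization logic is fine and any remaining problem lies in how the literal was decoded on input.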
Answer 1 (score: 0)
The plain python interpreter works fine:
$ python
Python 2.6.4 (r264:75706, Dec 7 2009, 18:43:55)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = u"éèêàùçÇ"
>>> s
u'\xe9\xe8\xea\xe0\xf9\xe7\xc7'
>>> ord(s[0])
233
ipython has a bug:
$ ipython
Python 2.6.4 (r264:75706, Dec 7 2009, 18:43:55)
Type "copyright", "credits" or "license" for more information.
IPython 0.10 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object'. ?object also works, ?? prints more.
In [1]: s = u"éèêàùçÇ"
In [2]: ord(s[0])
Out[2]: 195
In [3]: s
Out[3]: u'\xc3\xa9\xc3\xa8\xc3\xaa\xc3\xa0\xc3\xb9\xc3\xa7\xc3\x87'
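The ipython value above has exactly the shape you get when the UTF-8 bytes of the string are decoded one byte per character, as Latin-1 would do. That this is what IPython 0.10 does internally is an assumption inferred from the output, not something confirmed here. The sketch below reproduces the garbled string, shows that it yields the output reported in the question, and shows a possible repair.

```python
import unicodedata

# The intended string (éèêàùçÇ), written with escapes.
original = u"\xe9\xe8\xea\xe0\xf9\xe7\xc7"

# Assumption: the shell decoded the typed UTF-8 bytes as Latin-1.
garbled = original.encode('utf-8').decode('latin-1')
first_code_point = ord(garbled[0])  # 0xc3, the first UTF-8 byte of "é"

# Running the question's normalization on the garbled string.
stripped = (unicodedata.normalize('NFKD', garbled)
            .encode('ascii', 'ignore')
            .decode('ascii'))

# Reversing the wrong decoding recovers the intended string.
repaired = garbled.encode('latin-1').decode('utf-8')

print(repr(garbled))
print(first_code_point)
print(stripped)
print(repaired == original)
```

Under that assumption, re-encoding as Latin-1 and decoding as UTF-8 is a workaround for a string that has already been garbled; it does not fix the shell itself.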
If you read the string from a file instead, ipython works:
$ ipython
...
In [1]: import codecs
In [2]: s = codecs.open('s.txt', 'r', 'utf-8').read()
In [3]: s
Out[3]: u'\xe9\xe8\xea\xe0\xf9\xe7\xc7'
In [4]: ord(s[0])
Out[4]: 233
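The file-based approach works because the encoding is stated explicitly instead of being guessed from the terminal. A self-contained sketch of the same idea follows. It uses `io.open` (available from Python 2.6 and in Python 3) in place of `codecs.open`, and it writes a temporary `s.txt` first; the temporary directory is an assumption made only so the example can run on its own.

```python
import io
import os
import tempfile

# The string to round-trip (éèêàùçÇ), written with escapes.
text = u"\xe9\xe8\xea\xe0\xf9\xe7\xc7"

# Hypothetical location: a throwaway directory holding s.txt.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 's.txt')

# Write the file as UTF-8, then read it back with the same explicit encoding.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)
with io.open(path, 'r', encoding='utf-8') as f:
    loaded = f.read()

first_code_point = ord(loaded[0])
print(repr(loaded))
print(first_code_point)

# Clean up the temporary file and directory.
os.remove(path)
os.rmdir(tmp_dir)
```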