Question

I created a dictionary in Python, but I am running into a problem with extended ASCII codes.

The loop that builds the dictionary is (ASCII codes 128 to 164: é, à, etc.):

# extended ascii codes
i = 128
while i <= 165 :
    dictionnary[chr(i)] = 'extended ascii'
    i = i + 1

But when I try to use the dictionary:

    >>> dictionnary['è']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: '\xc3\xa8'

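I have # -*- coding: utf-8 -*- at the top of my Python script. I have already tried encode, decode and so on, but the result is always wrong.

To understand what is happening, I tried: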

>>> ord('é')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ord() expected a character, but string of length 2 found

and

    >>> ord(u'é')
233

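I am confused by ord(u'é'), because 'é' is 130 in the extended ASCII table, not 233.

As far as I understand, an extended ASCII code is made up of "two characters", but I do not see how to solve the problem with the dictionary.

Thanks in advance! :-)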

Answer 1

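Use unichr instead of chr. The chr function produces a string containing a single byte, whereas unichr produces a string containing a single Unicode character. Finally, do the lookup with a Unicode string, d[u'é'], because d['é'] looks up the UTF-8 encoding of é.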
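A minimal sketch of that change, assuming Python 2 as in the question. The try/except shim is my addition so the same lines also run on Python 3, where chr already returns a Unicode character; d and the loop bounds are carried over from the question.

```python
# -*- coding: utf-8 -*-
try:
    unichr  # Python 2 built-in
except NameError:
    unichr = chr  # Python 3: chr() already returns a Unicode character

d = {}
for i in range(128, 166):  # same bounds as the question's while loop
    d[unichr(i)] = 'extended ascii'  # keys are Unicode code points 128..165

print(d[u'£'])    # U+00A3 is code point 163, so it is a key
print(u'é' in d)  # False: U+00E9 is code point 233, outside 128..165
```

Note what the last line shows: with Unicode keys the lookup no longer fails because of a UTF-8 byte string, but u'é' is code point 233, so the range has to cover it (or be built from the intended code page) before d[u'é'] succeeds.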

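You have three things in your code: Latin-1 encoded str, UTF-8 encoded str, and unicode strings. Staying clear at all times about which one you are holding takes a good deal of knowledge about how Python works and a solid understanding of Unicode and encodings.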
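To make the difference concrete, here is a small sketch that encodes the same character several ways. It assumes the "extended ASCII table" in the question is code page 437 (the old DOS code page, where é is 130); that is a guess based on the number 130, not something the question states. The names e_acute and table are only for illustration.

```python
# -*- coding: utf-8 -*-
e_acute = u'é'  # a unicode string holding one character, code point 233

print(ord(e_acute))                      # 233
print(repr(e_acute.encode('latin-1')))   # one byte: 0xe9 (233)
print(repr(e_acute.encode('utf-8')))     # two bytes: 0xc3 0xa9
print(repr(e_acute.encode('cp437')))     # one byte: 0x82 (130); cp437 is an assumption

# Build the table from code page byte values, but store Unicode keys.
table = {}
for i in range(128, 166):
    key = bytearray([i]).decode('cp437')  # byte value -> Unicode character
    table[key] = 'extended ascii'

print(table[u'é'])    # found: é is byte 130 in cp437
print(u'è' in table)  # è is byte 138 in cp437
```

Decoding once at the boundary and keeping only unicode strings as keys means every later lookup can use u'...' literals, whatever encoding the source file or terminal uses.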

No answer about encodings and Unicode is complete without a link to Joel Spolsky's article on the subject: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Using extended ASCII codes in Python
