Question

我正在使用字典在Python中存储一些字符对（我正在替换变音字符）。这是它的样子：

umlautdict={
    'ae': 'ä',
    'ue': 'ü',
    'oe': 'ö'
    }

然后我像这样运行我的输入词：

for item in umlautdict.keys():
        outputword=inputword.replace(item,umlautdict[item])

但这没有做任何事情（没有替代发生）。当我打印出我的umlautdict时，我看到它看起来像这样：

{'ue': '\xfc', 'oe': '\xf6', 'ae': '\xc3\xa4'}

当然这不是我想要的;但是，尝试像unicode()（ - ＆gt;错误）或预先修复u这样的事情并没有改善。

如果我手动在replace()命令中输入'ä'或'ö'，一切正常。我还将我的脚本（在TextWrangler中工作）中的设置更改为# -*- coding: utf-8 -*-，因为它甚至可以让我在没有它的情况下执行包含变音符号的脚本。

所以我没有......

为什么会这样？为什么以及什么时候变音符号变好“好当我将它们存储在字典中时，它会变成邪恶吗？
如何解决？
此外，如果有人知道：什么是学习的好资源用Python编码？我一直有问题和很多事情对我没有意义/我无法绕过头。

我正在使用Python 2.7.10开发Mac。谢谢你的帮助！

Answer 1

转换为Unicode是通过解码你的字符串来完成的（假设你正在获取字节）：

data = "haer ueber loess"
word = data.decode('utf-8')  # actual encoding depends on your data

使用unicode字符串定义你的dict：

umlautdict={
    u'ae': u'ä',
    u'ue': u'ü',
    u'oe': u'ö'
    }

最后print umlautdict将打印出该dict的一些表示，通常涉及转义。这是正常的，你不必担心这一点。

Answer 2

声明您的编码。
使用原始格式表示特殊字符。
在字符串上正确迭代：在前往下一个循环时保持每次循环迭代的更改。

以下是完成工作的代码：

\# -*- coding: utf-8 -*-

umlautdict = {
    'ae': r'ä',
    'ue': r'ü',
    'oe': r'ö'
    }

print umlautdict

inputword = "haer ueber loess"
for item in umlautdict.keys():
        inputword = inputword.replace(item, umlautdict[item])

print inputword

输出：

{'ue': '\xc3\xbc', 'oe': '\xc3\xb6', 'ae': '\xc3\xa4'}
här über löss

在Python中使用unicode / umlauts：Dictionary v manual input

2 个答案: