Removing smart quotes with a dict

Date: 2016-10-30 16:06:06

Tags: python-2.7 text unicode

charmap = [
  (u"\u201c\u201d", "\""),
  (u"\u2018\u2019", "'")
  ]

_map = dict((c, r) for chars, r in charmap for c in list(chars))
fixed = "".join(_map.get(c, c) for c in s)
print fixed

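I'm looking at a script like the one above for replacing smart quotes and apostrophes in text, taken from an answer here. Could someone kindly explain these two lines: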

_map = dict((c, r) for chars, r in charmap for c in list(chars))
fixed = "".join(_map.get(c, c) for c in s)

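and possibly rewrite them in a longer form, with comments explaining exactly what happens? I'm confused about whether this is an inner/outer loop combination or a sequential check of the items in a dictionary.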

2 answers:

Answer 0 (score: 3)

_map = dict((c, r) for chars, r in charmap for c in list(chars))

means:

_map = {}                     # an empty dictionary
for (chars, r) in charmap:    # chars - string of symbols to be replaced, r - replacement
    for c in list(chars):     # c - individual symbol from chars
        _map[c] = r           # add the entry symbol -> replacement to the dictionary

fixed = "".join(_map.get(c, c) for c in s)

means:

fixed = ""                          # an empty string   
for c in s:
    fixed = fixed + _map.get(c, c)  # first "c" is key, second is default for "not found"

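since the .join method simply concatenates the elements of a sequence, using the given string as the separator between them (here "", i.e. no separator).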

Answer 1 (score: 1)

Use the built-in string method translate.

It is faster and more direct:
#!python2
#coding: utf8

# Start with a Unicode string.
# Your codecs.open() will read the text in Unicode
text = u'''\
"Don't be dumb"
“You’re smart!”
'''

# Build a translation dictionary.
# Keys are Unicode ordinal numbers.
# Values can be ordinals, Unicode strings, or None (to delete)
charmap = { 0x201c : u'"',
            0x201d : u'"',
            0x2018 : u"'",
            0x2019 : u"'" }

print text.translate(charmap)

Output:

"Don't be dumb"
"You're smart!"
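
If you already have the list-of-pairs charmap from the question, the ordinal-keyed table that translate expects can probably be derived from it with ord(). This is only a sketch; the names pairs and table and the sample text are illustrative. Note that in Python 2 this mapping form of translate applies to unicode objects; byte strings (str) take a 256-character table instead.

```python
# -*- coding: utf-8 -*-
pairs = [
    (u"\u201c\u201d", u'"'),   # left/right double smart quotes -> "
    (u"\u2018\u2019", u"'"),   # left/right single smart quotes -> '
]

# translate() looks characters up by code point, so the keys are ord(c).
table = dict((ord(c), r) for chars, r in pairs for c in chars)

# Sample input, made up for this sketch.
text = u"\u201cYou\u2019re smart!\u201d"

result = text.translate(table)
print(result)
```

This keeps the compact pair list as the single place to edit while still using translate for the replacement itself.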