Question

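Python version: Python 3.6. I am trying to replace the Unicode character u"\u0092" (aka the curly apostrophe) with a regular apostrophe.

I have tried all of the following: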

    mystring = <some string with problem character>
    # option 1 
    mystring = mystring.replace(u"\u0092", u"\u0027")
    # option 2 
    mystring = mystring.replace(u"\u0092", "'")
    # option 3
    mystring = re.sub('\u0092',u"\u0027", mystring)
    # option 4
    mystring = re.sub('\u0092',u"'", mystring)

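None of the above changes the character in mystring. Other sub and replace operations work, which makes me think the problem is either how I am using Unicode characters or this particular character.

Update: I have also tried the following suggestions: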

    mystring.decode("utf-8").replace(u"\u0092", u"\u0027").encode("utf-8")
    mystring.decode("utf-8").replace(u"\u2019", u"\u0027").encode("utf-8")

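But that gives me the error: AttributeError: 'str' object has no attribute 'decode'

Just to clarify: the IDE is not the core issue here. My question is why, when I run replace or sub with the Unicode character and print the result, nothing changes and the character is still present in the string.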

Answer 1

Your code point is wrong: the curly apostrophe (’) is \u2019, not \u0092. From Wikipedia:

U+0092    146    Private Use 2    PU2

That's why Eclipse is not happy.
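To check which code point a string actually contains, you can list each non-ASCII character with its code point and Unicode name. This is a minimal sketch; the helper name `describe_non_ascii` and the sample string are made up for illustration.

```python
import unicodedata

def describe_non_ascii(text):
    """Return (char, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, "U+{:04X}".format(ord(ch)), unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if ord(ch) > 127
    ]

sample = "dkfljglkdfjg\u2019fgkljlf"  # hypothetical sample text
print(describe_non_ascii(sample))
```

Control characters such as U+0092 have no Unicode name, so they show up with the `<unnamed>` placeholder.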

Use the correct code point:

    # -*- coding: utf-8 -*-
    import re
    string = u"dkfljglkdfjg’fgkljlf"
    # Each of the four lines below replaces the curly apostrophe on its own.
    string = string.replace(u"’", u"'")
    string = string.replace(u"\u2019", u"\u0027")
    string = re.sub(u'\u2019', u"\u0027", string)
    string = re.sub(u'’', u"'", string)

All of these solutions work.
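If the text may contain several apostrophe-like characters, a translation table can handle them in one pass. This is a sketch, assuming you want U+2018, U+2019 and a stray U+0092 all mapped to the ASCII apostrophe; `APOSTROPHE_MAP` and `normalize_apostrophes` are hypothetical names.

```python
# Map several apostrophe-like code points to the ASCII apostrophe.
# Which characters to include is an assumption; adjust to your data.
APOSTROPHE_MAP = str.maketrans({
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u0092": "'",  # C1 control character (Private Use 2)
})

def normalize_apostrophes(text):
    """Return text with the mapped characters replaced by "'"."""
    return text.translate(APOSTROPHE_MAP)

print(normalize_apostrophes("it\u2019s and it\u0092s"))
```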

Also, don't name your variables str.
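On the AttributeError from the question: in Python 3, str is already Unicode text and has no decode method; only bytes has one. A literal U+0092 in a string can come from Windows-1252 bytes that were decoded as Latin-1, because byte 0x92 is ’ in Windows-1252. This is a sketch, assuming the raw data is available as bytes and really is Windows-1252; the function name and the sample bytes are hypothetical.

```python
def decode_and_fix(raw, encoding="cp1252"):
    """Decode bytes to str, then replace the curly apostrophe."""
    text = raw.decode(encoding)
    return text.replace("\u2019", "'")

raw = b"it\x92s"  # hypothetical input bytes
print(decode_and_fix(raw))

# The same byte decoded as Latin-1 yields the control character U+0092.
print(repr(raw.decode("latin-1")))
```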

