Question

我有一个字符串，我是通过阅读带有子弹的页面的URL来获得的，因为项目符号列表具有“•”符号。请注意，该文本是使用Python 2.7的urllib2.read（webaddress）来自网址的html源代码。

我知道U + 2022的unicode字符，但是如何将unicode字符替换成类似的？

我试过了 str.replace（“•”，“某事”）;

但它似乎不起作用......我该怎么做？

Answer 1

将字符串解码为Unicode。假设它是UTF-8编码的：
```
str.decode("utf-8")
```
调用replace方法并确保将Unicode字符串作为其第一个参数传递给它：
```
str.decode("utf-8").replace(u"\u2022", "*")
```

如果需要，编码回UTF-8：

str.decode("utf-8").replace(u"\u2022", "*").encode("utf-8")

（幸运的是，Python 3阻止了这个混乱。第3步应该只在I / O之前执行。另外，请注意调用字符串str会影响内置类型{{ 1}}。）

Answer 2

将字符串编码为unicode。

>>> special = u"\u2022"
>>> abc = u'ABC•def'
>>> abc.replace(special,'X')
u'ABCXdef'

Answer 3

import re
regex = re.compile("u'2022'",re.UNICODE)
newstring = re.sub(regex, something, yourstring, <optional flags>)

Answer 4

我的解决方案：替换所有\u字符

def replace_unicode_character(content: str):
    content = content.encode('utf-8')
    unicode_characters = set([content[m.start(0) - 1:m.start(0)] for m in re.finditer(b'\x80', content)])
    for x in unicode_characters:
        content = content.replace(x, b"")
    return content.decode('utf-8')

Answer 5

有趣的答案隐藏在答案中。

str.replace("•", "something")

如果使用正确的语义就可以。

str.replace(u"\u2022","something")

创造奇迹；），向RParadox提示。

如何用python替换字符串中的unicode字符？

5 个答案: