Question

似乎有很多关于在其他语言中执行此操作的帖子，但我似乎无法弄清楚如何使用Python（我使用2.7）。

要明确的是，我最好将字符串保留为unicode，只能替换某些特定的字符。

例如：

thisToken = u'tandh\u2013bm'
print(thisToken)

在中间打印带有m-dash的单词。我只想删除m-dash。（但不使用索引，因为我希望能够在找到这些特定字符的任何地方执行此操作。）

我尝试使用replace，就像使用任何其他角色一样：

newToke = thisToken.replace('\u2013','')
print(newToke)

但它不起作用。任何帮助深表感谢。塞特

Answer 1

您要搜索的字符串也必须是Unicode字符串。尝试：

newToke = thisToken.replace(u'\u2013','')

Answer 2

将字符串解码为Unicode。假设它是UTF-8编码的：

str.decode("utf-8")

调用replace方法并确保将Unicode字符串作为其第一个参数传递：

str.decode("utf-8").replace(u"\u2022", "")

如果需要，编码回UTF-8：

str.decode("utf-8").replace(u"\u2022", "").encode("utf-8")