I have some Unicode strings in my document. What I want is to remove these Unicode code points or replace them with a space (" "). Example:
doc = "Hello my name is Ruth \u2026! I really like swimming and dancing \ud83c"
How can I convert it into the following?
doc = "Hello my name is Ruth! I really like swimming and dancing"
I have already tried this: https://stackoverflow.com/a/20078869/5505608, but nothing happens. I am using Python 3.
Answer 0 (score: 2)
You can encode to ASCII and ignore errors (i.e. code points that cannot be converted to an ASCII character).
>>> doc = "Hello my name is Ruth \u2026! I really like swimming and dancing \ud83c"
>>> doc.encode('ascii', errors='ignore')
b'Hello my name is Ruth ! I really like swimming and dancing '
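If the trailing whitespace bothers you, strip it off. Depending on your use case, you can decode the result again with ASCII. Chaining everything looks like this: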
>>> doc.encode('ascii', errors='ignore').strip().decode('ascii')
'Hello my name is Ruth ! I really like swimming and dancing'
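Note that the chained result still has a space before the "!", whereas the question asks for "Ruth!". One possible way to close that gap is sketched below. The helper name strip_non_ascii and the set of punctuation characters it tidies are assumptions made for illustration, not part of the original answer:

```python
import re


def strip_non_ascii(text: str) -> str:
    """Drop non-ASCII code points, then tidy the whitespace left behind.

    Hypothetical helper: the name and the punctuation set are illustrative.
    """
    # Same technique as above: encode to ASCII, ignoring anything that
    # cannot be represented, then decode back to str.
    ascii_only = text.encode("ascii", errors="ignore").decode("ascii")
    # Remove whitespace stranded in front of punctuation ("Ruth !" -> "Ruth!").
    ascii_only = re.sub(r"\s+([!?.,;:])", r"\1", ascii_only)
    # Collapse runs of whitespace into one space and trim both ends.
    return re.sub(r"\s+", " ", ascii_only).strip()


doc = "Hello my name is Ruth \u2026! I really like swimming and dancing \ud83c"
print(strip_non_ascii(doc))
```

Keep in mind that this removes every non-ASCII character, including accented letters, so it only fits text where that loss is acceptable.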