Question

To clean up some strings, I have to remove substrings that contain special UTF-8 characters.

Example:

source = "Skoda"
to_be_clean = "Škoda Rapid"

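I need to remove the source substring from the to_be_clean string. Obviously, to_be_clean contains a special character (Š rather than S). Is there a simple way to do this? This is what I do today: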

output = to_be_clean.replace(source + ' ', '')

I'm thinking about using a regular expression, but I would need to list all the possible characters.

Answer 1

The unicodedata module can solve your problem.

# -*- coding: utf-8 -*-

import unicodedata
to_be_clean = u"Škoda Rapid"

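# NFKD splits accented letters into base letter + combining mark;
# encoding to ASCII with 'ignore' then drops the marks.
print(unicodedata.normalize('NFKD', to_be_clean).encode('ASCII', 'ignore').decode('ASCII'))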

Output:

Skoda Rapid
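
To tie this back to the question, here is a minimal Python 3 sketch that normalizes first and then removes source. The helper names strip_accents and remove_source are made up for illustration, and it assumes you are fine with accents being stripped from the rest of the string as well.

```python
import unicodedata

def strip_accents(text):
    # NFKD splits "Š" into "S" plus a combining caron (U+030C);
    # dropping the combining marks leaves only the base letters.
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

def remove_source(to_be_clean, source):
    # Hypothetical helper: match on the accent-free form, then drop the match.
    return strip_accents(to_be_clean).replace(source + ' ', '')

source = "Skoda"
to_be_clean = "Škoda Rapid"
output = remove_source(to_be_clean, source)
print(output)  # Rapid
```

Filtering on unicodedata.combining() instead of encoding to ASCII keeps non-ASCII characters that have no accent to strip. Note that letters such as "ø" or "ł" have no Unicode decomposition, so neither approach turns them into "o" or "l"; those would need an explicit mapping.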