Question

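I am trying to read usernames from a database, and a UnicodeDecodeError is raised whenever one contains non-UTF-8 characters.

I don't know what all the non-UTF-8 characters are, and I am looking for a solution.

I want to keep special symbols and filter out only the ones that are incompatible with UTF-8. ³ and ™ (trademark) are the only two I know of that are not UTF-8.

I still want to keep Chinese characters, Arabic, and so on. That is why I am using UTF-8.

Code: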

def is_author_used(author):
    with open("C:\\Users\\Administrator\\Desktop\\authors.txt", 'r', encoding='utf-8') as f:
        content = f.read().splitlines()
    if author in content:
        return True
    return False

def set_author_used(author):
    with open("C:\\Users\\Administrator\\Desktop\\authors.txt", 'a', encoding='utf-8') as f:
        # Text mode already translates '\n' to '\r\n' on Windows,
        # so writing '\r\n' here would produce '\r\r\n'.
        f.write(author + '\n')

Answer 1

Maybe something like this:

with open('text.txt', encoding='utf-8', errors='ignore') as f:
    content = f.read().splitlines()

This removes the non-UTF-8 characters from the file contents.
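One caveat worth checking: ³ and ™ are both representable in UTF-8, so if they trigger a UnicodeDecodeError the file was probably written in some other encoding. The sketch below assumes that encoding is Windows-1252 (cp1252), which is only a guess about the asker's data; the helper read_authors and the temporary file are made up for the illustration. It compares errors='ignore', errors='replace', and decoding with the assumed real encoding.

```python
import os
import tempfile

# Both symbols encode to UTF-8 without any problem.
utf8_bytes = "³™".encode("utf-8")  # b'\xc2\xb3\xe2\x84\xa2'

# Assumption: the file was written by a tool using Windows-1252.
# There "³" is the single byte 0xB3 and "™" is 0x99; neither is valid UTF-8 alone.
raw = "alice\nbob³\ncarol™\n".encode("cp1252")

path = os.path.join(tempfile.mkdtemp(), "authors.txt")
with open(path, "wb") as f:
    f.write(raw)

def read_authors(path, errors):
    """Hypothetical helper: read the file as UTF-8 with the given error handler."""
    with open(path, "r", encoding="utf-8", errors=errors) as f:
        return f.read().splitlines()

# 'ignore' silently drops the undecodable bytes.
ignored = read_authors(path, "ignore")

# 'replace' substitutes U+FFFD, so the damage stays visible.
replaced = read_authors(path, "replace")

# Decoding with the (assumed) real encoding keeps the symbols intact.
recovered = raw.decode("cp1252").splitlines()

print(ignored)
print(replaced)
print(recovered)
```

With errors='ignore' the names "bob³" and "bob" become indistinguishable, which matters for a check like is_author_used. If the source encoding can be identified, decoding with it and re-saving the file as UTF-8 loses nothing.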
