Question

I have already searched online for a solution, but this question is different because I don't want to remove all non-ASCII characters, only some of them.

I have a line that looks like this:

"[x+]4 gur Id lú gal sik-kát ⌈ x x ⌉ [……………]"

I want to remove these characters:

'…' , '⌉' , '⌈'

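The text is taken from here.

I tried to solve it with replace, but whenever I write one of the non-ASCII characters I get the following error:

SyntaxError: Non-ASCII character '\xe2' in file C:/-------.py on line -, but no encoding declared;

Thanks in advance.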

Answer 1

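Use str.translate. The coding declaration on the second line is what resolves the SyntaxError, and the two-argument translate call shown here is the Python 2 form: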

#!/usr/bin/env python
# -*- coding: utf-8 -*-

s = "[x+]4 gur Id lú gal sik-kát ⌈ x x ⌉ [……………]"
r = s.translate(None, '…⌉⌈')

print(r)
# [x+]4 gur Id lú gal sik-kát  x x  []
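
In Python 3, str.translate accepts only a single table argument, so the call above would raise a TypeError there. A minimal sketch of the equivalent, assuming Python 3 and using the built-in str.maketrans to build a deletion table (the variable name table is just illustrative):

```python
# -*- coding: utf-8 -*-
# Python 3 sketch: str.translate takes one mapping argument.
s = "[x+]4 gur Id lú gal sik-kát ⌈ x x ⌉ [……………]"

# The third argument of str.maketrans lists characters to delete
# (each one is mapped to None in the resulting table).
table = str.maketrans('', '', '…⌉⌈')
r = s.translate(table)

print(r)
# [x+]4 gur Id lú gal sik-kát  x x  []
```

Because the table works on whole characters rather than bytes, the accented letters ú and á are left untouched.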

Answer 2

'[x+]4 gur Id lú gal sik-kát ⌈ x x ⌉ [……………]'.encode().decode('ascii', errors='ignore')

Out:

'[x+]4 gur Id l gal sik-kt  x x  []'

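Use encode to convert the string to bytes, then decode as ASCII and ignore the errors. Note that this drops every non-ASCII character, including ú and á, which is more than the question asks for.

So I think you should use re.sub instead: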

import re

text = "[x+]4 gur Id lú gal sik-kát ⌈ x x ⌉ [……………]"

re.sub('[…⌉⌈]', '', text)  # replaces every character listed inside [] with ''

Out:

'[x+]4 gur Id lú gal sik-kát  x x  []'
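
If the set of characters to remove is not fixed in advance, the character class can be built at run time. The following is only a sketch: remove_chars is a hypothetical helper name, not part of the re module, and it relies on re.escape so that characters such as ], - or ^ in the list do not break the class.

```python
import re

def remove_chars(text, chars):
    """Return text with every character in chars removed (hypothetical helper)."""
    if not chars:
        # An empty character class '[]' is not a valid pattern, so return early.
        return text
    # re.escape protects characters that are special inside [...], e.g. ']' or '-'.
    pattern = '[' + re.escape(''.join(chars)) + ']'
    return re.sub(pattern, '', text)

text = "[x+]4 gur Id lú gal sik-kát ⌈ x x ⌉ [……………]"
cleaned = remove_chars(text, ['…', '⌉', '⌈'])
print(cleaned)
# [x+]4 gur Id lú gal sik-kát  x x  []
```

For a fixed, small set of characters the literal pattern from the answer is simpler; the helper is mainly useful when the list comes from configuration or user input.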

Python: remove specific non-ASCII characters from a string
