Question

string = "Special $#! characters   spaces 888323 Kek  ཌི ༜ 郭 ༜  དྀ    "

The result should be: "Specialcharactersspaces888323Kek郭"

I tried:
print ''.join(c for c in string.decode('utf-8') if u'\u4e00' <= c <= u'\u9fff')

But it returns an error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u90ed' in position 49: ordinal not in range(128)

My question is the same as the title: remove special characters and spaces, but keep Chinese characters.

Answer 1

A solution using the re.compile and re.sub functions:

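# -*- coding: utf-8 -*-
import re

# unicode literal, so that on Python 2.7 the pattern is matched against characters, not raw bytes
string = u"Special $#! characters   spaces 888323 Kek  ཌི ༜ 郭 ༜  དྀ    "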

# define a pattern that matches every character except ASCII alphanumerics and Chinese characters
pattern = re.compile(u'[^a-z0-9⺀-⺙⺛-⻳⼀-⿕々〇〡-〩〸-〺〻㐀-䶵一-鿃豈-鶴侮-頻並-龎]', re.UNICODE | re.IGNORECASE)
result = pattern.sub('', string)

# on Python 3, use print(result) instead
print result

Output:

Specialcharactersspaces888323Kek郭
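
As a side note, the UnicodeEncodeError in the question is consistent with `string` already being a unicode object: on Python 2, calling `.decode('utf-8')` on unicode text first encodes it to ASCII implicitly, and that step fails on `郭`. The following is a minimal sketch of the same regex idea that decodes only when it is handed bytes. The names `strip_special` and `KEEP_PATTERN` are hypothetical, and it assumes that the single range U+4E00 to U+9FFF (the one used in the question) is enough, whereas the pattern above covers more ranges.

```python
# -*- coding: utf-8 -*-
import re

# Match everything EXCEPT ASCII letters, digits, and U+4E00-U+9FFF.
# Assumption: this one CJK range is sufficient for the input at hand.
KEEP_PATTERN = re.compile(u'[^a-zA-Z0-9\u4e00-\u9fff]')


def strip_special(text, encoding='utf-8'):
    """Hypothetical helper: keep only ASCII alphanumerics and Chinese characters."""
    # Decode byte strings only. Text that is already unicode is left alone,
    # which avoids the implicit ASCII encode that .decode() triggers on Python 2.
    if isinstance(text, bytes):
        text = text.decode(encoding)
    return KEEP_PATTERN.sub(u'', text)


sample = u"Special $#! characters   spaces 888323 Kek  ཌི ༜ 郭 ༜  དྀ    "
result = strip_special(sample)
```

Here `result` holds `Specialcharactersspaces888323Kek郭`. Writing the range with `\u` escapes is a design choice to keep the pattern readable regardless of the editor's or console's encoding.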

Python 2.7: remove special characters and spaces, but not Chinese characters
