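Regular expression to remove emoji in Python

Date: 2016-04-08 20:24:06

Tags: python regex

I am trying to remove emoji from a piece of text. I took this regular expression from another question, but it does not remove any emoji. Can you tell me what I am doing wrong, or whether there is a better regular expression for removing emoji from a string?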

import re
myre = re.compile(u'('
u'\ud83c[\udf00-\udfff]|'
u'\ud83d[\udc00-\ude4f\ude80-\udeff]|'
u'[\u2600-\u26FF\u2700-\u27BF])+',
re.UNICODE)

def clean(inputFile,outputFile):
    with open(inputFile, 'r') as original,open(outputFile, 'w+') as out:
        for line in original:
            line=myre.sub('', line)

1 answer:

Answer 0 (score: 1)

Something like this?

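import re

# Python 2 only: decode('unicode_escape') turns the escaped byte string into a
# unicode pattern. The surrogate pairs match only on narrow builds, where
# characters above U+FFFF are stored as two UTF-16 code units.
myre = re.compile('('
'\ud83c[\udf00-\udfff]|'
'\ud83d[\udc00-\ude4f\ude80-\udeff]|'
'[\u2600-\u26FF\u2700-\u27BF])+'.decode('unicode_escape'),
re.UNICODE)

def clean(inputFile, outputFile):
    with open(inputFile, 'r') as original, open(outputFile, 'w+') as out:
        for line in original:
            # Decode the raw bytes first so the unicode pattern can match.
            line = myre.sub('', line.decode('utf-8'))
            # Write the cleaned line back out as UTF-8; the original loop
            # never wrote anything to the output file.
            out.write(line.encode('utf-8'))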
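
On Python 3 (or a wide Python 2 build) the surrogate-pair pattern above cannot match, because each emoji is stored as a single code point rather than as two UTF-16 code units. A minimal sketch for that case, assuming Python 3 and UTF-8 input files: the same ranges are written as code points (for example, \ud83c\udf00 corresponds to U+1F300). The names EMOJI_RE and remove_emoji are made up for this illustration, and the ranges only mirror the ones in the question, so emoji outside them (such as flags or the U+1F900 block) are not removed.

```python
import re

# Same ranges as the surrogate-pair pattern, expressed as code points.
# Assumption: Python 3, where str holds full code points.
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001F3FF"  # was \ud83c[\udf00-\udfff]
    "\U0001F400-\U0001F64F"  # was \ud83d[\udc00-\ude4f]
    "\U0001F680-\U0001F6FF"  # was \ud83d[\ude80-\udeff]
    "\u2600-\u26FF"          # miscellaneous symbols
    "\u2700-\u27BF"          # dingbats
    "]+"
)


def remove_emoji(text):
    """Return text with every character in the ranges above removed."""
    return EMOJI_RE.sub("", text)


def clean(input_path, output_path):
    """Copy input_path to output_path line by line, stripping emoji."""
    # Assumption: both files are UTF-8 encoded.
    with open(input_path, "r", encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(remove_emoji(line))
```

For example, remove_emoji("sunny \u2600 day") returns "sunny  day" (the two surrounding spaces are kept).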