I get a sentence by combining emoji escape codes, and I want to insert a space before the "\u" escape sequence:
sentance = "Whoaaa\\ud83d\\udc4f"
And another case:
sentance = "blabla whoaaa\\ud83d\\udc4f blabla"
I want results like this:
result = "blabla whoaaa \\ud83d\\udc4f blabla"
or
result = "Whoaaa \\ud83d\\udc4f"
Answer 0 (score: 0)
Since \u is not a character but part of the Unicode escape syntax, I think this would be hard to do with a regular expression.
What I would do is test whether each character is an emoji, as shown in this question:
How to check the Emoji property of a character in Python?
# test_emoji(c) is the emoji check from the linked question; test_str is the input sentence
result = "".join(" " + c if test_emoji(c) else c for c in test_str)
Answer 1 (score: 0)
Try this:
import re
# Raw string, so "\s" reaches the regex engine instead of being a string escape
pattern = re.compile(r'^[A-Za-z\s]*')
sentance1 = "Whoaaa\\ud83d\\udc4f"
sentance2 = "blabla whoaaa\\ud83d\\udc4f blabla"
# Leading letters/whitespace, then the rest of the string from the first backslash on
string_before_emoji = pattern.findall(sentance1)[0]
emoji_only = sentance1[len(string_before_emoji):]
print(f"{string_before_emoji} {emoji_only}")
# Whoaaa \ud83d\udc4f
string_before_emoji = pattern.findall(sentance2)[0]
emoji_only = sentance2[len(string_before_emoji):]
print(f"{string_before_emoji} {emoji_only}")
# blabla whoaaa \ud83d\udc4f blabla
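The regex pattern I used, ^[A-Za-z\s]*, matches the leading run of ASCII letters and whitespace, so it stops at the first backslash. It only separates the first emoji in the sentence.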
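The prefix pattern only handles the first emoji. A sketch of a different technique (swapped in, not this answer's method): match each run of literal `\uXXXX` escapes directly and prefix it with a space when it is glued to the previous character. It assumes the escapes are literal backslash text with exactly four hex digits; `ESCAPE_RUN` and `space_before_escape_runs` are hypothetical names.

```python
import re

# One or more consecutive \uXXXX escapes, written as literal backslash text.
ESCAPE_RUN = re.compile(r"(?:\\u[0-9a-fA-F]{4})+")


def space_before_escape_runs(text: str) -> str:
    def add_space(m: re.Match) -> str:
        start = m.start()
        # Only add a space when the run is glued to the preceding character
        if start > 0 and not text[start - 1].isspace():
            return " " + m.group(0)
        return m.group(0)

    return ESCAPE_RUN.sub(add_space, text)


for s in ["Whoaaa\\ud83d\\udc4f",
          "blabla whoaaa\\ud83d\\udc4f blabla",
          "a\\ud83d\\udc4f b\\ud83d\\udc4f"]:
    print(space_before_escape_runs(s))
# Whoaaa \ud83d\udc4f
# blabla whoaaa \ud83d\udc4f blabla
# a \ud83d\udc4f b \ud83d\udc4f
```

Matching the whole run, rather than testing each `\u` on its own, keeps the space from landing between the two halves of a surrogate pair, since a match always starts at the first escape of a run.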
Answer 2 (score: -2)
My guess is that this expression might do it:
(\s|^)([^\\]+)(?=\\u|\\\\u)
Test with re.sub. The leading whitespace is captured as group 1 and written back by the replacement; with a non-capturing (?:\s|^), the space before every later match would be consumed and lost:
import re
regex = r"(\s|^)([^\\]+)(?=\\u|\\\\u)"
test_str = "blabla whoaaa\\\\ud83d\\\\udc4f blabla blabla whoaaa\\\\ud83d\\\\udc4f\\\\ud83d\\\\udc4f blabla\\\\ud83d blabla\\\\ud83d blabla\\\\ud83d "
subst = r"\1\2 "
print(re.sub(regex, subst, test_str))
Output:
blabla whoaaa \\ud83d\\udc4f blabla blabla whoaaa \\ud83d\\udc4f\\ud83d\\udc4f blabla \\ud83d blabla \\ud83d blabla \\ud83d
If you want to explore, simplify, or modify the expression, it is explained in the top-right panel of this demo.
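As an offline alternative to the demo, here is a commented `re.VERBOSE` sketch of the expression. It captures the leading whitespace as group 1 so the replacement can write it back; a non-capturing `(?:\s|^)` would drop it. The name `space_before_escapes` is hypothetical.

```python
import re

# The expression written out with re.VERBOSE so each part can carry a comment.
PATTERN = re.compile(
    r"""
    (\s|^)          # group 1: whitespace before the word, or the start of the string
    ([^\\]+)        # group 2: a run of non-backslash characters, spaces included
    (?=\\u|\\\\u)   # lookahead: one or two backslashes, then u (not consumed)
    """,
    re.VERBOSE,
)


def space_before_escapes(text: str) -> str:
    # Re-emit group 1 and group 2, then the separating space
    return PATTERN.sub(r"\1\2 ", text)


print(space_before_escapes("Whoaaa\\ud83d\\udc4f"))
print(space_before_escapes("blabla whoaaa\\ud83d\\udc4f blabla"))
# Whoaaa \ud83d\udc4f
# blabla whoaaa \ud83d\udc4f blabla
```

In verbose mode, whitespace and `#` comments outside character classes are ignored, so the compiled pattern is the same as the one-line version.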