I want to split a sentence into words and special characters. I am using the regular expression below:
@"((\b[^\s]+\b)((?<=\.\w).)?)"
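But it returns only the words and not the special characters, such as hyphens or colons that are set off by spaces.
Ideally, for the sentence:
"Right now!" she shouted, and hands fluttered in the air - amid a few cheers - for about two minutes.
I should get: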
Right now she shouted and hands fluttered in the air - amid a few cheers - for about two minutes
Answer 0 (score: 1)
Answer 1 (score: 0)
Maybe split with a pattern like this:
@"\s+(?:\p{P}(?!\s))?|\b\p{P}+\s*"
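For readers who want to try this split outside .NET, here is a rough Python sketch of the same idea. It rests on an assumption: Python's built-in re module has no \p{P} class, so the sketch substitutes the ASCII characters in string.punctuation, which is only an approximation (it includes symbols such as $ and + and misses non-ASCII punctuation). The variable names are illustrative.

```python
import re
import string

# Assumption: approximate \p{P} with ASCII punctuation, because the built-in
# re module does not support \p{...} character classes.
P = "[" + re.escape(string.punctuation) + "]"

# Same two alternatives as the pattern above, with \p{P} replaced by P:
#   1) whitespace, optionally followed by punctuation that is glued to a word
#   2) punctuation that directly follows a word, plus any trailing whitespace
pattern = re.compile(r"\s+(?:" + P + r"(?!\s))?|\b" + P + r"+\s*")

s = '"Right now!" she shouted, and hands fluttered in the air - amid a few cheers - for about two minutes.'

# re.split leaves an empty string after the final period; filter it out.
tokens = [t for t in pattern.split(s) if t]
print(tokens)
```

In this approximation the two free-standing hyphens survive as their own tokens, while the opening quote stays attached to the first word because nothing precedes it; strip it separately if that matters.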
Answer 2 (score: 0)
In case you want a non-regex way to remove the punctuation from the sentence while still keeping the hyphens:
import string

s = '"Right now!" she shouted, and hands fluttered in the air - amid a few cheers - for about two minutes.'
# Keep hyphens; drop every other character listed in string.punctuation.
x = "".join(c for c in s if c == "-" or c not in string.punctuation)
Output:
'Right now she shouted and hands fluttered in the air - amid a few cheers - for about two minutes'
Then just use x.split() to tokenize it into the output you want.
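Putting the two steps together, a minimal sketch might look like the following. The function name tokenize is made up for illustration and is not part of the answer above.

```python
import string

def tokenize(sentence):
    """Hypothetical helper: strip punctuation except hyphens, then split on whitespace."""
    kept = "".join(c for c in sentence if c == "-" or c not in string.punctuation)
    return kept.split()

s = '"Right now!" she shouted, and hands fluttered in the air - amid a few cheers - for about two minutes.'
tokens = tokenize(s)
print(tokens)
```

Note that this keeps every hyphen, including ones inside words such as "well-known", and it removes apostrophes, so "don't" becomes "dont". Whether that is acceptable depends on the text being tokenized.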