I tried this:
import re
sentence = "How are you?"
print(re.split(r'\b', sentence))
The result is:
[u'How are you?']
I want something like [u'How', u'are', u'you', u'?'].
How can I achieve this?
Answer 0 (score: 9)
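Unfortunately, before version 3.7, Python's re.split() cannot split on an empty match, and \b only ever matches the empty string.
To work around this, use re.findall instead of re.split.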
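On Python 3.7 and later, the limitation above no longer applies: re.split() accepts patterns that match the empty string, so the original \b approach works, though the result still contains empty and whitespace pieces. A minimal sketch, assuming Python 3.7+:

```python
import re

sentence = "How are you?"

# Python 3.7+ can split on empty matches such as \b.
parts = re.split(r'\b', sentence)
print(parts)  # ['', 'How', ' ', 'are', ' ', 'you', '?']

# Keep only pieces that contain something other than whitespace.
tokens = [p for p in parts if p.strip()]
print(tokens)  # ['How', 'are', 'you', '?']
```

On older versions, the re.findall approach below remains the portable choice.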
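In fact, \b just means a word boundary.
It is roughly equivalent to (?<=\w)(?=\W)|(?<=\W)(?=\w), except that \b can also match at the very start or end of the string, next to a word character.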
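To see where \b and the lookaround spelling differ, the sketch below lists the positions each one matches in the example sentence; only the start-of-string position is different here.

```python
import re

sentence = "How are you?"

# Positions (string offsets) where \b matches.
boundary = [m.start() for m in re.finditer(r'\b', sentence)]

# Positions where the lookaround version matches. Lookbehind needs a
# character before the position, so offset 0 can never match.
lookaround = [m.start() for m in re.finditer(r'(?<=\w)(?=\W)|(?<=\W)(?=\w)', sentence)]

print(boundary)    # [0, 3, 4, 7, 8, 11]
print(lookaround)  # [3, 4, 7, 8, 11]
```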
This means the following code works (note that it also returns the spaces as separate tokens):
import re
sentence = "How are you?"
print(re.findall(r'\w+|\W+', sentence))
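For reference, here is what that pattern returns, plus one possible variation (not part of the answer) that excludes whitespace from the second alternative and matches punctuation one character at a time, which gives exactly the output asked for:

```python
import re

sentence = "How are you?"

# Pattern from the answer: runs of word characters or runs of non-word
# characters, so the spaces come back as tokens of their own.
tokens = re.findall(r'\w+|\W+', sentence)
print(tokens)  # ['How', ' ', 'are', ' ', 'you', '?']

# Variation: [^\w\s] matches one character that is neither a word
# character nor whitespace, so spaces are skipped entirely.
words_and_punct = re.findall(r'\w+|[^\w\s]', sentence)
print(words_and_punct)  # ['How', 'are', 'you', '?']
```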
Answer 1 (score: 2)
import re
split = re.findall(r"[\w']+|[.,!?;]", "How are you?")
print(split)
Output:
['How', 'are', 'you', '?']
Regex explanation:
"[\w']+|[.,!?;]"
1st Alternative: [\w']+
[\w']+ match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\w match any word character [a-zA-Z0-9_] (with Python 3 str patterns it also matches Unicode letters and digits)
' the literal character '
2nd Alternative: [.,!?;]
[.,!?;] match a single character present in the list below
.,!?; a single character in the list .,!?; literally
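The explanation above maps directly onto a re.VERBOSE version of the same pattern, where each part carries its own comment. The sketch also shows two consequences of the character lists: the apostrophe keeps contractions whole, and punctuation not in [.,!?;] (such as ':') is silently dropped. The second test sentence is only an illustration.

```python
import re

# Same pattern as r"[\w']+|[.,!?;]", written with re.VERBOSE so the
# whitespace and comments inside it are ignored.
TOKEN = re.compile(r"""
    [\w']+      # 1st alternative: one or more word characters or apostrophes
    |           # or
    [.,!?;]     # 2nd alternative: a single listed punctuation character
""", re.VERBOSE)

print(TOKEN.findall("How are you?"))  # ['How', 'are', 'you', '?']

# "Don't" and "it's" stay whole; ':' is in neither list, so it disappears.
print(TOKEN.findall("Don't panic: it's fine!"))
# ["Don't", 'panic', "it's", 'fine', '!']
```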