Question

I tried this:

import re
sentence = "How are you?"
print(re.split(r'\b', sentence))

The result is

[u'How are you?']

I want something like [u'How', u'are', u'you', u'?']. How can I achieve this?

Answer 1

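Unfortunately, Python versions before 3.7 cannot split on an empty match, and \b only ever matches the empty string. (Since Python 3.7, re.split does accept patterns that match an empty string.)

To work around this, use findall instead of split.

\b simply means a word boundary.

Between two characters it is equivalent to (?<=\w)(?=\W)|(?<=\W)(?=\w); \b additionally matches at the start or end of the string when the adjacent character is a word character.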
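To make the boundary positions concrete, here is a small sketch that lists the offsets where \b and the lookaround form produce (empty) matches in the question's sentence. The variable names are only illustrative.

```python
import re

sentence = "How are you?"

# Offsets at which \b produces an empty match.
boundary_positions = [m.start() for m in re.finditer(r'\b', sentence)]

# Offsets at which the lookaround form matches. The lookbehind needs a
# character to its left, so it cannot match at offset 0.
lookaround = r'(?<=\w)(?=\W)|(?<=\W)(?=\w)'
lookaround_positions = [m.start() for m in re.finditer(lookaround, sentence)]

print(boundary_positions)    # [0, 3, 4, 7, 8, 11]
print(lookaround_positions)  # [3, 4, 7, 8, 11]
```

The two lists differ only at offset 0, the start of the string, where \b matches because the first character is a word character.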

So the following code works:

import re
sentence = "How are you?"
print(re.findall(r'\w+|\W+', sentence))
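Note that this findall call also returns the whitespace runs between words. As a sketch, the snippet below shows its output, a variant pattern (not from the answer) that skips whitespace, and what re.split(r'\b', ...) returns on Python 3.7 or later. The last call assumes Python 3.7+.

```python
import re

sentence = "How are you?"

# The pattern from the answer: runs of word characters or runs of
# non-word characters, so spaces come back as separate items.
pieces = re.findall(r'\w+|\W+', sentence)
print(pieces)        # ['How', ' ', 'are', ' ', 'you', '?']

# Variant: match non-word characters that are not whitespace, so the
# spaces are skipped instead of returned.
tokens = re.findall(r'\w+|[^\w\s]+', sentence)
print(tokens)        # ['How', 'are', 'you', '?']

# Python 3.7+ only: re.split accepts the empty matches of \b.
# The leading '' comes from the boundary at the start of the string.
split_pieces = re.split(r'\b', sentence)
print(split_pieces)  # ['', 'How', ' ', 'are', ' ', 'you', '?']
```

With the variant pattern, consecutive punctuation such as "?!" comes back as a single token; drop the trailing + from the second alternative to get one token per punctuation character.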

Answer 2

import re
split = re.findall(r"[\w']+|[.,!?;]", "How are you?")
print(split)

Output:

['How', 'are', 'you', '?']

Regex explanation:

"[\w']+|[.,!?;]"

    1st Alternative: [\w']+
        [\w']+ match a single character present in the list below
            Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
            \w match any word character ([a-zA-Z0-9_] in ASCII mode; for Python 3 str patterns it also matches Unicode letters and digits)
            ' the literal character '
    2nd Alternative: [.,!?;]
        [.,!?;] match a single character present in the list below
            .,!?; a single character in the list .,!?; literally
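As a rough illustration of what this pattern keeps and drops, the sketch below wraps it in a helper. The names TOKEN_PATTERN and tokenize are hypothetical, not part of the answer.

```python
import re

# The pattern from the answer, compiled once for reuse.
TOKEN_PATTERN = re.compile(r"[\w']+|[.,!?;]")

def tokenize(text):
    """Return words (apostrophes kept) and the listed punctuation marks."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("How are you?"))     # ['How', 'are', 'you', '?']
print(tokenize("Don't stop: go!"))  # ["Don't", 'stop', 'go', '!']
```

Because the second alternative lists only . , ! ? and ;, any other punctuation (the colon in the second call) is silently dropped, as is whitespace. Add characters to the class if they should be kept.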

How do I split on word boundaries with a regular expression?
