Regex to split a string into consecutive alphanumeric parts

Time: 2014-01-15 17:59:51

Tags: regex

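I want to split a string into substrings of consecutive characters that share some property: specifically, being alphanumeric (though I am interested in a general solution).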

E.g. "string#example[is-like="html"].selectors"

should match [string, #, example, [, is, -, like, =", html, "]., selectors]

Any idea how to do this in RegEx? Thanks!

Edit: I will be using PHP's RegEx engine via preg_match_all.

2 Answers:

Answer 0: (score: 2)

\w+|\W+

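One or more consecutive word characters OR one or more consecutive non-word characters. Note that \w matches letters, digits, and the underscore, so an underscore counts as part of a word here.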

Output:

Array
    (
        [0] => string
        [1] => #
        [2] => example
        [3] => [
        [4] => is
        [5] => -
        [6] => like
        [7] => ="
        [8] => html
        [9] => "].
        [10] => selectors
    )
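
For comparison, here is a minimal sketch of the same pattern in Python rather than PHP. That choice is an assumption made only so the example is easy to run; in PHP the pattern would be passed to preg_match_all as '/\w+|\W+/'. The variable names are illustrative.

```python
import re

subject = 'string#example[is-like="html"].selectors'

# Alternate between runs of word characters (\w+) and runs of
# non-word characters (\W+); findall returns every match in order.
tokens = re.findall(r"\w+|\W+", subject)

print(tokens)
```

Because the two alternatives together cover every possible character, joining the tokens back together reproduces the original string.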

Answer 1: (score: 1)

Use a word boundary anchor, for example in C#:

splitArray = Regex.Split(subjectString, @"\b");

If you want to avoid empty matches at the start/end of the string, combine it with lookaround assertions:

splitArray = Regex.Split(subjectString, @"(?<!^)\b(?!$)");

Explanation:

(?<!^) # Assert we're not at the start of the string
\b     # Match a position between an alnum and a non-alnum character
(?!$)  # Assert we're not at the end of the string, either
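
The effect of the two lookarounds can be checked with a small sketch. It is written in Python instead of C#, and it assumes Python 3.7 or later, where re.split also splits on empty matches; the variable names are illustrative.

```python
import re

subject = 'string#example[is-like="html"].selectors'

# Plain \b also matches at the very start and end of this string,
# so the result begins and ends with an empty string.
raw_parts = re.split(r"\b", subject)

# The lookarounds reject the boundaries at the start and end,
# leaving only the splits between word and non-word runs.
parts = re.split(r"(?<!^)\b(?!$)", subject)

print(raw_parts)
print(parts)
```

Since \b never matches between two non-word characters, a run such as =" stays together as a single piece.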

A general solution looks like this:

Suppose you want to split between digits (\d) and non-digits (\D). Then you can use

splitArray = Regex.Split(subjectString, @"(?<=\d)(?=\D)|(?<=\D)(?=\d)");

Explanation:

(?<=\d) # Assert that the previous character is a digit
(?=\D)  # and the next character is a non-digit.
|       # Or:
(?<=\D) # Assert that the previous character is a non-digit
(?=\d)  # and the next character is a digit.
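
As a rough illustration of the general idea, the sketch below applies the digit/non-digit pattern in Python (assuming Python 3.7 or later, where splitting on empty matches is supported). It also shows a non-regex alternative for arbitrary properties: split_runs is a hypothetical helper built on itertools.groupby, not part of any library, and the sample input "abc123def45" is made up for the example.

```python
import re
from itertools import groupby

# Split wherever a digit is followed by a non-digit, or vice versa.
DIGIT_BOUNDARY = re.compile(r"(?<=\d)(?=\D)|(?<=\D)(?=\d)")

digit_parts = DIGIT_BOUNDARY.split("abc123def45")
print(digit_parts)


def split_runs(text, has_property):
    """Hypothetical helper: group consecutive characters for which
    has_property(ch) returns the same value."""
    return ["".join(run) for _, run in groupby(text, key=has_property)]


# Any per-character predicate works; str.isalnum reproduces the
# alphanumeric split from the question.
alnum_parts = split_runs('string#example[is-like="html"].selectors', str.isalnum)
print(alnum_parts)
```

The lookaround version needs a character class and its complement (\d and \D here), whereas the groupby version accepts any per-character test, at the cost of leaving regex behind.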