preg_split regex lookbehind with multiple matches

Date: 2014-11-19 14:32:26

Tags: php regex unicode preg-split lookbehind

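The goal of my regex is to split on any Unicode whitespace except newlines, and to make sure each newline stays attached to the preceding non-whitespace character. At the moment this works, but only for a \n preceded by a single whitespace character.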

Using my current regex:

    $data  = "the\nquick\n brown fox jumped     \nover the lazy dog.";
    $tokenized = preg_split("~(?<=\n)|\p{Z}+(?!\n)~u", $data, -1, PREG_SPLIT_OFFSET_CAPTURE);
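
To see what the second alternative does with a run of spaces in front of a newline, it can help to inspect a single match directly. The snippet below is only an illustrative sketch and not part of the original code; `$sample` is a made-up string. Because `\p{Z}+` is allowed to backtrack, the lookahead `(?!\n)` only forces it to give up the last space, so one space is left in front of the newline.

```php
<?php
// Hypothetical sample: five spaces between "jumped" and the newline.
$sample = "jumped     \nover";

// Same second alternative as in the pattern above.
preg_match('~\p{Z}+(?!\n)~u', $sample, $m, PREG_OFFSET_CAPTURE);

// $m[0][0] is the matched text, $m[0][1] is its byte offset.
// Only four of the five spaces are matched, starting at offset 6.
var_dump(strlen($m[0][0]), $m[0][1]);
```

This is consistent with the " \n" token at offset 31 in the current result.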

Current result (I have written \n wherever a "\n" character is present):

Array
(
    [0] => Array
        (
            [0] => the\n

            [1] => 0
        )

    [1] => Array
        (
            [0] => quick\n

            [1] => 4
        )

    [2] => Array
        (
            [0] => 
            [1] => 10
        )

    [3] => Array
        (
            [0] => brown
            [1] => 11
        )

    [4] => Array
        (
            [0] => fox
            [1] => 17
        )

    [5] => Array
        (
            [0] => jumped
            [1] => 21
        )

    [6] => Array
        (
            [0] =>  \n
            [1] => 31
        )

    [7] => Array
        (
            [0] => over
            [1] => 33
        )

    [8] => Array
        (
            [0] => the
            [1] => 38
        )

    [9] => Array
        (
            [0] => lazy
            [1] => 42
        )

    [10] => Array
        (
            [0] => dog.
            [1] => 47
        )
)

Expected result:

Array
(
    [0] => Array
        (
            [0] => the\n
            [1] => 0
        )

    [1] => Array
        (
            [0] => quick\n
            [1] => 4
        )

    [2] => Array
        (
            [0] => brown
            [1] => 10
        )

    [3] => Array
        (
            [0] => fox
            [1] => 16
        )

    [4] => Array
        (
            [0] => jumped\n
            [1] => 20
        )

    [5] => Array
        (
            [0] => over
            [1] => 27
        )

    [6] => Array
        (
            [0] => the
            [1] => 32
        )

    [7] => Array
        (
            [0] => lazy
            [1] => 36
        )

    [8] => Array
        (
            [0] => dog.
            [1] => 41
        )
)
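
One possible way to arrive at this output is a two-step sketch rather than a single preg_split call. It rests on an assumption: a token such as "jumped\n" is not a contiguous substring of the original string, so the whitespace touching each newline has to be removed first. The offsets in the expected result match the string after that removal. The variable `$normalized` and the choice of PREG_SPLIT_NO_EMPTY are additions for illustration.

```php
<?php
$data = "the\nquick\n brown fox jumped     \nover the lazy dog.";

// Step 1 (assumption): drop Unicode separators that touch a newline,
// so every "\n" sits directly after the preceding word.
$normalized = preg_replace('~\p{Z}*\n\p{Z}*~u', "\n", $data);

// Step 2: split right after each newline, or on any run of separators.
// PREG_SPLIT_NO_EMPTY discards empty tokens.
$tokenized = preg_split(
    '~(?<=\n)|\p{Z}+~u',
    $normalized,
    -1,
    PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE
);

print_r($tokenized);
```

Note that the offsets returned this way refer to `$normalized`, not to the original `$data`. If offsets into the original string are needed, this approach would have to be adapted.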

Any suggestions are greatly appreciated. Thanks.

0 answers:

No answers