Creating an array of words with preg_match_all and a regular expression

Date: 2016-01-12 15:04:48

Tags: php regex

I use the PHP function preg_match_all() as shown below to create an array containing the words of a string.

// the string which contains the text
$string = "Lorem ipsum dolor sit amet elit";

// the preg_match_all() function
preg_match_all('/([a-z]*?)(?= )/i', $string, $matches);

// debug array
debug($matches[0]);

// output
[
    (int) 0 => 'Lorem',
    (int) 1 => '',
    (int) 2 => 'ipsum',
    (int) 3 => '',
    (int) 4 => 'dolor',
    (int) 5 => '',
    (int) 6 => 'sit',
    (int) 7 => '',
    (int) 8 => 'amet',
    (int) 9 => ''
]

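But when I debug or print the array of words, the last word is missing from it; in this case that is the word "elit". The array also contains an empty string after every word. How can I fix this?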

2 answers:

Answer 0: (score: 2)

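You can use (?=\W|$) as the lookahead, which means the word must be followed by a non-word character or by the end of the input. Changing the quantifier from *? to + also requires at least one letter, so the empty matches disappear: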

preg_match_all('/([a-z]+)(?=\W|$)/i', $string, $matches);

print_r($matches[0]);

Output (apparently produced from a longer input than the question's string, with the sentence repeated):

Array
(
    [0] => Lorem
    [1] => ipsum
    [2] => dolor
    [3] => sit
    [4] => amet
    [5] => consectetur
    [6] => adipiscing
    [7] => elit
    [8] => Lorem
    [9] => ipsum
    [10] => dolor
    [11] => sit
    [12] => amet
    [13] => consectetur
    [14] => adipiscing
    [15] => elit
)
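For the six-word string from the question itself, the fix can be sketched as follows. The variable names $before and $after are illustrative and not part of the original answer.

```php
<?php
// The string from the question.
$string = "Lorem ipsum dolor sit amet elit";

// Original pattern: the lookahead (?= ) demands a space after each match,
// so the final word (followed by the end of the input) can never match.
// The lazy [a-z]*? may also match zero characters, which is where the
// empty strings in the question's output come from.
preg_match_all('/([a-z]*?)(?= )/i', $string, $before);

// Fixed pattern: require at least one letter, and accept either a
// non-word character or the end of the input after the word.
preg_match_all('/([a-z]+)(?=\W|$)/i', $string, $after);

// $after[0] now holds all six words, including 'elit'.
print_r($after[0]);
```

Both changes matter: the + quantifier removes the empty entries, and the |$ alternative lets the last word match.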

By the way, you can get the same result with a split operation:

$tokens = preg_split('/\h+/', $string);

\h matches horizontal whitespace (such as spaces and tabs).
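A minimal sketch of the split approach, assuming the input may carry leading or trailing whitespace. The PREG_SPLIT_NO_EMPTY flag is an addition that the original answer does not use; without it, surrounding whitespace would produce empty tokens at the ends of the array.

```php
<?php
// The question's words, padded with extra whitespace for illustration.
$string = "  Lorem ipsum dolor sit amet elit ";

// Split on runs of horizontal whitespace. The limit of -1 means "no limit";
// PREG_SPLIT_NO_EMPTY discards empty strings caused by the padding.
$tokens = preg_split('/\h+/', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($tokens);
```

Unlike the letters-only match, splitting keeps punctuation attached to its token, so an input word such as "amet," would stay "amet,".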

Answer 1: (score: 2)

Use the following regex pattern to get all the words.

\w matches any word character (letter, digit, or underscore):

preg_match_all('#\w+#', $string, $words);
print_r($words);

This outputs (for a longer input in which the sentence appears twice):

Array
(
    [0] => Array
        (
            [0] => Lorem
            [1] => ipsum
            [2] => dolor
            [3] => sit
            [4] => amet
            [5] => consectetur
            [6] => adipiscing
            [7] => elit
            [8] => Lorem
            [9] => ipsum
            [10] => dolor
            [11] => sit
            [12] => amet
            [13] => consectetur
            [14] => adipiscing
            [15] => elit
        )

)
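Because \w also accepts digits and underscores, its result can differ from a letters-only pattern such as [a-z]+. The sketch below compares the two on a hypothetical sample string that is not taken from the question.

```php
<?php
// Hypothetical sample input containing an underscore and a number.
$sample = "foo_bar 42 baz";

// \w+ treats letters, digits and underscores as word characters.
preg_match_all('#\w+#', $sample, $words);

// [a-z]+ with the i modifier matches runs of ASCII letters only.
preg_match_all('#[a-z]+#i', $sample, $letters);

print_r($words[0]);   // foo_bar, 42, baz
print_r($letters[0]); // foo, bar, baz
```

For plain prose like the question's string the two patterns give the same words; the choice only matters when the text contains identifiers or numbers.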