I'm using the PHP function preg_match_all() as shown below to build an array containing several words.
// the string which contains the text
$string = "Lorem ipsum dolor sit amet elit";
// the preg_match_all() function
preg_match_all('/([a-z]*?)(?= )/i', $string, $matches);
// debug array
debug($matches[0]);
// output
[(int) 0 => 'Lorem',
(int) 1 => '',
(int) 2 => 'ipsum',
(int) 3 => '',
(int) 4 => 'dolor',
(int) 5 => '',
(int) 6 => 'sit',
(int) 7 => '',
(int) 8 => 'amet',
(int) 9 => ''
]
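But when I debug or print the array of words, the last word is missing from it; in this case that is the word "elit". How can I fix this?
Answer 0 (score: 2)
You can use (?=\W|$)
as the lookahead, which means the word must be followed by a non-word character or the end of the input. Using + instead of *? also rules out the empty matches: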
preg_match_all('/([a-z]+)(?=\W|$)/i', $string, $matches);
print_r($matches[0]);
Output (for a longer sample string than the one in the question):
Array
(
[0] => Lorem
[1] => ipsum
[2] => dolor
[3] => sit
[4] => amet
[5] => consectetur
[6] => adipiscing
[7] => elit
[8] => Lorem
[9] => ipsum
[10] => dolor
[11] => sit
[12] => amet
[13] => consectetur
[14] => adipiscing
[15] => elit
)
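To check the fix against the exact string from the question, here is a minimal sketch that runs both patterns side by side. The variable names $original and $fixed are only illustrative. The original lookahead (?= ) requires a space after each word, so the final word, which is followed by the end of the string, can never match.

```php
<?php
// The string from the question.
$string = "Lorem ipsum dolor sit amet elit";

// Original pattern: the lookahead demands a space, so "elit" is never matched,
// and the lazy "*?" quantifier also allows empty matches before each space.
preg_match_all('/([a-z]*?)(?= )/i', $string, $original);

// Fixed pattern: one or more letters, followed by a non-word character
// or the end of the input.
preg_match_all('/([a-z]+)(?=\W|$)/i', $string, $fixed);

print_r($original[0]); // 10 entries, with empty strings and without "elit"
print_r($fixed[0]);    // Lorem, ipsum, dolor, sit, amet, elit
```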
By the way, you can get the same result with a split operation:
$tokens = preg_split('/\h+/', $string);
\h matches horizontal whitespace (such as spaces and tabs).
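One thing to watch for with splitting: leading or trailing whitespace produces empty strings at the ends of the result. The sketch below assumes an input padded with extra whitespace (not the string from the question) and uses the PREG_SPLIT_NO_EMPTY flag to drop those empty pieces. Note also that splitting on whitespace keeps any punctuation attached to the words, unlike the letter-based match.

```php
<?php
// Hypothetical input with leading/trailing spaces and a tab between words.
$string = "  Lorem ipsum\tdolor sit amet elit ";

// Plain split: the padding yields an empty string at each end.
$raw = preg_split('/\h+/', $string);

// With PREG_SPLIT_NO_EMPTY, only non-empty pieces are returned.
$tokens = preg_split('/\h+/', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($raw);    // 8 entries, the first and last are ''
print_r($tokens); // Lorem, ipsum, dolor, sit, amet, elit
```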
Answer 1 (score: 2)
Use the following regex pattern to get all the words.
\w matches any word character (letter, digit, or underscore).
preg_match_all('#\w+#', $string, $words);
print_r($words);
This outputs (for a longer sample string than the one in the question):
Array
(
[0] => Array
(
[0] => Lorem
[1] => ipsum
[2] => dolor
[3] => sit
[4] => amet
[5] => consectetur
[6] => adipiscing
[7] => elit
[8] => Lorem
[9] => ipsum
[10] => dolor
[11] => sit
[12] => amet
[13] => consectetur
[14] => adipiscing
[15] => elit
)
)
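Because \w also accepts digits and underscores, \w+ and [a-z]+ can give different results once the text contains more than plain letters. A small sketch with a made-up input string shows the difference; which behaviour is preferable depends on what should count as a "word".

```php
<?php
// Hypothetical input containing an underscore and digits.
$string = "user_42 said: Lorem ipsum";

// \w+ treats letters, digits, and underscores as part of a word.
preg_match_all('#\w+#', $string, $words);

// [a-z]+ with the i flag accepts ASCII letters only.
preg_match_all('/[a-z]+/i', $string, $letters);

print_r($words[0]);   // user_42, said, Lorem, ipsum
print_r($letters[0]); // user, said, Lorem, ipsum
```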