Regex: match the first x words, including special characters and line breaks

Time: 2018-04-05 08:31:47

Tags: php regex

I have the following regex:

preg_match("/(.+?(?=\s)){7}/", $text, $matches);

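I'm trying to get the first x "words" from a string, using a regex that is essentially equivalent to splitting the string on whitespace characters. I'm not using \w because I want to include special characters in the "words".

I'm running into a problem when the string contains line breaks: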

https://regexr.com/3nd15

Example string:

this line doesn't have seven words.
This line does has more than 7 but the regex is ignoring the first line.

The result I get (it starts from the second line, because the first line has fewer than 7 words):

This line does has more than 7

The result I want (spilling over onto the next line):

this line doesn't have seven words. This

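I have tried adding the multiline flag, with no change.

Any suggestions are appreciated.

2 answers:

Answer 0: (score: 1)

As a suggestion, you could use preg_split (http://php.net/manual/en/function.preg-split.php) and write a pattern that matches the whitespace rather than the words.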

$text = 'i only want
to get the first
seven words from this text';

$sevenWords = array_slice( preg_split('/\s+/',$text), 0, 7 );

var_dump( $sevenWords );
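
If a single string is wanted rather than an array, the same idea can be wrapped in a small helper. This is only a sketch: the name firstWords is made up for illustration and is not part of the original answer, and joining with a single space is an assumption based on the desired output shown in the question (the original line break is not preserved).

```php
<?php
// Hypothetical helper (not from the original answer): return the first
// $count whitespace-separated words of $text, joined by single spaces.
function firstWords(string $text, int $count): string
{
    // PREG_SPLIT_NO_EMPTY drops the empty strings that leading or
    // trailing whitespace would otherwise produce.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    return implode(' ', array_slice($words, 0, $count));
}

$text = "this line doesn't have seven words.\nThis line does has more than 7 but the regex is ignoring the first line.";
echo firstWords($text, 7), "\n";
// prints: this line doesn't have seven words. This
```

If the text has fewer than $count words, this version simply returns all of them.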

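Answer 1: (score: 1)

You can use a regex that matches 7 chunks of non-whitespace characters separated by chunks of whitespace:

'~\S+(?:\s+\S+){6}~'

See the regex demo. To match this string only at the start of the input, add ^ at the beginning.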

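Details

  • \S+ - 1 or more non-whitespace characters
  • (?:\s+\S+){6} - 6 occurrences of 1 or more whitespace characters followed by 1 or more non-whitespace characters.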
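
Since the question asks for the first x words rather than exactly 7, the quantifier can be built from a variable: the first \S+ accounts for one word, so the group repeats x - 1 times. A minimal sketch, assuming a hypothetical helper named matchFirstWords (not part of the original answer):

```php
<?php
// Hypothetical helper: match the first $count words at the start of $text,
// keeping the original whitespace (including line breaks) between them.
// Returns null when the text does not start with $count words.
function matchFirstWords(string $text, int $count): ?string
{
    if ($count < 1) {
        return null;
    }

    // One \S+ for the first word, then ($count - 1) repetitions of
    // "whitespace followed by a word".
    $pattern = '~^\S+(?:\s+\S+){' . ($count - 1) . '}~';

    return preg_match($pattern, $text, $m) ? $m[0] : null;
}

var_dump(matchFirstWords("a b\nc d", 3)); // the string "a b\nc", line break kept
var_dump(matchFirstWords("a b", 3));      // NULL, only two words available
```

Unlike splitting on whitespace, this gives no partial result when the text is too short, so the caller has to decide what to do with null.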

PHP code:

$str = "this line doesn't have seven words.\nThis line does has more than 7 but the regex is ignoring the first line.";
if (preg_match_all('/\S+(?:\s+\S+){6}/', $str, $matches)) {
    print_r($matches[0]);
}
echo "\n";
if (preg_match('/^\S+(?:\s+\S+){6}/', $str, $match)) {
    print_r($match[0]);
}

Output:

Array
(
    [0] => this line doesn't have seven words.
This
    [1] => line does has more than 7 but
    [2] => the regex is ignoring the first line.
)

this line doesn't have seven words.
This
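
As a side note on why the pattern from the question stops at the line break: by default "." does not match a newline, and the multiline flag (m) only changes where ^ and $ can match, which that pattern does not use. The s flag is the one that lets "." match newlines. The comparison below is only an illustration of that difference, not a recommendation to keep the original pattern.

```php
<?php
$str = "this line doesn't have seven words.\nThis line does has more than 7 but the regex is ignoring the first line.";

// Original pattern: "." cannot match "\n", so no match can span both lines.
preg_match('/(.+?(?=\s)){7}/', $str, $plain);

// "m" only affects ^ and $; the pattern has neither, so the result is the same.
preg_match('/(.+?(?=\s)){7}/m', $str, $multiline);

// "s" lets "." match "\n", so the match can continue onto the second line.
preg_match('/(.+?(?=\s)){7}/s', $str, $dotall);

var_dump($plain[0]);                  // "This line does has more than 7"
var_dump($plain[0] === $multiline[0]); // bool(true)
var_dump($dotall[0]);                 // "this line doesn't have seven words.\nThis"
```

Even with the s flag, that pattern treats each extra whitespace character in a run (for example a double space) as a "word" of its own, so the \S+ based pattern above is the sturdier choice.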