Question

Using PHP, I want to extract an array from a string that contains a numbered list.

Example string:

The main points are: 1. This is point one. 2. This is point two. 3. This is point three.

This should produce the following array:

[0] => 1. This is point one.
[1] => 2. This is point two.
[2] => 3. This is point three.

The format of the string may vary, for example:

1. This is point one, 2. This is point two, 3. This is point three.
1) This is point one  2) This is point two 3) This is point three
1 This is point one. 2 This is point two. 3 This is point three.

I have started with preg_match_all, using the following pattern:

!((\d+)(\s+)?(\.?)(\)?)(-?)(\s+?)(\w+))!

But I am not sure how to match the rest of the string, up to the start of the next match.

An example is provided on RegExr.

Answer 1

If your input follows your sample input, in that each "point" does not itself contain a number, you can use the following regex:

\d+[^\d]*

In PHP, you can use preg_match_all() to capture everything:

$text = 'The main points are: 1. This is point one. 2. This is point two. 3. This is point three.';

$matches = array();
preg_match_all('/(\d+[^\d]*)/', $text, $matches);

print_r($matches[1]);

This will result in:

Array
(
    [0] => 1. This is point one.
    [1] => 2. This is point two.
    [2] => 3. This is point three.
)

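However, if any of the points themselves contain digits, this will not work, because the match stops at the first digit it meets.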
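To make that limitation concrete, here is a minimal sketch with a made-up input (not from the question) in which the first point contains a digit of its own. The trim step is my addition; the pattern picks up the whitespace before the next digit.

```php
<?php
// Hypothetical input: the first point contains the digit "2".
$text = '1. Buy 2 apples. 2. Go home.';

preg_match_all('/\d+[^\d]*/', $text, $matches);

// Each match runs up to the next digit, so it carries trailing whitespace.
$points = array_map('trim', $matches[0]);

print_r($points);
// The first point is cut in two:
// [0] => 1. Buy
// [1] => 2 apples.
// [2] => 2. Go home.
```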

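If you need the points themselves to contain digits, you will have to define an actual "anchor" or "end" for each point, such as a period. If you can guarantee that a period (.) only appears at the end of a point (ignoring the one that may follow the leading number), you can use the following regex: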

\d+[.)\s][^.]*\.

This drops straight into the preg_match_all() call from above:

preg_match_all('/(\d+[.)\s][^.]*\.)/', $text, $matches);

The regex explained:

\d+        # leading number
[.)\s]     # followed by a `.`, `)`, or whitespace
[^.]*      # any non-`.` character(s)
\.         # ending `.`
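As a quick check, this sketch runs the same pattern over two of the formats from the question ("1." markers and bare "1" markers). The input strings are shortened versions of the question's examples, and both assume every point ends with a period.

```php
<?php
// Shortened versions of two formats from the question.
$inputs = array(
    'The main points are: 1. This is point one. 2. This is point two.',
    '1 This is point one. 2 This is point two.',
);

$results = array();
foreach ($inputs as $input) {
    // Leading number, then ".", ")" or whitespace, then everything up to the next "."
    preg_match_all('/\d+[.)\s][^.]*\./', $input, $matches);
    $results[] = $matches[0];
}

print_r($results);
// $results[0] => "1. This is point one.", "2. This is point two."
// $results[1] => "1 This is point one.",  "2 This is point two."
```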

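The caveat with the second regex is that a period may only appear at the end of each point (and directly after the leading number). However, I think this rule may be easier to live with than the "no digits inside a point" rule; it all depends on your actual input.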
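If neither rule fits your input, another option is to match lazily up to the next list marker using a lookahead, which is closer to the "match until the next match" idea in the question. The sketch below is only one way to do it: the function name extractNumberedPoints is made up for illustration, and it assumes every marker is a number followed by "." or ")" and whitespace, so the bare "1 This is..." format is not covered. A point containing text like "step 2. Then" would still be split.

```php
<?php
// Hypothetical helper: returns each numbered point, marker included.
function extractNumberedPoints($text)
{
    // \d+[.)]\s+   a marker such as "1. " or "2) "
    // .*?          the point itself, as short as possible
    // (?=...)      stop before the next marker (skipping spaces/commas) or at the end
    $pattern = '/\d+[.)]\s+.*?(?=[\s,]*\d+[.)]\s|$)/s';
    preg_match_all($pattern, $text, $matches);

    return $matches[0];
}

$points = extractNumberedPoints(
    'The main points are: 1. Use PHP 8.1 or later. 2. Run 3 tests. 3. Done.'
);

print_r($points);
// [0] => 1. Use PHP 8.1 or later.
// [1] => 2. Run 3 tests.
// [2] => 3. Done.
```

Because the lookahead requires whitespace after the marker, "8.1" and the bare "3" inside the points are not treated as markers.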

Answer 2

This is easier with preg_split: just split the string on your numbering format and keep the non-empty results. Modify it to suit your needs:

http://codepad.org/tK6fGCRB

<?php

// Split on a single digit followed by ".", ")" or a space.
$theReg = '/\d\.|\d\)|\d /';
$theStrs = array(
    '1. This is point one, 2. This is point two, 3. This is point3',
    '1) This is point one  2) This is point two 3) This is point 3',
    '1 This is point one. 3 This is point three. 4 This is point 4'
);

foreach ($theStrs as $str) {
    // PREG_SPLIT_NO_EMPTY drops the empty piece before the first marker.
    print_r(preg_split($theReg, $str, -1, PREG_SPLIT_NO_EMPTY));
}
?>
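Note that splitting this way drops the numbers and leaves the surrounding spaces and separator commas on each piece. Here is a minimal cleanup sketch, assuming you only want the point text; the \d+ variation (so that a marker like "10." splits as one unit) and the trim character list are my own additions, not part of the code above.

```php
<?php
// Assumed variation of the pattern: \d+ handles multi-digit markers.
$pattern = '/\d+[.)]|\d+ /';
$str = '1. This is point one, 2. This is point two, 3. This is point3';

$parts = preg_split($pattern, $str, -1, PREG_SPLIT_NO_EMPTY);

// Strip the whitespace and separator commas left around each piece.
$points = array_map(function ($part) {
    return trim($part, " \t\n\r,");
}, $parts);

print_r($points);
// [0] => This is point one
// [1] => This is point two
// [2] => This is point3
```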
