How do I extract an array from a string that contains a numbered list?

Time: 2012-11-06 05:53:15

Tags: php regex string

Using PHP, I want to extract an array from a string that contains a numbered list.

Example string:

The main points are: 1. This is point one. 2. This is point two. 3. This is point three.

This should produce the following array:

[0] => 1. This is point one.
[1] => 2. This is point two.
[2] => 3. This is point three.

The format of the string may vary, for example:

1. This is point one, 2. This is point two, 3. This is point three.
1) This is point one  2) This is point two 3) This is point three
1 This is point one. 2 This is point two. 3 This is point three.

I have started with preg_match_all and the following pattern:

!((\d+)(\s+)?(\.?)(\)?)(-?)(\s+?)(\w+))!

But I am not sure how to match the rest of the string, up to the next match.

An example is provided on RegExr.

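2 answers:

Answer 0: (score: 4)

If your input matches your sample input, in that each "point" does not itself contain a digit, you can use the following regular expression: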

\d+[^\d]*

In PHP, you can use preg_match_all() to capture everything:

$text = 'The main points are: 1. This is point one. 2. This is point two. 3. This is point three.';

$matches = array();
preg_match_all('/(\d+[^\d]*)/', $text, $matches);

print_r($matches[1]);

This results in:

Array
(
    [0] => 1. This is point one.
    [1] => 2. This is point two.
    [2] => 3. This is point three.
)

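However, if the points themselves contain any digits, this will not work.

If the points may contain digits, you need to define an explicit "anchor" or end marker for each point, such as a period. If you can guarantee that a period (.) appears only at the end of a point (apart from the optional one right after the leading number), you can use the following regular expression: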
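To make the limitation concrete, here is a minimal sketch that runs the first pattern against an input whose second point contains a digit. The input string and the variable names are made up for illustration; they are not from the question.

```php
<?php
// Hypothetical input: the second point contains a digit ("4").
$text = '1. Buy milk. 2. Call 4 friends. 3. Sleep.';

// First pattern: a number followed by any run of non-digit characters.
preg_match_all('/\d+[^\d]*/', $text, $matches);

// Trim the whitespace the pattern picks up before the next number.
$points = array_map('trim', $matches[0]);

print_r($points);
// The "4" starts a new match, so the second point is cut in two:
// "1. Buy milk.", "2. Call", "4 friends.", "3. Sleep."
```

The match restarts at every digit, so any number inside a point is treated as the start of a new list item.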

\d+[.)\s][^.]*\.

This drops easily into the preg_match_all() call from above:

preg_match_all('/(\d+[.)\s][^.]*\.)/', $text, $matches);

The regular expression explained:

\d+        # leading number
[.)\s]     # followed by a `.`, `)`, or whitespace
[^.]*      # any non-`.` character(s)
\.         # ending `.`

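The caveat with the second regular expression is that a period (.) may appear only at the end of each point (and right after the leading number). However, I think this rule may be easier to satisfy than the "no digits inside a point" rule; it all depends on your actual input.

Answer 1: (score: 0)

This is easier with preg_split: split the string on your numbering format and return the non-empty results. Modify it to fit your needs: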
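As a rough illustration of that trade-off, the sketch below applies the period-anchored pattern to two made-up inputs: one with a digit inside a point, which it handles, and one with a period inside a point, which it does not. The strings and variable names are assumptions for the example only.

```php
<?php
$pattern = '/\d+[.)\s][^.]*\./';

// Hypothetical input with a digit inside a point: still matched as three items.
$withDigit = '1. Buy milk. 2. Call 4 friends. 3. Sleep.';
preg_match_all($pattern, $withDigit, $m1);
$ok = $m1[0];
print_r($ok);
// "1. Buy milk.", "2. Call 4 friends.", "3. Sleep."

// Hypothetical input with a period inside a point ("2.5"): the rule is violated.
$withPeriod = '1. Costs 2.5 dollars. 2. Done.';
preg_match_all($pattern, $withPeriod, $m2);
$broken = $m2[0];
print_r($broken);
// "1. Costs 2.", "5 dollars.", "2. Done."
```

Which of the two rules is safer depends on whether digits or periods are more likely to appear inside your points.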

http://codepad.org/tK6fGCRB

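<?php

// Split on a number followed by ".", ")" or a space.
// \d+ (rather than \d) also covers multi-digit numbers such as "10.".
$theReg = '/\d+\.|\d+\)|\d+ /';
$theStrs = array(
    '1. This is point one, 2. This is point two, 3. This is point3',
    '1) This is point one  2) This is point two 3) This is point 3',
    '1 This is point one. 3 This is point three. 4 This is point 4',
);

// Note: the pieces keep surrounding whitespace and any trailing "," or ".".
foreach ($theStrs as $str) {
    print_r(preg_split($theReg, $str, -1, PREG_SPLIT_NO_EMPTY));
}
?>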
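The pieces returned by preg_split still carry the whitespace and separators that sat between the items. One possible way to tidy them is sketched below; splitNumberedList is a hypothetical helper name, not part of the answer, and it assumes that leading or trailing commas and periods are never meaningful in a point.

```php
<?php
// Hypothetical helper: split on the numbering, then clean up each piece.
function splitNumberedList($str)
{
    // A number followed by ".", ")" or a space marks the start of an item.
    $parts = preg_split('/\d+(?:\.|\)| )/', $str, -1, PREG_SPLIT_NO_EMPTY);

    $clean = array();
    foreach ($parts as $part) {
        // Strip surrounding whitespace plus leftover "," and "." separators.
        $part = trim($part, " \t\n\r,.");
        if ($part !== '') {
            $clean[] = $part;
        }
    }
    return $clean;
}

print_r(splitNumberedList('1. This is point one, 2. This is point two, 3. This is point three'));
// "This is point one", "This is point two", "This is point three"
```

Note that, like the answer's version, this drops the numbers themselves and will also split on any number inside a point that is followed by ".", ")" or a space.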