Using PHP, I want to extract an array from a string that contains a numbered list.
Example string:
The main points are: 1. This is point one. 2. This is point two. 3. This is point three.
should produce the following array:
[0] => 1. This is point one.
[1] => 2. This is point two.
[2] => 3. This is point three.
The format of the string can vary, for example:
1. This is point one, 2. This is point two, 3. This is point three.
1) This is point one 2) This is point two 3) This is point three
1 This is point one. 2 This is point two. 3 This is point three.
I have started with preg_match_all and the following pattern:
!((\d+)(\s+)?(\.?)(\)?)(-?)(\s+?)(\w+))!
but I am not sure how to match the rest of the string, up to the next match.
Answer 0 (score: 4)
If your input matches your examples, in that each "point" contains no digits of its own, you can use the following regex:
\d+[^\d]*
In PHP, you can use preg_match_all() to capture everything:
$text = 'The main points are: 1. This is point one. 2. This is point two. 3. This is point three.';
$matches = array();
preg_match_all('/(\d+[^\d]*)/', $text, $matches);
print_r($matches[1]);
This results in (every item except the last also ends with the space that follows it in the input):
Array
(
[0] => 1. This is point one.
[1] => 2. This is point two.
[2] => 3. This is point three.
)
However, this will not work if the points themselves contain any digits.
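To make that limitation concrete, here is a small sketch with a made-up input (the sentence about apples is an assumption for illustration, not part of the question). The digit inside the first point starts a new match, so the point is cut in two:

```php
<?php
// Hypothetical input: the first point contains a digit of its own.
$text = '1. Buy 2 apples. 2. Eat them.';

// Same pattern as above: a number, then everything up to the next digit.
preg_match_all('/\d+[^\d]*/', $text, $matches);

// The "2" in "2 apples" is treated as the start of a new item.
print_r($matches[0]);
```

The first point comes back as two separate pieces, which is why the pattern is only safe when the points are known to be digit-free.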
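If you want to allow digits inside each point, you need to define an actual "anchor" or "end" for each point, such as a period. If you can state that a . appears only at the end of a point (ignoring the one that may follow the leading number), you can use the following regex: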
\d+[.)\s][^.]*\.
This drops easily into the preg_match_all() call from above:
preg_match_all('/(\d+[.)\s][^.]*\.)/', $text, $matches);
The regex explained:
\d+ # leading number
[.)\s] # followed by a `.`, `)`, or whitespace
[^.]* # any non-`.` character(s)
\. # ending `.`
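As a side note, the annotated form can be used almost verbatim in PHP by adding the `x` (extended) modifier, which makes PCRE ignore unescaped whitespace and `#` comments outside character classes. The following is only a sketch; the variable names are my own:

```php
<?php
$text = 'The main points are: 1. This is point one. 2. This is point two. 3. This is point three.';

// Same regex as above, written with the x modifier so the comments
// can live inside the pattern itself.
$pattern = '/
    \d+        # leading number
    [.)\s]     # followed by `.`, `)`, or whitespace
    [^.]*      # any non-`.` character(s)
    \.         # ending `.`
/x';

preg_match_all($pattern, $text, $matches);
print_r($matches[0]);
```

Note that the comments must not contain the pattern delimiter (`/` here), since PHP would read it as the end of the pattern.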
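The caveat with the second regex is that a . may appear only at the end of each point (and directly after the leading number). However, I think this rule may be easier to follow than the "no digits in a point" rule; it all depends on your actual input.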
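If neither rule fits, a third option is to match lazily up to the next numbered marker using a lookahead, which is closer to the "match until the next match" idea in the question. This is a rough sketch rather than a drop-in solution: the function name `extractNumberedPoints` is hypothetical, and it assumes that no point contains a number directly followed by `.`, `)`, or whitespace.

```php
<?php
// Hypothetical helper: capture each item from its leading number
// up to (but not including) the next numbered marker or the end of input.
function extractNumberedPoints(string $text): array
{
    // \d+                   leading number
    // (?:[.)]\s*|\s+)       marker: `.` or `)` (plus optional spaces), or just spaces
    // .*?                   item text, as short as possible
    // (?=...)               stop before the next "number + marker" or at the end
    $pattern = '/\d+(?:[.)]\s*|\s+).*?(?=\s*\d+(?:[.)]|\s)|\s*$)/s';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

print_r(extractNumberedPoints('1) This is point one 2) This is point two 3) This is point three'));
```

With the comma-separated format from the question, each item keeps its trailing comma, so some post-processing may still be needed. A point such as "step 2 of the plan" would be split at the "2", which is the assumption stated above.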
Answer 1 (score: 0)
This is easier with preg_split: split the string on your numbering format and return the non-empty results. Modify it to suit your needs:
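<?php
// Split on a number followed by `.`, `)`, or a space.
// \d+ (rather than \d) keeps multi-digit numbers such as "10." in one delimiter.
$theReg = '/\d+\.|\d+\)|\d+ /';
$theStrs = array(
    '1. This is point one, 2. This is point two, 3. This is point3',
    '1) This is point one 2) This is point two 3) This is point 3',
    '1 This is point one. 3 This is point three. 4 This is point 4'
);
foreach ($theStrs as $str) {
    // PREG_SPLIT_NO_EMPTY drops the empty piece before the first number.
    print_r(preg_split($theReg, $str, -1, PREG_SPLIT_NO_EMPTY));
}
?>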
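Splitting this way removes the numbers and leaves the surrounding spaces and commas on each piece. If that matters, the pieces can be cleaned up afterwards. The sketch below assumes PHP 7.4+ (for the arrow function); `splitNumberedPoints` is a hypothetical name, not part of the answer above:

```php
<?php
// Hypothetical helper: split on the numbering, then trim spaces and commas.
function splitNumberedPoints(string $text): array
{
    // A number followed by `.`, `)`, or a space acts as the delimiter.
    $pieces = preg_split('/\d+[.)]|\d+ /', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Strip leading/trailing spaces and commas from every piece.
    return array_map(fn(string $piece): string => trim($piece, ' ,'), $pieces);
}

print_r(splitNumberedPoints('1) This is point one 2) This is point two 3) This is point three'));
```

Because the delimiters are discarded, this variant suits cases where only the text of each point is needed, not its number.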