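I want to use preg_match_all in PHP to capture each of these in its own group:
Keep in mind that I want to ignore all titles, that the number of items in the string can vary, and that the regex should work for all of the following examples:
This is what I have managed to come up with so far: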
$str = 'Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3';
preg_match_all('/([a-z]+)(?=\d|\d\s)\s*(\d*)\s*(?<=\d|\d\s)([a-z]?).*?(and|or)?/i', $str, $matches);
print_r($matches);
Array
(
    [0] => Array
        (
            [0] => Pg3
        )
    [1] => Array
        (
            [0] => Pg
        )
    [2] => Array
        (
            [0] => 3
        )
    [3] => Array
        (
            [0] =>
        )
    [4] => Array
        (
            [0] =>
        )
)
The expected result is:
Array
(
    [0] => Array
        (
            [0] => Ch 1 a and
            [1] => Sect 2b and
            [2] => Pg3
        )
    [1] => Array
        (
            [0] => Ch
            [1] => Sect
            [2] => Pg
        )
    [2] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 3
        )
    [3] => Array
        (
            [0] => a
            [1] => b
            [2] =>
        )
    [4] => Array
        (
            [0] => and
            [1] => and
            [2] =>
        )
)
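The attempt above fails for two separate reasons, which the following sketch makes visible. First, both alternatives of (?=\d|\d\s) start with \d, so a digit must follow the letters immediately and "Ch 1" and "Sect 2b" are rejected. Second, a lazy .*? followed by an optional (and|or)? can always succeed with an empty match, so the and/or group never captures anything. The relaxed lookahead (?=\s*\d) is a substitution made here for illustration only; it is not part of the question.

```php
<?php
// Sketch: why the question's pattern only matches "Pg3".
// (?=\d|\d\s) needs a digit right after the letters, so "Ch 1" and "Sect 2b" fail.
// (?=\s*\d) is an illustrative substitution that tolerates the space.
$str = 'Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3';

$original = '/([a-z]+)(?=\d|\d\s)\s*(\d*)\s*(?<=\d|\d\s)([a-z]?).*?(and|or)?/i';
$relaxed  = '/([a-z]+)(?=\s*\d)\s*(\d*)\s*(?<=\d|\d\s)([a-z]?).*?(and|or)?/i';

preg_match_all($original, $str, $before);
preg_match_all($relaxed, $str, $after);

print_r($before[0]); // only "Pg3"
print_r($after[0]);  // "Ch 1 a", "Sect 2b", "Pg3"

// Even with the relaxed lookahead, group 4 stays empty for every match:
// .*? can match zero characters and (and|or)? is optional, so the engine
// is satisfied without ever reaching "and".
print_r($after[4]);
```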
Answer 0 (score: 0)
This is the closest I could get:
$str = 'Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3';
preg_match_all('/((Ch|Sect|Pg)\s?(\d+)\s?(\w?))(.*?(and|or))?/i', $str, $matches);
print_r($matches);
Array
(
    [0] => Array
        (
            [0] => Ch 1 a unwantedtitle and
            [1] => Sect 2b unwanted title and
            [2] => Pg3
        )
    [1] => Array
        (
            [0] => Ch 1 a
            [1] => Sect 2b
            [2] => Pg3
        )
    [2] => Array
        (
            [0] => Ch
            [1] => Sect
            [2] => Pg
        )
    [3] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 3
        )
    [4] => Array
        (
            [0] => a
            [1] => b
            [2] =>
        )
    [5] => Array
        (
            [0] => unwantedtitle and
            [1] => unwanted title and
            [2] =>
        )
    [6] => Array
        (
            [0] => and
            [1] => and
            [2] =>
        )
)
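A single match is always one contiguous substring, so the title text can never be cut out of $matches[0]. One possible workaround, sketched below under the assumption that the pattern from this answer is kept unchanged, is to rebuild the "cleaned" item from group 1 (the item) and group 6 (the and/or) and then assemble an array with the same layout as the expected result in the question.

```php
<?php
// Sketch: rebuild the question's expected structure from this answer's groups.
// The title cannot be removed from $matches[0], so the cleaned text is
// assembled from group 1 (item) and group 6 (and/or) instead.
$str = 'Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3';
preg_match_all('/((Ch|Sect|Pg)\s?(\d+)\s?(\w?))(.*?(and|or))?/i', $str, $matches);

$cleaned = [];
foreach ($matches[1] as $i => $item) {
    // trim() removes the trailing space when there is no and/or (e.g. "Pg3").
    $cleaned[] = trim($item . ' ' . $matches[6][$i]);
}

// Same layout as the expected result: cleaned item, word, number, letter, and/or.
$result = [$cleaned, $matches[2], $matches[3], $matches[4], $matches[6]];
print_r($result);
```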
Answer 1 (score: 0)
This is how I would do it:
$arr = array(
    'Ch1 and Sect2b',
    'Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3',
    'Ch 4 x unwantedtitle and Sect 5y unwanted title and' .
        ' Sect6 z and Ch7 or Ch8a',
    'Assume this is ch1a and ch 2 or ch seCt 5c.' .
        ' Then SECT or chA pg22a and pg 13 andor'
);
foreach ($arr as $a) {
    var_dump($a);
    preg_match_all(
        '~
        \b(?P<word>ch|sect|(pg))
        \s*(?P<number>\d+)
        (?(2)\b|
            \s*
            (?P<letter>(?!(?<=\s)(?:and|or)\b)[a-z]+)?
            \s*
            (?:(?<=\s)(?P<cond>and|or)\b)?
        )
        ~xi',
        $a, $m);
    foreach ($m as $k => $v) {
        if (is_numeric($k) && $k !== 0) unset($m[$k]);
        // this is for 'beautifying' the result array
        // note that $m[0] will still return whole matches
    }
    print_r($m);
}
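I had to turn pg into a capturing group because I needed to write an explicit condition for it: pg may be followed by a number (with or without a space in between), but not by a letter, since I assume no page indicator carries a letter the way "pg23a" does.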
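That is why I chose to name every group and "beautify" the result with the inner foreach loop in the code. If you use numeric indices instead of named ones, you have to skip $m[2] every time.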
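The inner foreach that "beautifies" $m can also be expressed with array_filter() and ARRAY_FILTER_USE_KEY, keeping key 0 and the named groups while dropping every other numeric key (including the helper (pg) group at index 2). A sketch, reusing the answer's pattern on the last test string; the arrow function assumes PHP 7.4 or later.

```php
<?php
// Sketch: array_filter() equivalent of the answer's "beautifying" loop.
// Keeps $m[0] (whole matches) and the named groups; drops other numeric keys.
$pattern = '~
    \b(?P<word>ch|sect|(pg))
    \s*(?P<number>\d+)
    (?(2)\b|
        \s*
        (?P<letter>(?!(?<=\s)(?:and|or)\b)[a-z]+)?
        \s*
        (?:(?<=\s)(?P<cond>and|or)\b)?
    )
    ~xi';
$str = 'Assume this is ch1a and ch 2 or ch seCt 5c.'
     . ' Then SECT or chA pg22a and pg 13 andor';

preg_match_all($pattern, $str, $m);

$m = array_filter(
    $m,
    fn($k) => is_string($k) || $k === 0, // arrow functions: PHP 7.4+
    ARRAY_FILTER_USE_KEY
);
print_r($m);
```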
As an example, here is the output for the last item in $arr:
Array
(
    [0] => Array
        (
            [0] => ch1a and
            [1] => ch 2 or
            [2] => seCt 5c
            [3] => pg 13
        )
    [word] => Array
        (
            [0] => ch
            [1] => ch
            [2] => seCt
            [3] => pg
        )
    [number] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 5
            [3] => 13
        )
    [letter] => Array
        (
            [0] => a
            [1] =>
            [2] => c
            [3] =>
        )
    [cond] => Array
        (
            [0] => and
            [1] => or
            [2] =>
            [3] =>
        )
)
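The last test string contains pg22a, yet no match for it appears in the output. That is the (?(2)\b|...) conditional at work: once pg has been captured, only a word boundary may follow its number. The reduced pattern below is an illustrative simplification of the answer's regex (here the pg group is group 1), not the answer's exact pattern.

```php
<?php
// Reduced, illustrative version of the answer's conditional subpattern.
// (?(1)yes|no): if group 1 ("pg") matched, the number must end at a word
// boundary; otherwise an optional letter may follow the number.
$pattern = '/\b(?:ch|sect|(pg))\s*(\d+)(?(1)\b|[a-z]?)/i';

$r1 = preg_match($pattern, 'ch1a', $m1);  // a letter after a chapter number is allowed
$r2 = preg_match($pattern, 'pg 13', $m2); // pg followed by a bare number
$r3 = preg_match($pattern, 'pg22a');      // pg followed by a letter is rejected

echo $r1, ' ', $m1[0], PHP_EOL; // 1 ch1a
echo $r2, ' ', $m2[0], PHP_EOL; // 1 pg 13
echo $r3, PHP_EOL;              // 0
```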