Question

I've seen this question: PHP preg_match bible scripture format

But my problem is a bit different: I want to extract the elements, not just match them. My patterns are also more complex:

'John 14:16–17, 25–26'
'John 14:16–17'
'John 14:16'
'John 14 16'
'John 14:16'
'John14 : 16'
'John     14 16'
'John14:　　　16'
'John14:16—17'
'John14 16 17'
'John14 : 16 17'
'John14 : 16  —   17'
'John    14 16 17'
'约翰福音 14    16 17' -> here is an actual example of unicode text

The separators (dash, colon and space) can be either full-width or half-width characters, and both forms should be accepted.
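One alternative to listing every separator variant inside the pattern is to normalize the separators to ASCII first and then match with a simpler pattern. This is a sketch, not something from the thread: `normalizeSeparators` is a hypothetical helper, and it assumes the only non-ASCII separators are the full-width colon (U+FF1A), the ideographic space (U+3000) and the en and em dashes (U+2013, U+2014).

```php
<?php
// Hypothetical helper (not from the thread): map full-width / typographic
// separators to ASCII so a simple pattern can handle both widths.
// The character list is an assumption based on the examples above.
function normalizeSeparators(string $s): string
{
    return str_replace(
        ['：', '　', '—', '–'], // U+FF1A colon, U+3000 space, U+2014 and U+2013 dashes
        [':', ' ', '-', '-'],
        $s
    );
}

// After normalizing, only ASCII separators remain to be matched.
// The u modifier is still needed so \p{L} handles UTF-8 book names.
$input = normalizeSeparators('John14：　16—17');
preg_match('/^(\p{L}+)\s*(\d+)[\s:]*(\d+)(?:[\s-]*(\d+))?/u', $input, $m);
print_r($m);
```

The trade-off is an extra pass over the string in exchange for a pattern that only has to know about ASCII separators.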

What I want is to extract these elements: the book name John (which should support Unicode), plus 14, 16 and 17 (if present).

I tried:

$str = '10 : 12 — 15  % 52 .633 __+_+)_01(&( %&@#32$%!85#@60$'; 
preg_match_all('/[\d]+?/isU',$str, $t);

It didn't work well.

Then I tried:

preg_match_all("([\u4e00-\u9fa5]+)[^\d\n]*(\d+)[^\d\n]*(\d+)[^\d\n]*(\d*)", "John 14:16", $out);
var_dump($out);

That didn't work either.

OK, I found a solution. It works, but I'm not sure it's 100% correct:

preg_match_all('#([\x{4e00}-\x{9fa5}]+)[^\d\n]*(\d+)[^\d\n]*(\d+)[^\d\n]*(\d*)#u', $keyword, $match);
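A quick check of that pattern (a sketch, not from the thread) shows its main limitation: group 1 only accepts CJK Unified Ideographs in the range U+4E00 to U+9FA5, so a Latin book name such as John never matches.

```php
<?php
// The asker's pattern, unchanged: group 1 only accepts CJK ideographs
// in the range U+4E00..U+9FA5.
$pattern = '#([\x{4e00}-\x{9fa5}]+)[^\d\n]*(\d+)[^\d\n]*(\d+)[^\d\n]*(\d*)#u';

foreach (['约翰福音 14    16 17', 'John 14:16'] as $keyword) {
    // preg_match_all returns the number of full matches (0 if none)
    $count = preg_match_all($pattern, $keyword, $match);
    echo $keyword, ' => ', $count, PHP_EOL;
}
```

The Chinese reference yields one match, while `John 14:16` yields none.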

Answer 1

^(\p{L}+)?\s*(\d+)?[\p{Pd}\p{Zs}:]*(\d+)?[\p{Pd}\p{Zs}:]*(\d+)?

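You need \p{L} to match Unicode letters, and the pattern needs the u modifier so that PHP treats the subject string as UTF-8.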

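\p{Zs} matches any space separator (including the full-width space U+3000), and \p{Pd} matches any dash or hyphen (including U+2013 EN DASH and U+2014 EM DASH).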
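The check below (a sketch, not part of the original answer) runs each separator from the question through the answer's character class. It shows one gap: the full-width colon U+FF1A belongs to Unicode category Po (other punctuation), so neither \p{Pd} nor \p{Zs} covers it, and it would have to be listed explicitly.

```php
<?php
// Which separators from the question does the answer's class cover?
// The u modifier makes PCRE treat pattern and subject as UTF-8.
$separators = [
    'hyphen-minus U+002D'      => '-',
    'en dash U+2013'           => '–',
    'em dash U+2014'           => '—',
    'space U+0020'             => ' ',
    'ideographic space U+3000' => '　',
    'colon U+003A'             => ':',
    'full-width colon U+FF1A'  => '：',
];

foreach ($separators as $label => $char) {
    $matched = preg_match('/^[\p{Pd}\p{Zs}:]$/u', $char) === 1;
    printf("%-26s %s\n", $label, $matched ? 'matched' : 'NOT matched');
}
```

Every separator except the full-width colon is reported as matched.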


preg_match_all("/^(\p{L}+)?\s*(\d+)?[\p{Pd}\p{Zs}:]*(\d+)?[\p{Pd}\p{Zs}:]*(\d+)?/mu", "John 14:16", $out);
var_dump($out);
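Building on the answer, here is a sketch of a reusable function. `parseReference` and its return keys are hypothetical names. The two changes to the answer's pattern are assumptions made to cover the question's requirements: the u modifier is added, and the full-width colon `：` is added to the separator class because it is not in \p{Pd} or \p{Zs}. Unlike the answer, the book, chapter and first verse are required, so a non-matching string returns null instead of an array of empty groups.

```php
<?php
/**
 * Hypothetical helper based on the answer's pattern.
 * Returns ['book', 'chapter', 'verse_start', 'verse_end'] or null.
 * verse_end is null when no second verse number is present.
 */
function parseReference(string $ref): ?array
{
    // Separator run: dashes, space separators, ASCII or full-width colon.
    $sep = '[\p{Pd}\p{Zs}:：]*';
    $pattern = '/^(?<book>\p{L}+)\s*(?<chapter>\d+)' . $sep
             . '(?<start>\d+)(?:' . $sep . '(?<end>\d+))?/u';

    if (preg_match($pattern, $ref, $m) !== 1) {
        return null;
    }

    return [
        'book'        => $m['book'],
        'chapter'     => (int) $m['chapter'],
        'verse_start' => (int) $m['start'],
        // A trailing group that did not participate is absent from $m.
        'verse_end'   => isset($m['end']) ? (int) $m['end'] : null,
    ];
}

$samples = [
    'John 14:16–17, 25–26',
    'John14 : 16  —   17',
    'John14：16',
    '约翰福音 14    16 17',
];

foreach ($samples as $s) {
    $r = parseReference($s);
    printf("%s | %s %d:%d%s\n", $s, $r['book'], $r['chapter'], $r['verse_start'],
        $r['verse_end'] === null ? '' : '-' . $r['verse_end']);
}
```

Because the pattern is anchored only at the start, anything after the first verse range (such as `, 25–26`) is ignored; extracting additional ranges would need a second pass.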

How can I extract the Bible book name, chapter and verse numbers with a regular expression in PHP?
