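I want to break a paragraph into sentences and then explode each sentence into strings, but I need to keep the punctuation marks as elements of the array.
Sample text: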
$meta = 'I am looking to break this paragraph into chunks.
I have researched, tried and tested various combinations; however, I cannot
seem to make it work. Would anyone help me figure this out?
I thank you in advance...';
Desired output:
Array ( [0] =>
Array ( [0] => I [1] => am [2] => looking [3] => to [4] => break [5] => this [6] => paragraph [7] => into [8] => chunks [9] => . )
[1] =>
Array ( [0] => I [1] => have [2] => researched [3] => , [4] => tried [......
......] [5] => figure [6] => this [7] => out [8] => ? )
[3] =>
Array ( [0] => I [1] => thank [2] => you [3] => in [4] => advance [5] => ... )
)
$s = preg_split('/\s*[!?.]\s*/u', $meta, -1, PREG_SPLIT_NO_EMPTY);
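This splits the text into sentences, but although it works, the punctuation disappears.
I would greatly appreciate help building this two-level array with the punctuation preserved.
Answer 0: (score: 1)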
You can use preg_match_all to do what you want:
$meta = 'I am looking to break this paragraph into chunks.
I have researched, tried and tested various combinations; however, I cannot
seem to make it work. Would anyone help me figure this out?
I thank you in advance...';
preg_match_all('/(\w+|[.;?,]+)/', $meta, $m);
print_r($m);
Explanation:
/ : regex delimiter
( : begin group 1
\w+ : 1 or more word characters (alphanumeric or underscore) <=> [a-zA-Z0-9_]
| : OR
[.;?,]+ : 1 or more punctuation characters from the set . ; ? ,
) : end of group 1
/ : regex delimiter
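This matches every word and every run of punctuation, and stores each match in group 1.
If you want it to be Unicode-compatible, you can use \p{L} for any letter and \p{P} for any punctuation character, together with the u modifier so that the pattern and the subject are treated as UTF-8:
/(\p{L}+|\p{P}+)/u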
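As a rough illustration of the Unicode variant, here is a minimal sketch. The sample string is made up for this example and is not from the question, and the file is assumed to be saved as UTF-8. Note that \p{L} matches letters only, so unlike \w it skips digits and underscores; adding \p{N} to the alternation would be one way to pick up digits if you need them.

```php
<?php
// Hypothetical sample text containing non-ASCII letters (an assumption, not from the question).
$text = 'Où est le café? Très près, à côté.';

// The u modifier makes PCRE treat both the pattern and the subject as UTF-8,
// so \p{L}+ matches whole accented words instead of individual bytes.
preg_match_all('/\p{L}+|\p{P}+/u', $text, $m);

// $m[0] holds the full matches: words and runs of punctuation, in order.
print_r($m[0]);
```

No capturing group is used here because $m[0] already contains every full match, which avoids the duplicated second array that print_r shows when the whole pattern is wrapped in parentheses.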
Output of print_r($m) for the sample text:
Array
(
[0] => Array
(
[0] => I
[1] => am
[2] => looking
[3] => to
[4] => break
[5] => this
[6] => paragraph
[7] => into
[8] => chunks
[9] => .
[10] => I
[11] => have
[12] => researched
[13] => ,
[14] => tried
[15] => and
[16] => tested
[17] => various
[18] => combinations
[19] => ;
[20] => however
[21] => ,
[22] => I
[23] => cannot
[24] => seem
[25] => to
[26] => make
[27] => it
[28] => work
[29] => .
[30] => Would
[31] => anyone
[32] => help
[33] => me
[34] => figure
[35] => this
[36] => out
[37] => ?
[38] => I
[39] => thank
[40] => you
[41] => in
[42] => advance
[43] => ...
)
[1] => Array
(
[0] => I
[1] => am
[2] => looking
[3] => to
[4] => break
[5] => this
[6] => paragraph
[7] => into
[8] => chunks
[9] => .
[10] => I
[11] => have
[12] => researched
[13] => ,
[14] => tried
[15] => and
[16] => tested
[17] => various
[18] => combinations
[19] => ;
[20] => however
[21] => ,
[22] => I
[23] => cannot
[24] => seem
[25] => to
[26] => make
[27] => it
[28] => work
[29] => .
[30] => Would
[31] => anyone
[32] => help
[33] => me
[34] => figure
[35] => this
[36] => out
[37] => ?
[38] => I
[39] => thank
[40] => you
[41] => in
[42] => advance
[43] => ...
)
)
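The output above is a flat list, while the question asks for one sub-array per sentence. One possible way to get there, sketched under a few assumptions: split the text into sentences first with a lookbehind so the terminator stays attached, then apply the same kind of word/punctuation pattern to each sentence. The function name tokenizeBySentence is hypothetical, and "!" has been added to the punctuation class here so that every sentence terminator is also kept as a token.

```php
<?php
/**
 * Hypothetical helper (not part of the original answer): returns one array
 * of word and punctuation tokens per sentence.
 */
function tokenizeBySentence(string $text): array
{
    // Split on whitespace that follows a sentence terminator. The lookbehind
    // is zero-width, so the terminator stays at the end of its sentence.
    $sentences = preg_split('/(?<=[.?!])\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $result = [];
    foreach ($sentences as $sentence) {
        // Same idea as the pattern above: a word, or a run of punctuation.
        preg_match_all('/\w+|[.;?,!]+/', $sentence, $m);
        $result[] = $m[0];
    }

    return $result;
}

$meta = 'I am looking to break this paragraph into chunks.
I have researched, tried and tested various combinations; however, I cannot
seem to make it work. Would anyone help me figure this out?
I thank you in advance...';

$result = tokenizeBySentence($meta);
print_r($result);
```

For the sample text this should give four sub-arrays, the last one ending with the "..." token. The sentence split is deliberately naive: an abbreviation followed by a space, such as "e.g. ", would be treated as a sentence boundary, so real-world text may need a more careful rule.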