Split a sentence into parts based on punctuation

Time: 2013-10-23 09:04:33

Tags: php regex explode trim preg-split

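I have spent the last hour looking for an answer and have not found one, so I am asking here...

I need a way (a regex, most likely, but any other explode-like approach is fine) to cut a sentence like the following into parts, all in the same array: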

  

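This is the first part, this is the second part; this is the third part! this is the fourth part? again - and again - until the sentence is over.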

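I would like an array with the following entries (please note: no leading or trailing whitespace, and no punctuation):

  • [0] => "This is the first part"
  • [1] => "this is the second part"
  • [2] => "this is the third part"
  • [3] => "this is the fourth part"
  • [4] => "again"
  • [5] => "and again"
  • [6] => "until the sentence is over"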

EDIT: Sorry, the example is in English, but the solution should be able to handle all kinds of scripts (basically all of Unicode).

Thanks a lot!

3 answers:

Answer 0 (score: 1)

I found a solution here

This is my approach to exploding a string on multiple delimiters.

<?php

// $delimiters must be an array of delimiter strings
// $string is the input string to split

function multiexplode ($delimiters,$string) {

    $ready = str_replace($delimiters, $delimiters[0], $string);
    $launch = explode($delimiters[0], $ready);
    return  $launch;
}

$text = "here is a sample: this text, and this will be exploded. this also | this one too :)";
$exploded = multiexplode(array(",",".","|",":"),$text);

print_r($exploded);

//And output will be like this:
// Array
// (
//    [0] => here is a sample
//    [1] =>  this text
//    [2] =>  and this will be exploded
//    [3] =>  this also
//    [4] =>  this one too
//    [5] => )
// )

?>
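The function above leaves the surrounding spaces in place (entries [1] to [4] start with a space), while the question asks for no leading or trailing whitespace. A minimal sketch of a trimmed variant follows, assuming it is acceptable to drop empty pieces as well. The name multiexplodeTrimmed is hypothetical and is not part of the linked solution.

```php
<?php

// Hypothetical helper: same idea as multiexplode(), but each piece is
// trimmed and empty pieces are dropped.
function multiexplodeTrimmed(array $delimiters, string $string): array
{
    if ($delimiters === []) {
        // Nothing to split on: return the whole string as one piece.
        return [trim($string)];
    }

    // Normalise every delimiter to the first one, then split once.
    $unified = str_replace($delimiters, $delimiters[0], $string);
    $pieces = array_map('trim', explode($delimiters[0], $unified));

    // Remove pieces that became empty after trimming and reindex from 0.
    return array_values(array_filter($pieces, static function ($piece) {
        return $piece !== '';
    }));
}

$text = "here is a sample: this text, and this will be exploded. this also | this one too :)";
print_r(multiexplodeTrimmed(array(",", ".", "|", ":"), $text));
```

Note that trim() only strips ASCII whitespace by default, so this sketch does not remove characters such as the ideographic space found in some scripts.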

Answer 1 (score: 1)

A single preg_split can do the job:

$s = 'This is the first part, this is the second part; this is the third part! this is the fourth part? again - and again - until the sentence is over.';
print_r(preg_split('/\s*[,:;!?.-]\s*/u', $s, -1, PREG_SPLIT_NO_EMPTY));

Output:

Array
(
    [0] => This is the first part
    [1] => this is the second part
    [2] => this is the third part
    [3] => this is the fourth part
    [4] => again
    [5] => and again
    [6] => until the sentence is over
)
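The character class above lists ASCII punctuation only, whereas the question asks for something that works across scripts. One possible way to extend the same preg_split idea is to match the Unicode punctuation property \p{P} instead of a fixed list. This is a sketch under that assumption, not the answerer's own code; splitOnPunctuation is a hypothetical name, and the input is assumed to be valid UTF-8.

```php
<?php

// Hypothetical helper: split on runs of Unicode punctuation (\p{P}),
// swallowing any whitespace or separator characters (\s, \p{Z}) around them.
function splitOnPunctuation(string $text): array
{
    $parts = preg_split(
        '/[\p{Z}\s]*\p{P}+[\p{Z}\s]*/u',
        trim($text),
        -1,
        PREG_SPLIT_NO_EMPTY
    );

    // preg_split() returns false on failure, for example on invalid UTF-8.
    return $parts === false ? [] : $parts;
}

$s = 'This is the first part, this is the second part; this is the third part! this is the fourth part? again - and again - until the sentence is over.';
print_r(splitOnPunctuation($s));
```

One caveat with this choice: \p{P} also covers apostrophes and hyphens, so words such as "don't" or "well-known" would be split as well. If that matters, a narrower, explicit class is the safer option.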

Answer 2 (score: 0)

Try using this:

$parts = preg_split("/[^A-Z\s]+/i", $string);
var_dump($parts);
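As written, this pattern treats everything except ASCII letters and whitespace as a separator, so digits and non-Latin letters split the string too, and the pieces keep their surrounding spaces. A hedged sketch of the same negated-class idea adapted to Unicode letters, marks and digits follows; splitOnNonWordChars is a hypothetical name, and the input is assumed to be valid UTF-8.

```php
<?php

// Hypothetical helper: anything that is not a letter (\p{L}), combining
// mark (\p{M}), number (\p{N}) or whitespace counts as a separator.
function splitOnNonWordChars(string $string): array
{
    $parts = preg_split('/[^\p{L}\p{M}\p{N}\s]+/u', $string, -1, PREG_SPLIT_NO_EMPTY);
    if ($parts === false) {
        // preg_split() failed, for example on invalid UTF-8 input.
        return [];
    }

    // Trim each piece, then drop pieces that were whitespace only.
    $parts = array_map('trim', $parts);

    return array_values(array_filter($parts, static function ($part) {
        return $part !== '';
    }));
}

var_dump(splitOnNonWordChars('one, two | three + 4!'));
```

Unlike a punctuation-only pattern, this version also splits on symbols such as | and +, which may or may not be what is wanted.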