PHP: Split a string by a list of different delimiters and keep the delimiter information

Asked: 2017-02-01 17:15:35

Tags: php string split substring delimiter

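This is effectively a combination of two SO questions I came across, "split string into words by using space a delimiter" and "Split string by other strings", but I could not find a solution in either.

Suppose the delimiter array and the input text are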

$splitby = array('dlmtr1','dlmtr2','dlmtr3',' ','dlmtr5','dlmtr6');

$text = '  dlmtr1This is     the string dlmtr2dlmtr2TTTdlmtr5WWWWW ';


The expected result is:

$textArr = array('This', 'is', 'the', 'string', 'TTT', 'WWWWW');

$delimiterArr = array('  dlmtr1', ' ', '     ', ' ', ' dlmtr2dlmtr2', 'dlmtr5', ' ');

In other words, the following should hold:

$text == $delimiterArr[0] . $textArr[0] . $delimiterArr[1] . $textArr[1] . ... . $delimiterArr[count($delimiterArr) - 1];

P.S. So each item of $delimiterArr consists of one or more delimiters.
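The requirement above can be checked mechanically. The following is a minimal sketch, not part of the original question: `interleave` is a hypothetical helper name, and the leading space in `' dlmtr2dlmtr2'` is an assumption made so that the pieces add up to `$text` exactly.

```php
<?php
// Input and expected arrays from the question.
// Assumption: the fifth delimiter run includes the space that follows "string".
$text = '  dlmtr1This is     the string dlmtr2dlmtr2TTTdlmtr5WWWWW ';
$textArr = array('This', 'is', 'the', 'string', 'TTT', 'WWWWW');
$delimiterArr = array('  dlmtr1', ' ', '     ', ' ', ' dlmtr2dlmtr2', 'dlmtr5', ' ');

// Hypothetical helper: glue the pieces back together, starting with a
// delimiter run and alternating with words until both arrays are used up.
function interleave(array $delimiters, array $words)
{
    $out = '';
    foreach ($delimiters as $i => $delimiter) {
        $out .= $delimiter;
        if (isset($words[$i])) {
            $out .= $words[$i];
        }
    }
    return $out;
}

// True only if the two arrays reassemble $text without losing a character.
var_dump(interleave($delimiterArr, $textArr) === $text);
```

Any candidate solution can be validated the same way by passing its two output arrays to the helper.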

A possible first step is to build the pattern:

$pattern = '/\s?'.implode('\s?|\s?', $splitby).'\s?/';

But whatever I try from there, I keep getting wrong results.

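Update: this is the closest I have come to the expected result. The problem is that adjacent delimiters come out as separate items, whereas delimiters that appear together in the text should stay together: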

$splitby = array('dlmtr1','dlmtr2','dlmtr3',' ','dlmtr5','dlmtr6');
$text = '  dlmtr1This is     the string dlmtr2dlmtr2TTTdlmtr5WWWWW ';

$pattern = '/\s?'.implode('\s?|\s?', $splitby).'\s?/';
$result = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);
preg_match_all($pattern, $text, $matches);
print_r($result);
print_r($matches[0]);

Result:

Array
(
    [0] => This
    [1] => is
    [2] => the
    [3] => string
    [4] => TTT
    [5] => WWWWW
)
Array
(
    [0] =>   
    [1] => dlmtr1 '[0] and [1] should come together
    [2] =>  
    [3] =>    
    [4] =>   
    [5] =>  
    [6] =>  dlmtr2 '[6] and [7] should come together
    [7] => dlmtr2
    [8] => dlmtr5
    [9] =>  
)

Thanks.

1 Answer:

Answer 0: (score: 1)

The code below works as expected.

$splitby = array('dlmtr1','dlmtr2','dlmtr3',' ','dlmtr5','dlmtr6');
$text = '  dlmtr1This is     the string dlmtr2dlmtr2TTTdlmtr5WWWWW ';

preg_match_all("/\s*(dlmtr[1-6])+\s*|\s+/", $text, $matches);
echo "<pre>";print_r($matches[0]);echo "</pre>";

Array
(
    [0] =>   dlmtr1
    [1] =>  
    [2] =>      
    [3] =>  
    [4] =>  dlmtr2dlmtr2
    [5] => dlmtr5
    [6] =>  
)
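The pattern above hard-codes the delimiter names. As a hedged alternative, the sketch below builds the pattern from `$splitby` itself, so the list can change without editing the regex. It is not the answer's method: it treats only the literal delimiters in `$splitby` as separators (a plain space, not every whitespace character), and the variable names `$quoted`, `$pattern` and `$words` are illustrative.

```php
<?php
$splitby = array('dlmtr1', 'dlmtr2', 'dlmtr3', ' ', 'dlmtr5', 'dlmtr6');
$text = '  dlmtr1This is     the string dlmtr2dlmtr2TTTdlmtr5WWWWW ';

// Try longer delimiters first, so a delimiter that is a prefix of another
// one cannot win the alternation by accident.
usort($splitby, function ($a, $b) {
    return strlen($b) - strlen($a);
});

// Escape regex metacharacters in each delimiter.
$quoted = array_map(function ($delimiter) {
    return preg_quote($delimiter, '/');
}, $splitby);

// "(?:a|b|c)+" matches a run of one or more adjacent delimiters as one item.
$pattern = '/(?:' . implode('|', $quoted) . ')+/';

preg_match_all($pattern, $text, $matches);                      // delimiter runs
$words = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);  // text between them

print_r($matches[0]);
print_r($words);
```

The trailing `+` on the group is what keeps neighbouring delimiters such as `dlmtr2dlmtr2` in a single array item.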

$result = explode(' ', trim(preg_replace("/\s*(dlmtr[1-6])+\s*|\s+/",' ', $text)));
echo "<pre>";print_r($result);echo "</pre>";

Array
(
    [0] => This
    [1] => is
    [2] => the
    [3] => string
    [4] => TTT
    [5] => WWWWW
)
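Both arrays can also be produced in one pass. The sketch below is an alternative to the two separate calls above, not a replacement the answer proposes: it wraps the same pattern in a single capturing group and uses `preg_split` with `PREG_SPLIT_DELIM_CAPTURE`, so that text pieces land on even indexes and delimiter runs on odd indexes. The names `$parts`, `$textArr` and `$delimiterArr` are chosen to match the question.

```php
<?php
$text = '  dlmtr1This is     the string dlmtr2dlmtr2TTTdlmtr5WWWWW ';

// Same pattern as in the answer, with one outer capturing group and the
// inner group made non-capturing, so each split yields exactly one delimiter.
$pattern = '/(\s*(?:dlmtr[1-6])+\s*|\s+)/';

// Empty pieces are kept on purpose: they preserve the even/odd alternation
// when the text starts or ends with a delimiter.
$parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

$textArr = array();
$delimiterArr = array();
foreach ($parts as $i => $part) {
    if ($i % 2 === 1) {
        $delimiterArr[] = $part;   // odd index: captured delimiter run
    } elseif ($part !== '') {
        $textArr[] = $part;        // even index: text, skipping empty edges
    }
}

print_r($textArr);
print_r($delimiterArr);
```

Because nothing is discarded before the loop, `implode('', $parts)` gives back the original `$text`, which is a convenient sanity check.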