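preg_split by character, including every newline

Date: 2013-12-02 08:21:44

Tags: php regex

I am using preg_split with the "u" modifier to split a string into characters in PHP. My problem is that a line break does not end up in its own entry. With this line: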

preg_split('//u',"a töxt\n{{image}}", -1,PREG_SPLIT_NO_EMPTY);

I get the following result:

Array
(
    [0] => a
    [1] =>  
    [2] => t
    [3] => ö
    [4] => x
    [5] => t
    [6] =>  { // this element is really a line break followed by "{", not a space
    [7] => {
    [8] => i
    [9] => m
    [10] => a
    [11] => g
    [12] => e
    [13] => }
    [14] => }
)

If I check the string for valid characters before encoding it, I get:

Array
(
    [data] => töxt
{{image}}
    [chars] => {t}{�}{�}{x}{t}{
}{{}{{}{i}{m}{a}{g}{e}{}}{}}
    [hex] => {74}{C3}{B6}{78}{74}{0A}{7B}{7B}{69}{6D}{61}{67}{65}{7D}{7D}
    [mb_chars] => {t}{ö}{x}{t}{
}{{}{{}{i}{m}{a}{g}{e}{}}{}}
    [mb_hex] => {74}{F6}{78}{74}{0A}{7B}{7B}{69}{6D}{61}{67}{65}{7D}{7D}
)

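So, any ideas on how to get the desired result? It is not only about the line break, but that is the most important case.

Multibyte characters also need to be handled.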

2 answers:

Answer 0: (score: 1)

Use the str_split function to split the string into an array of characters:

$str = "A\nBC";
$chrArray = str_split($str);
print_r($chrArray);
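One caveat worth illustrating: str_split works on bytes, not on characters. The sketch below (the variable names are only for illustration, and the "ö" is written as its UTF-8 bytes so the example does not depend on the file encoding) shows that the newline gets its own element, but the two bytes of "ö" are separated:

```php
<?php
// "\xC3\xB6" is "ö" encoded as UTF-8 (the {C3}{B6} bytes from the question's hex dump).
$str = "t\xC3\xB6xt\n{";

// str_split() cuts the string into 1-byte pieces: the newline gets its own
// element, but the two bytes of "ö" land in two separate elements.
$bytes = str_split($str);

// Show each piece as hex so the split bytes are visible.
$hex = array_map('bin2hex', $bytes);
echo implode(' ', $hex), "\n"; // 74 c3 b6 78 74 0a 7b
```

So this approach is probably only suitable when the input is known to be single-byte text.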

Option 2:

preg_match_all('/./su', "a töxt\n{{image}}", $m); // "s" lets "." match the newline too; the characters are in $m[0]

Output of the str_split example:

Array
(
    [0] => A
    [1] => 

    [2] => B
    [3] => C
)
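For the preg_match_all variant (Option 2), a minimal sketch of how it could be wrapped is shown here. The helper name split_chars is hypothetical, and the example assumes a PCRE build with UTF-8 support. The point is the "s" modifier: without it, "." does not match "\n", so line breaks would be missing from the result.

```php
<?php
/**
 * Hypothetical helper (not from the original answer): split a UTF-8 string
 * into characters, keeping line breaks as separate elements.
 */
function split_chars(string $text): array
{
    // "u" treats pattern and subject as UTF-8; "s" lets "." match "\n" as well.
    if (preg_match_all('/./su', $text, $matches) === false) {
        return []; // matching failed, e.g. because the input is not valid UTF-8
    }
    return $matches[0];
}

$chars = split_chars("a t\xC3\xB6xt\n{{image}}"); // "\xC3\xB6" is "ö" in UTF-8
echo count($chars), "\n";     // 16
var_dump($chars[6] === "\n"); // bool(true)
```

On PHP 7.4 or later with the mbstring extension, mb_str_split may be a simpler choice for the same job.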

UPDATE: after trying this in PHP 5.2.5, I got this:

Warning: preg_split(): Compilation failed: this version of PCRE is not compiled with PCRE_UTF8 support at offset 0 on line 4

I believe you will need another method to split a Unicode string into an array of characters.

Answer 1: (score: 0)

I have now found a solution to my problem:
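One such method, sketched under the assumption that the input is valid UTF-8, is to drop the "u" modifier and describe UTF-8 sequences byte by byte. The function name split_utf8_bytewise is made up for this example; type declarations are left out on purpose so the code also parses on old PHP versions.

```php
<?php
/**
 * Hypothetical fallback: split UTF-8 text into characters without the "u"
 * modifier. Assumes $text is valid UTF-8; stray continuation bytes are dropped.
 */
function split_utf8_bytewise($text)
{
    // Without "u", PCRE matches byte by byte:
    //   [\x00-\x7F]              one ASCII byte (this includes "\n")
    //   [\xC0-\xFF][\x80-\xBF]*  a lead byte followed by its continuation bytes
    preg_match_all('/[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]*/', $text, $matches);
    return $matches[0];
}

$chars = split_utf8_bytewise("a t\xC3\xB6xt\n{{image}}"); // "\xC3\xB6" is "ö"
echo count($chars), "\n"; // 16
```

Because the character class covers every ASCII byte, the line break is matched like any other character and ends up in its own element.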

$arr_content = preg_split("/(.|\\\\n)/u",$html_cont, -1,PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
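As far as I can tell, this works because "." never matches a line break: every other character is captured as a delimiter, and the line break is returned as the text left between two delimiters. A side effect is that consecutive line breaks stay together in one element. The sketch below demonstrates this with short test strings; the last pattern is a possible variant, not the answerer's code.

```php
<?php
$flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;

// Pattern from the answer: "." does not match "\n", so the line break is
// returned as the piece between two captured delimiters.
$single = preg_split("/(.|\\\\n)/u", "t\n{", -1, $flags);
// $single is ["t", "\n", "{"]

// Two consecutive line breaks stay together in one element.
$double = preg_split("/(.|\\\\n)/u", "t\n\n{", -1, $flags);
// $double is ["t", "\n\n", "{"]

// Possible variant (an assumption): with "s" the dot matches line breaks too,
// so each one is captured on its own.
$variant = preg_split('/(.)/su', "t\n\n{", -1, $flags);
// $variant is ["t", "\n", "\n", "{"]

var_dump($single, $double, $variant);
```

If blank lines can occur in the input, the variant with the "s" modifier is probably the safer choice.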

Thanks to everyone who helped track down the problem ;)