Parsing a complex string into a parent/child array with PHP

Time: 2014-06-17 13:08:52

Tags: php parsing

Here is a sample string.

a (b, c(d and/or e, f, g), h, i[j, k]), l (m, n(o, p[q, r{s or t,u}, v]), w)

I need to parse it into this:

{
-a
    -b
    -c
        -d 
        -and/or
        -e
        -f
        -g
    -h
    -i
        -j
        -k
-l
    -m
    -n
        -o
        -p
            -q
            -r
                -s
                -or
                -t
                -u
            -v
    -w
}

I started messing around with some regular expressions, but it got ugly fast. Any suggestions?

Thanks.

2 answers:

Answer 0: (score: 1)

I don't know the exact rules you have in mind, but this code basically does the job:

<?php
$string = 'a (b, c(d and/or e, f, g), h, i[j, k]), l (m, n(o, p[q, r{s or t,u}, v]), w)';
$indentLevel = 0;
$out = '';
echo '{'."\n";
// Split the string into single characters and iterate over them, the way a simple hand-written scanner does
foreach (str_split($string) as $x) {
    // Determine if this character is string terminator or not
    $isTerminator = in_array($x, array(' ', ',', '(', '[', '{', ')', ']', '}'));
    // Output, because of string terminator, but only if output has something in it
    if ($isTerminator && strlen($out) > 0) {
        echo str_repeat("\t", $indentLevel).'-'.$out."\n";
        $out = '';
    }
    // Add to output (multiple character string support), if this is not string terminator
    elseif (!$isTerminator) {
        $out .= $x;
    }
    // Increase indent, because of brackets
    if (in_array($x, array('(', '[', '{'))) {
        $indentLevel++;
    }
    // Decrease indent, because of brackets
    elseif (in_array($x, array(')', ']', '}'))) {
        $indentLevel--;
    }
    // A negative level means a closing bracket appeared without a matching opening one
    if ($indentLevel < 0) {
        die('Syntax error');
    }
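}
// Flush a trailing token when the string does not end with a terminator
if (strlen($out) > 0) {
    echo str_repeat("\t", $indentLevel).'-'.$out."\n";
}
// A positive level here means an opening bracket was never closed
if ($indentLevel > 0) {
    die('Syntax error');
}
echo '}'."\n";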

Note that I added support for multi-character strings. That wasn't requested, but I think it shows the basic idea better.
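
The multi-character support comes from collecting characters until a terminator shows up. As a rough alternative, the same token boundaries can be sketched with a single `preg_match_all` call. This swaps in a regular expression for the character loop; the function name `tokenize` is hypothetical and not part of the code above.

```php
<?php
// Hypothetical helper for illustration; not part of the answer's code.
// Splits the input into brackets and multi-character names.
function tokenize($string)
{
    // Match either a single bracket, or a run of characters that are
    // not whitespace, commas, or brackets.
    preg_match_all('/[()\[\]{}]|[^\s,()\[\]{}]+/', $string, $matches);
    return $matches[0];
}

print_r(tokenize('foo (bar, baz[qux])'));
```

The resulting flat token list can then be walked with the same indent counter as the loop above.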

I hope you get the basic idea and can extend this code into a parser that fits your specific needs.
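
The loop above only prints lines. To end up with an actual parent/child array, one option is to keep a stack of child lists and attach each finished list to its parent when the bracket closes. This is a minimal sketch rather than the answer's own code: the function name `parseTree` and the node shape `array('name' => ..., 'children' => ...)` are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper; the node shape ('name' / 'children') is an assumption.
function parseTree($string)
{
    $pairs    = array('(' => ')', '[' => ']', '{' => '}');
    $stack    = array(array()); // stack of child lists; $stack[0] is the top level
    $expected = array();        // closing brackets still awaited, innermost last
    $token    = '';

    // The appended space makes sure a trailing token is flushed
    foreach (str_split($string . ' ') as $x) {
        $isOpen  = isset($pairs[$x]);
        $isClose = in_array($x, $pairs, true);

        if (!$isOpen && !$isClose && $x !== ' ' && $x !== ',') {
            $token .= $x; // still inside a name
            continue;
        }

        $top = count($stack) - 1;
        if ($token !== '') {
            // A terminator ends the current name: store it as a new node
            $stack[$top][] = array('name' => $token, 'children' => array());
            $token = '';
        }

        if ($isOpen) {
            if (count($stack[$top]) === 0) {
                throw new InvalidArgumentException('Bracket without a parent');
            }
            $stack[]    = array();    // start collecting this parent's children
            $expected[] = $pairs[$x]; // remember which bracket must close it
        } elseif ($isClose) {
            if (array_pop($expected) !== $x) {
                throw new InvalidArgumentException('Bracket mismatch');
            }
            // Attach the finished child list to the last node one level up
            $children = array_pop($stack);
            $top      = count($stack) - 1;
            $last     = count($stack[$top]) - 1;
            $stack[$top][$last]['children'] = $children;
        }
    }

    if (count($expected) > 0) {
        throw new InvalidArgumentException('Unclosed bracket');
    }
    return $stack[0];
}

print_r(parseTree('a (b, c[d])'));
```

Unlike the indent counter, this sketch also checks that each closing bracket matches the kind that was opened.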

Answer 1: (score: 0)

It won't win any beauty contests, but it works:

<?php
$s = 'a (b, c(d and/or e, f, g), h, i[j, k]), l (m, n(o, p[q, r{s or t,u}, v]), w)';

$sep   = array(',', ' ');
$open  = array('(', '[', '{');
$close = array(')', ']', '}');

function parse($s)
{
    global $sep, $open, $close;

    $chars   = str_split($s);
    $arr     = array();
    $collect = '';

    for ($i = 0; $i < count($chars); $i++) {
        $c = $chars[$i];

        if (in_array($c, $open)) {
            $parens = 1;
            $inner  = '';
            do {
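                $i++;
                // Stop on unbalanced input instead of reading past the end
                if (!isset($chars[$i])) {
                    die('Syntax error');
                }
                $ch = $chars[$i];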
                if (in_array($ch, $open)) {
                    $parens++;
                } elseif (in_array($ch, $close)) {
                    $parens--;
                }
                if ($parens > 0) {
                    $inner .= $ch;
                }
            } while ($parens > 0);

            if ($collect !== '') { // strict check so the token "0" is kept
                $arr[] = '-'.$collect;
            }
            $arr[]   = parse($inner);
            $collect = '';
            continue;
        }

        if (in_array($c, $sep)) {
            if ($collect == '') {
                continue;
            }
            $arr[]   = '-'.$collect;
            $collect = '';
        } else {
            $collect .= $c;
        }
    }

    if ($collect !== '') { // strict check so the token "0" is kept
        $arr[] = '-'.$collect;
    }

    return $arr;
}

print_r(parse($s));
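
In the array that `parse()` returns, each nested array holds the children of the entry just before it. To turn that structure into the indented listing from the question, a small recursive printer might look like the sketch below. The function name `renderTree` and the four-space indent are assumptions, and the sample array is hand-written in the shape `parse()` produces for `'a (b, c(d))'`.

```php
<?php
// Hypothetical helper: renders the nested array shape returned by parse()
// as indented lines, one level of indent per nesting depth.
function renderTree(array $items, $depth = 0)
{
    $out = '';
    foreach ($items as $item) {
        if (is_array($item)) {
            // A nested array holds the children of the entry before it
            $out .= renderTree($item, $depth + 1);
        } else {
            // Entries already carry their leading '-'
            $out .= str_repeat('    ', $depth) . $item . "\n";
        }
    }
    return $out;
}

// Hand-written sample in the shape parse() returns for 'a (b, c(d))'
$sample = array('-a', array('-b', '-c', array('-d')));
echo renderTree($sample);
```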