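Matching (pairing) tokens (e.g., brackets or quotes)

Time: 2012-06-17 05:31:10

Tags: php tokenize code-completion brackets

In short, I need a function that attempts basic code repair by adding missing brackets/quotes, for parsing purposes. That is, the resulting code will not be run.

Let's look at a few examples: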

[1] class Aaa { $var a = "hi";       =>  class Aaa { $var a = "hi"; }
[2] $var a = "hi"; }                 =>  { $var a = "hi"; }
[3] class { a = "hi; function b( }   =>  class { a = "hi; function b( }"}
[4] class { a = "hi"; function b( }  =>  class { a = "hi"; function b() {}}

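PS: The fourth example above looks complicated, but it is actually easy. If the engine finds an ending bracket token that does not match the top of the stack, it should insert the opposite (beginning) token right before that token. As you can see, this works quite well.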
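
The rule in the PS can be sketched for brackets alone as follows. This is only an illustration under assumptions: the helper name `insertMissingOpeners` is hypothetical, quotes are ignored entirely, and brackets that are still open at the end are simply closed, innermost first. It is not meant to reproduce the examples above exactly.

```php
<?php
// Hypothetical helper (not the function from this question): brackets only, no quote handling.
function insertMissingOpeners(string $code, array $brackets): string
{
    $closers = array_flip($brackets); // maps closer => opener, e.g. '}' => '{'
    $stack = [];
    $result = '';
    foreach (str_split($code) as $c) {
        if (isset($brackets[$c])) {
            // beginning bracket: remember it
            $stack[] = $c;
        } elseif (isset($closers[$c])) {
            if (end($stack) === $closers[$c]) {
                // ending bracket matches the most recent beginning bracket
                array_pop($stack);
            } else {
                // ending bracket does not match the stack: put its opener right before it
                $result .= $closers[$c];
            }
        }
        $result .= $c;
    }
    // close whatever is still open, innermost first
    foreach (array_reverse($stack) as $open) {
        $result .= $brackets[$open];
    }
    return $result;
}

echo insertMissingOpeners('$a = 1; }', ['{' => '}', '(' => ')']), "\n"; // $a = 1; {}
```

Note that on a mismatch the opener on the stack is left in place, so it is still closed at the end of the input.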


As a function signature, it would look like: balanceTokens($code, $bracket_tokens, $quote_tokens)

The function I wrote works using a stack. Well, it doesn't exactly work, but it does use a stack.

function balanceTokens($code, $bracket_tokens, $quote_tokens){
    $stack = array(); $last = null; $result = '';
    foreach(str_split($code) as $c){
        if($last==$c && in_array($c, $quote_tokens)){
            // handle closing string
            array_pop($stack);
        }elseif(!in_array($last, $quote_tokens)){
            // handle other tokens
            if(isset($bracket_tokens[$c])){
                // handle beginning bracket
                $stack[] = $c;
            }elseif(($p = array_search($c, $bracket_tokens)) != false){
                // handle ending bracket
                $l = array_pop($stack);   
                if($l != $p)$result .= $p;
            }elseif(isset($quote_tokens[$c])){
                // handle beginning quote
                $stack[] = $c;
                $last = $c;
            }// else other token...
        }
        $result .= $c;
    }
    // perform fixes
    foreach($stack as $token){
        // fix ending brackets
        if(isset($bracket_tokens[$token]))
            $result .= $bracket_tokens[$token];
        // fix beginning brackets
        if(in_array($token, $bracket_tokens))
            $result = $token . $result;
    }
    return $result;
}

The function is called as follows:

$new_code = balanceTokens(
    $old_code,
    array(
        '<' => '>',
        '{' => '}',
        '(' => ')',
        '[' => ']',
    ),
    array(
        '"' => '"',
        "'" => "'",
    )
);

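Yes, it is quite generic; there are no hard-coded tokens.

I haven't the slightest idea why it doesn't work... In fact, I don't even know whether it should work. I admit I didn't put much thought into writing it. Maybe there is some obvious problem that I'm not seeing.

2 Answers:

Answer 0: (score: 2)

An alternative implementation (which balances more aggressively):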

function balanceTokens($code) {
    $tokens = [
        '{' => '}',
        '[' => ']',
        '(' => ')',
        '"' => '"',
        "'" => "'",
    ];
    $closeTokens = array_flip($tokens);
    $stringTokens = ['"' => true, "'" => true];

    $stack = [];
    for ($i = 0, $l = strlen($code); $i < $l; ++$i) {
        $c = $code[$i];

        // push opening tokens to the stack (for " and ' only if there is no " or ' opened yet)
        if (isset($tokens[$c]) && (!isset($stringTokens[$c]) || end($stack) != $c)) {
            $stack[] = $c;
        // closing tokens have to be matched up with the stack elements
        } elseif (isset($closeTokens[$c])) {
            $matched = false;

            while ($top = array_pop($stack)) {
                // stack has matching opening for current closing
                if ($top == $closeTokens[$c]) {
                    $matched = true;
                    break;
                }

                // stack has unmatched opening, insert closing at current pos
                $code = substr_replace($code, $tokens[$top], $i, 0);
                $i++;
                $l++;
            }

            // unmatched closing, insert opening at start
            if (!$matched) {
                $code = $closeTokens[$c] . $code;
                $i++;
                $l++;
            }
        }
    }

    // any elements still on the stack are unmatched opening, so insert closing
    while ($top = array_pop($stack)) {
        $code .= $tokens[$top];
    }

    return $code;
}
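
The in-place insertion used in the loop above relies on `substr_replace` with a length of 0, which inserts rather than overwrites. The index and the cached length are then advanced so that the loop stays on the same character. A minimal sketch of just that step, with values made up for illustration:

```php
<?php
$code = 'b( }';
$i = 3;                 // index of the unmatched '}'
$l = strlen($code);     // cached length, as in the loop above

// a length of 0 means nothing is replaced: ')' is inserted at offset $i
$code = substr_replace($code, ')', $i, 0);
$i++;                   // '}' has moved one position to the right
$l++;                   // the string is one byte longer

echo $code, "\n";       // b( )}
```

After the insertion, `$code[$i]` is still the `}` that triggered it, so the next loop iteration continues with the character that follows.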

Some examples:

$tests = array(
    'class Aaa { public $a = "hi";',
    '$var = "hi"; }',
    'class { a = "hi; function b( }',
    'class { a = "hi"; function b( }',
    'foo { bar[foo="test',
    'bar { bar[foo="test] { bar: "rgba(0, 0, 0, 0.1}',
);

Passing these to the function gives:

class Aaa { public $a = "hi";}
{$var = "hi"; }
class { a = "hi; function b( )"}
class { a = "hi"; function b( )}
foo { bar[foo="test"]}
bar { bar[foo="test"] { bar: "rgba(0, 0, 0, 0.1)"}}

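Answer 1: (score: 0)

After some coffee :), I came up with a (kinda) working prototype function.

You can see it in action here. Note, however, that it is slightly modified to add some (shiny) debug output.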

    /**
     * Fix some possible issues with the code (eg, incompleteness).
     * @param string $code The code to sanitize.
     * @param array $bracket_tokens List of bracket tokens where the index is the begin bracket and the value is the end bracket.
     * @param array $quote_tokens List of quote tokens where the index is the begin quote and the value is the end quote.
     * @return string The sanitized code.
     */
    function css_sanitize($code, $bracket_tokens, $quote_tokens){
        $result = '';
        $stack = array();
        $last = '';
        foreach(str_split($code) as $c){
            if(in_array($c, $quote_tokens) && $last==$c){
                array_pop($stack);
                $last = '';
            }elseif(!in_array($last, $quote_tokens)){
                if(isset($bracket_tokens[$c])){
                    $stack[] = $c;
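                }elseif(($p = array_search($c, $bracket_tokens)) !== false){
                    if(end($stack) !== $p){
                        // end bracket does not match the top of the stack: insert its begin bracket before it
                        $result .= $p;
                    }else{
                        // matching begin bracket is on top of the stack: close it
                        array_pop($stack);
                    }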
                }elseif(isset($quote_tokens[$c])){
                    $stack[] = $c;
                    $last = $c;
                }
            }
            $result .= $c;
        }
        foreach(array_reverse($stack) as $token){
            if(isset($bracket_tokens[$token])){
                $result .= $bracket_tokens[$token];
            }
            if(in_array($token, $bracket_tokens)){
                $result = $token . $result;
            }
            if(isset($quote_tokens[$token])){
                $result .= $quote_tokens[$token];
            }
        }
        return $result;
    }