Parsing tokens with regular expressions in PHP

Time: 2010-11-17 15:04:14

Tags: php regex token

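I want to parse a token file like the one below to get the token name/value pairs. The token/value/nesting relationships are already defined, so I cannot change how the token file is produced. A context-free grammar looks like it might be the best approach, but I have no experience writing or implementing one. Is it possible to do this with regular expressions? I have had no luck with the nested multi-line tokens (e.g. Master1, Servant2).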

;token1 = I am a top level single line token  
;token2 {  
    I am a top level  
    multiline line token  
}  

master1 {  
;servant1 = I am Master1, Servant1 single line token  
;servant2 {  
    I am Master1, Servant2.   
    A mulit line token.  
}  
;servant3 = I am Master1, Servant3  
}  
master2 {  
;servant1 = I am Master2, Servant1  
;servant2 {  
    I am Master2, Servant2  
A mulit line token.  
}  
;servant3 = I am Master2, Servant3  
}

2 answers:

Answer 0: (score: 3)

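PHP has a function for tokenizing strings:

  • strtok - splits a string (str) into smaller strings (tokens), with each token delimited by any character in token. That is, if you have a string like "This is an example string", you could tokenize it into its individual words by using the space character as the token.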
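As a minimal sketch of the call pattern described above (the variable names are made up for illustration): the first call to strtok passes the string and the delimiter, and each later call passes only the delimiter and continues where the previous one stopped.

```php
<?php
// Split a sentence into words with strtok(), using a space as the delimiter.
$sentence = "This is an example string";
$words = array();

// First call: pass the string and the delimiter characters.
$word = strtok($sentence, " ");

// Later calls: pass only the delimiter; strtok() returns false when done.
while ($word !== false) {
    $words[] = $word;
    $word = strtok(" ");
}

print_r($words);
```

Note that strtok only splits on delimiter characters and has no notion of nesting, so on its own it would probably not recover the master/servant hierarchy in the token file.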

Answer 1: (score: 2)

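Here is a fairly simple walking parser. I originally tried to write a regular expression for it, but the missing leading ; at the start of the multi-line masters (master1, master2) made that much harder. Had the ; been there, the regex would have been fairly easy to write. I gave up and wrote this: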

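// Parses the token text into a nested array of name => value pairs.
// Text inside { } is parsed recursively; a block that contains no tokens
// is returned as a plain (possibly multi-line) string.
function getTokens($string) {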
    $string = trim($string);
    $lines = explode("\n", $string);
    $data = array();
    $key = '';
    $open = 0;
    $buffer = '';
    foreach ($lines as $line) {
        $line = trim($line);
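        // Compare against '' rather than using empty(): empty('0') is true,
        // so a line containing only "0" would otherwise be dropped.
        if ($line === '') {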
            continue;
        } elseif (strpos($line, '}') === 0) {
            $open--;
            if ($open == 0) {
                $data[$key] = getTokens($buffer);
                $buffer = '';
            } elseif ($open < 0) {
                throw new Exception('Unmatched }');
            } else {
                $buffer .= "\n" . $line;
            }
        } elseif ($open > 0) {
            if (strpos($line, '{') !== false) {
                $open++;
            }
            $buffer .= "\n" . $line;
        } elseif ($line[0] == ';') {
            if (strpos($line, "=") !== false) {
                list ($key, $value) = explode("=", $line, 2);
                $key = trim(substr($key, 1));
                $value = trim($value);
                $data[$key] = $value;
            } elseif (strpos($line, "{") !== false) {
                $open++;
                list ($key, $value) = explode("{", $line, 2);
                $key = trim(substr($key, 1));
            } else {
                throw new Exception('Unmatched token ;');
            }
        } elseif (strpos($line, '{') !== false) {
            $open++;
            list ($key, $value) = explode("{", $line, 2);
            $key = trim($key);
        } else {
            $buffer .= "\n" . $line;
        }
    }
    if ($open > 0) {
        throw new Exception('Unmatched {');
    } elseif (empty($data) && !empty($buffer)) {
        return trim($buffer);
    }
    return $data;
}

When I feed it your string as input, I get:

Array(
    "token1" => "I am a top level single line token",
    "token2" => "I am a top level
                    multiline line token",
    "master1" => Array(
        "servant1" => "I am Master1, Servant1 single line token",
        "servant2" => "I am Master1, Servant2.
                            A mulit line token.",
        "servant3" => "I am Master1, Servant3",
    ),
    "master2" => Array(
        "servant1" => "I am Master2, Servant1",
        "servant2" => "I am Master2, Servant2
                            A mulit line token.",
        "servant3" => "I am Master2, Servant3",
    ),
)
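For comparison, here is a rough regex-only sketch of the flat part of the problem, not the parser above. It assumes every token starts with ; at the beginning of a line and that a multi-line body ends at the first line starting with }. The sample input and the variable names are made up for illustration.

```php
<?php
// Hypothetical sample: only the top-level tokens from the question's file.
$input = ";token1 = I am a top level single line token\n"
       . ";token2 {\n"
       . "    I am a top level\n"
       . "    multiline line token\n"
       . "}\n";

$tokens = array();

// Single-line tokens:  ;name = value  (the value runs to the end of the line).
preg_match_all('/^;(\w+)[ \t]*=[ \t]*(.*)$/m', $input, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    $tokens[$match[1]] = trim($match[2]);
}

// Multi-line tokens:  ;name { ... }  (the body runs lazily up to the first
// line that starts with a closing brace).
preg_match_all('/^;(\w+)[ \t]*\{[ \t]*\R(.*?)^\}/ms', $input, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    // Trim each body line so the indentation in the file is not kept.
    $lines = array_map('trim', explode("\n", trim($match[2])));
    $tokens[$match[1]] = implode("\n", $lines);
}

print_r($tokens);
```

This sketch ignores the master1 { ... } wrappers entirely, so on the full file the servant tokens of both masters would land in one flat array and overwrite each other. That lost nesting is the part the line-by-line parser handles.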