Parsing a string with quoted values in PHP

Date: 2012-11-27 01:18:36

Tags: php regex parsing

I want to parse a string like this:

'serviceHits."test_server"."http_test.org" 31987'

into an array like this:

[0] => serviceHits
[1] => test_server
[2] => http_test.org
[3] => 31987

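Basically, I want to split on dots and spaces, treating each quoted string as a single value.

The format of this string is not fixed; this is just one example. It may contain a different number of elements, with quoted and numeric elements in different positions.

Other strings might look like this: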

test.2 3                       which should parse to [test|2|3]
test."342".cake.2 "cheese"     which should parse to [test|342|cake|2|cheese]
test."red feet".3."green" 4    which should parse to [test|red feet|3|green|4]

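Sometimes the OID string may contain an escaped quote, which should be preserved if possible, but that is the least important part of the parser: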

test."a \"b\" c" "cheese face" which should parse to [test|a "b" c|cheese face]

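I'm trying to parse SNMP OID strings coming from agents written by people who have very different ideas about what an OID should look like, and I want to do it generically.

It would be nice to have the OID string (the dot-separated parts) and the value (the last element) returned in separate named arrays. Simply splitting on the space before parsing won't work, because both the OID and the value can contain spaces.

Thanks!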

3 Answers:

Answer 0: (score: 3)

I agree that it's hard to find a single regular expression that solves this.

Here is a complete solution:

$results = array();
$str = 'serviceHits."test_\"server"."http_test.org" 31987';

// Encode \" to something else temporary
$str_encoded_quotes = strtr($str,array('\\"'=>'####'));

// Split by strings between double-quotes
$str_arr = preg_split('/("[^"]*")/',$str_encoded_quotes,-1,PREG_SPLIT_DELIM_CAPTURE);

foreach ($str_arr as $substr) {

    // If value is a dot or a space, do nothing
    if (!preg_match('/^[\s\.]$/',$substr)) {

        // If value is between double-quotes, it's a string
        // Return as is
        if (preg_match('/^"(.*)"$/',$substr)) {
            $substr = preg_replace('/^"(.*)"$/','\1',$substr); // Remove double-quotes around
            $results[] = strtr($substr,array('####'=>'"'));    // Get escaped double-quotes back inside the string

        // Else, it must be split
        } else {
            // Split by dot or space
            $substr_arr = preg_split('/[\.\s]/',$substr,-1,PREG_SPLIT_NO_EMPTY);
            foreach ($substr_arr as $subsubstr)
                $results[] = strtr($subsubstr,array('####'=>'"')); // Get escaped double-quotes back inside string
        }
    }
    // Else, it's an empty substring
}

var_dump($results);

Tested with all of the new string examples.

First attempt (OLD)

Using preg_split:

$str = 'serviceHits."test_server"."http_test.org" 31987';

// -1 : no limit
// PREG_SPLIT_NO_EMPTY : do not return empty results
preg_split('/[\.\s]?"[\.\s]?/',$str,-1,PREG_SPLIT_NO_EMPTY);
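
To illustrate how this first attempt behaves, here is a small sketch (the wrapper name first_attempt_split is hypothetical and not part of the answer). The pattern only splits at double quotes, so it handles the original sample but leaves a string without quotes in one piece, which is presumably why the fuller solution above replaced it.

```php
<?php
// Hypothetical wrapper around the same pattern used in the first attempt.
function first_attempt_split($str)
{
    // Splits at each double quote, swallowing one adjacent dot or space.
    return preg_split('/[\.\s]?"[\.\s]?/', $str, -1, PREG_SPLIT_NO_EMPTY);
}

// The original sample: every boundary sits next to a quote, so this works.
print_r(first_attempt_split('serviceHits."test_server"."http_test.org" 31987'));
// serviceHits, test_server, http_test.org, 31987

// No quotes at all: nothing to split on, so one element comes back.
print_r(first_attempt_split('test.2 3'));
// test.2 3
```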

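Answer 1: (score: 2)

The simplest approach is probably to replace the dots and spaces inside quoted strings with placeholders, split, and then restore the placeholders. Something like this: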

$in = 'serviceHits."test_server"."http_test.org" 31987';

$a = preg_replace_callback('!"([^"]*)"!', 'quote', $in);
$b = preg_split('![. ]!', $a);
foreach ($b as $k => $v) $b[$k] = unquote($v);

print_r($b);


# the functions that do the (un)quoting

function quote($m){
    return str_replace(array('.',' '),
      array('PLACEHOLDER-DOT', 'PLACEHOLDER-SPACE'), $m[1]);
}
function unquote($str){
    return str_replace(array('PLACEHOLDER-DOT', 'PLACEHOLDER-SPACE'),
      array('.',' '), $str);
}
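
The code above does not deal with escaped quotes (\") inside a quoted part. A possible variant of the same placeholder idea that tolerates them might look like the sketch below. The function name split_with_placeholders and the use of control characters as placeholders are assumptions made for illustration, not part of the answer, and backslash escapes outside of quotes are not handled here.

```php
<?php
// Hypothetical helper: placeholder-based split that also accepts \" inside quotes.
function split_with_placeholders($in)
{
    // Assumed placeholders: control characters unlikely to appear in real input.
    $map = array('.' => "\x01", ' ' => "\x02");

    // A quoted part is a run of ordinary characters or backslash escapes.
    $masked = preg_replace_callback(
        '/"((?:[^"\\\\]|\\\\.)*)"/',
        function ($m) use ($map) {
            // Unescape first, then hide dots and spaces from the splitter.
            return strtr(stripslashes($m[1]), $map);
        },
        $in
    );

    // Split on the remaining (unquoted) dots and spaces, dropping empty pieces.
    $parts = preg_split('/[. ]/', $masked, -1, PREG_SPLIT_NO_EMPTY);

    // Put the hidden dots and spaces back.
    $restore = array_flip($map);
    return array_map(function ($p) use ($restore) {
        return strtr($p, $restore);
    }, $parts);
}

print_r(split_with_placeholders('test."a \\"b\\" c" "cheese face"'));
```

Because empty pieces are dropped, an empty quoted string ("") disappears from the result; remove PREG_SPLIT_NO_EMPTY if that matters.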

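Answer 2: (score: 1)

Here is a solution that works with all of the test samples (plus one of my own) and lets you escape quotes, dots, and spaces.

Because escape codes have to be handled, a simple split won't work.

While one can imagine a regular expression that matches the whole string and uses '()' groups to mark the individual elements, I couldn't get that to work with preg_match or preg_match_all.

Instead, I parse the string incrementally, pulling off one element at a time. I then use stripslashes to unescape the quotes, spaces, and dots.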

<?php

$strings = array
(
    'serviceHits."test_server"."http_test.org" 31987',
    'test.2 3',
    'test."342".cake.2 "cheese"',
    'test."red feet".3."green" 4',
    'test."a \\"b\\" c" "cheese face"',
    'test\\.one."test\\"two".test\\ three',
);

foreach ($strings as $string)
{
    print"'{$string}' => " . print_r(parse_oid($string), true) . "\n";
}

/**
 * parse_oid parses an OID and returns an array of the parsed elements.
 * This is an all-or-none function, and will return NULL if it cannot completely
 * parse the string.
 * @param string $string The OID to parse.
 * @return array|NULL A list of OID elements, or null if error parsing.
 */
function parse_oid($string)
{
    $result = array();
    while (true)
    {
        $matches = array();
        $match_count = preg_match('/^(?:((?:[^\\\\\\. "]|(?:\\\\.))+)|(?:"((?:[^\\\\"]|(?:\\\\.))+)"))((?:[\\. ])|$)/', $string, $matches);
        if (null !== $match_count && $match_count > 0)
        {
            // [1] = unquoted, [2] = quoted
            $value = strlen($matches[1]) > 0 ? $matches[1] : $matches[2];

            $result[] = stripslashes($value);

            // Are we expecting any more parts?
            if (strlen($matches[3]) > 0)
            {
                // I do this (vs keeping track of offset) to use ^ in regex
                $string = substr($string, strlen($matches[0]));
            }
            else
            {
                return $result;
            }
        }
        else
        {
            // All or nothing
            return null;
        }
    } // while
}

This produces the following output:

'serviceHits."test_server"."http_test.org" 31987' => Array
(
    [0] => serviceHits
    [1] => test_server
    [2] => http_test.org
    [3] => 31987
)

'test.2 3' => Array
(
    [0] => test
    [1] => 2
    [2] => 3
)

'test."342".cake.2 "cheese"' => Array
(
    [0] => test
    [1] => 342
    [2] => cake
    [3] => 2
    [4] => cheese
)

'test."red feet".3."green" 4' => Array
(
    [0] => test
    [1] => red feet
    [2] => 3
    [3] => green
    [4] => 4
)

'test."a \"b\" c" "cheese face"' => Array
(
    [0] => test
    [1] => a "b" c
    [2] => cheese face
)

'test\.one."test\"two".test\ three' => Array
(
    [0] => test.one
    [1] => test"two
    [2] => test three
)