I want to parse the following string:
'serviceHits."test_server"."http_test.org" 31987'
into the following array:
[0] => serviceHits
[1] => test_server
[2] => http_test.org
[3] => 31987
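Basically I want to split on dots and spaces, but treat strings inside quotes as single values.
The format of the string is not fixed; this is just one example. It may contain a different number of elements, with quoted and numeric elements in different positions.
Other strings might look like this: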
test.2 3 which should parse to [test|2|3]
test."342".cake.2 "cheese" which should parse to [test|342|cake|2|cheese]
test."red feet".3."green" 4 which should parse to [test|red feet|3|green|4]
Sometimes the OID string may contain an escaped quote. It should be kept if possible, but this is the least important part of the parser:
test."a \"b\" c" "cheese face" which should parse to [test|a "b" c|cheese face]
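I'm trying to parse SNMP OID strings from agents whose authors have very different ideas about what an OID should look like, so the parser needs to be generic.
It would also be nice to parse the OID string (the dot-separated bits) and return the value (the last element) in a separate named array. Simply splitting on spaces before parsing won't work, because both the OID and the value can contain spaces.
Thanks!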
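The requirements above can also be met without regular expressions. The approach is to scan the string one character at a time.

This is a minimal sketch under a few assumptions:
- A backslash escapes the next character anywhere in the string.
- The value is always the last element.
- Empty elements, such as an empty quoted string "", are dropped.

tokenize_oid and split_oid_value are hypothetical names and are not part of any library.

```php
<?php
// Hypothetical helper: scan the string character by character.
// Unquoted dots and spaces separate elements; double quotes group text;
// a backslash makes the following character literal (e.g. \" or \.).
function tokenize_oid(string $s): array
{
    $tokens = [];
    $current = '';
    $inQuotes = false;
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $c = $s[$i];
        if ($c === '\\' && $i + 1 < $len) {
            // Escaped character: keep the next character as-is.
            $current .= $s[++$i];
        } elseif ($c === '"') {
            // Enter or leave quoted mode; the quote itself is not kept.
            $inQuotes = !$inQuotes;
        } elseif (!$inQuotes && ($c === '.' || $c === ' ')) {
            // Separator outside quotes: finish the current element.
            // Empty elements are skipped.
            if ($current !== '') {
                $tokens[] = $current;
            }
            $current = '';
        } else {
            $current .= $c;
        }
    }
    if ($current !== '') {
        $tokens[] = $current;
    }
    return $tokens;
}

// Hypothetical helper: assumes the value is always the last element.
function split_oid_value(string $s): array
{
    $tokens = tokenize_oid($s);
    $value = array_pop($tokens);
    return ['oid' => $tokens, 'value' => $value];
}

print_r(split_oid_value('test."a \"b\" c" "cheese face"'));
```

For 'serviceHits."test_server"."http_test.org" 31987', split_oid_value returns the three OID parts under 'oid' and 31987 under 'value'.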
Answer 0 (score: 3)
I agree that it is hard to find a single regex that solves this.
Here is a complete solution:
$results = array();
$str = 'serviceHits."test_\"server"."http_test.org" 31987';
// Temporarily encode \" as something else
$str_encoded_quotes = strtr($str, array('\\"' => '####'));
// Split around double-quoted strings, keeping the quoted strings in the result
$str_arr = preg_split('/("[^"]*")/', $str_encoded_quotes, -1, PREG_SPLIT_DELIM_CAPTURE);
foreach ($str_arr as $substr) {
    // If the value is a lone dot or space, do nothing
    if (!preg_match('/^[\s\.]$/', $substr)) {
        // If the value is between double quotes, it's a string:
        // keep it as a single element
        if (preg_match('/^"(.*)"$/', $substr)) {
            $substr = preg_replace('/^"(.*)"$/', '\1', $substr); // Remove the surrounding double quotes
            $results[] = strtr($substr, array('####' => '"'));    // Put the escaped double quotes back inside the string
        // Otherwise, it must be split further
        } else {
            // Split by dot or space; empty pieces are dropped
            $substr_arr = preg_split('/[\.\s]/', $substr, -1, PREG_SPLIT_NO_EMPTY);
            foreach ($substr_arr as $subsubstr) {
                $results[] = strtr($subsubstr, array('####' => '"')); // Put the escaped double quotes back inside the string
            }
        }
    }
    // Otherwise it's a lone separator and is skipped
}
var_dump($results);
Tested with all the new string examples.
First attempt (OLD)
Using preg_split:
$str = 'serviceHits."test_server"."http_test.org" 31987';
// -1 : no limit
// PREG_SPLIT_NO_EMPTY : do not return empty results
preg_split('/[\.\s]?"[\.\s]?/',$str,-1,PREG_SPLIT_NO_EMPTY);
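It is easy to see why this first attempt was replaced. Every delimiter in the pattern must contain a double quote, so dots and spaces outside quotes are never split on. The sketch below reuses the same pattern on two of the question's other examples.

```php
<?php
// Same pattern as the OLD attempt: an optional dot/space, a double quote,
// then another optional dot/space. Every delimiter needs a quote.
$old = '/[\.\s]?"[\.\s]?/';

// No quotes at all, so nothing is split.
print_r(preg_split($old, 'test.2 3', -1, PREG_SPLIT_NO_EMPTY));

// The dot in the unquoted "cake.2" is not next to a quote, so it is not split.
print_r(preg_split($old, 'test."342".cake.2 "cheese"', -1, PREG_SPLIT_NO_EMPTY));
```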
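Answer 1 (score: 2)
The easiest way is probably to replace the dots and spaces inside quoted strings with placeholders, split, and then turn the placeholders back. Something like this: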
$in = 'serviceHits."test_server"."http_test.org" 31987';
$a = preg_replace_callback('!"([^"]*)"!', 'quote', $in);
$b = preg_split('![. ]!', $a);
foreach ($b as $k => $v) $b[$k] = unquote($v);
print_r($b);
# the functions that do the (un)quoting
function quote($m){
    return str_replace(array('.', ' '),
                       array('PLACEHOLDER-DOT', 'PLACEHOLDER-SPACE'), $m[1]);
}
function unquote($str){
    return str_replace(array('PLACEHOLDER-DOT', 'PLACEHOLDER-SPACE'),
                       array('.', ' '), $str);
}
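The same placeholder idea can be packaged as a single function with anonymous callbacks, so no global quote()/unquote() helpers are needed.

This is a sketch with a few caveats:
- The function name is hypothetical.
- Using the control characters \x01 and \x02 as placeholders is an assumption; they must never appear in real input.
- Like the answer above, it does not handle escaped quotes.
- Unlike the answer above, it drops empty elements.

```php
<?php
// Sketch of the placeholder technique, wrapped in one function.
// Assumes the control characters \x01 and \x02 never occur in the input.
function split_with_placeholders(string $in): array
{
    $from = ['.', ' '];
    $to = ["\x01", "\x02"];
    // Replace dots/spaces inside "..." with placeholders and drop the quotes.
    $masked = preg_replace_callback(
        '/"([^"]*)"/',
        function (array $m) use ($from, $to): string {
            return str_replace($from, $to, $m[1]);
        },
        $in
    );
    // Every remaining dot or space is now a real separator.
    $parts = preg_split('/[. ]/', $masked, -1, PREG_SPLIT_NO_EMPTY);
    // Restore the original characters in each element.
    return array_map(
        function (string $p) use ($from, $to): string {
            return str_replace($to, $from, $p);
        },
        $parts
    );
}

print_r(split_with_placeholders('test."red feet".3."green" 4'));
```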
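Answer 2 (score: 1)
Here is a solution that works with all the test samples (plus a few of my own) and lets you escape quotes, dots, and spaces.
Because escape sequences have to be handled, a simple split won't work.
One could imagine a single regex that matches the whole string and uses '()' groups to capture the individual elements, but I couldn't get that to work with preg_match or preg_match_all.
Instead, I parse the string incrementally, pulling out one element at a time. Then I use stripslashes to unescape the quotes, spaces, and dots.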
<?php
$strings = array
(
    'serviceHits."test_server"."http_test.org" 31987',
    'test.2 3',
    'test."342".cake.2 "cheese"',
    'test."red feet".3."green" 4',
    'test."a \\"b\\" c" "cheese face"',
    'test\\.one."test\\"two".test\\ three',
);
foreach ($strings as $string)
{
    print "'{$string}' => " . print_r(parse_oid($string), true) . "\n";
}
/**
 * parse_oid parses an OID and returns an array of the parsed elements.
 * This is an all-or-nothing function: it returns NULL if it cannot completely
 * parse the string.
 * @param string $string The OID to parse.
 * @return array|NULL A list of OID elements, or NULL on a parse error.
 */
function parse_oid($string)
{
    $result = array();
    while (true)
    {
        $matches = array();
        $match_count = preg_match('/^(?:((?:[^\\\\\\. "]|(?:\\\\.))+)|(?:"((?:[^\\\\"]|(?:\\\\.))+)"))((?:[\\. ])|$)/', $string, $matches);
        // preg_match() returns 1 on a match, 0 on no match and false on error
        if ($match_count === 1)
        {
            // [1] = unquoted, [2] = quoted
            $value = strlen($matches[1]) > 0 ? $matches[1] : $matches[2];
            $result[] = stripslashes($value);
            // Are we expecting any more parts?
            if (strlen($matches[3]) > 0)
            {
                // I do this (vs keeping track of offset) to use ^ in regex
                $string = substr($string, strlen($matches[0]));
            }
            else
            {
                return $result;
            }
        }
        else
        {
            // All or nothing
            return null;
        }
    } // while
}
This produces the following output:
'serviceHits."test_server"."http_test.org" 31987' => Array
(
[0] => serviceHits
[1] => test_server
[2] => http_test.org
[3] => 31987
)
'test.2 3' => Array
(
[0] => test
[1] => 2
[2] => 3
)
'test."342".cake.2 "cheese"' => Array
(
[0] => test
[1] => 342
[2] => cake
[3] => 2
[4] => cheese
)
'test."red feet".3."green" 4' => Array
(
[0] => test
[1] => red feet
[2] => 3
[3] => green
[4] => 4
)
'test."a \"b\" c" "cheese face"' => Array
(
[0] => test
[1] => a "b" c
[2] => cheese face
)
'test\.one."test\"two".test\ three' => Array
(
[0] => test.one
[1] => test"two
[2] => test three
)
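For comparison, the single-regex approach mentioned at the start of this answer can be made to work with preg_match_all. The trick is to match the elements rather than the separators.

This is a minimal sketch, and tokenize_oid_regex is a hypothetical name. Unlike parse_oid, it is lenient: characters that fit neither alternative, such as an unbalanced quote, are silently skipped instead of making the whole parse fail.

```php
<?php
// Sketch: tokenize with one preg_match_all() call instead of a loop.
// Alternative 1 matches a quoted element (group 1, backslash escapes allowed).
// Alternative 2 matches an unquoted run of characters other than a quote,
// dot, space or backslash, also allowing backslash escapes (group 2).
// Dots and spaces between matches are simply skipped.
function tokenize_oid_regex(string $s): array
{
    $pattern = '/"((?:[^"\\\\]|\\\\.)*)"|((?:[^". \\\\]|\\\\.)+)/';
    preg_match_all($pattern, $s, $matches, PREG_SET_ORDER);
    $result = [];
    foreach ($matches as $m) {
        // Group 2 is missing or empty when the quoted alternative matched.
        $raw = (isset($m[2]) && $m[2] !== '') ? $m[2] : $m[1];
        // Unescape \" \. and "\ " with stripslashes(), as parse_oid() does.
        $result[] = stripslashes($raw);
    }
    return $result;
}

foreach (['test.2 3', 'test\.one."test\"two".test\ three'] as $oid) {
    print_r(tokenize_oid_regex($oid));
}
```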