I've been trying to "parse" some data using a regular expression, and I feel as though I'm close, but I just can't seem to bring it all home.
The data that needs parsing generally looks like this: `<param>: <value>\n`. The number of params can vary, just as the values can. Still, here's an example:
FooID: 123456
Name: Chuck
When: 01/02/2013 01:23:45
InternalID: 789654
User Message: Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
And can start with any number of \n's. It can be empty, too.
What's worse, though is that this CAN contain colons (but they're _"escaped"_ using `\`), and even basic markup!
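To push this text into an object, I put together this little expression:
// Four backslashes: PHP string escaping first, then PCRE escaping, to exclude a literal backslash
if (preg_match_all('/^([^:\n\\\\]+):\s*(.+)/m', $this->structuredMessage, $data))
{
    $data = array_combine($data[1], $data[2]);
    // $data is an assoc array: FooID => 123456, Name => Chuck, ...
    $report = new Report($data);
}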
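Now, this works all right most of the time, except for the `User Message` bit: `.` doesn't match newlines, and if I were to use the `s` flag, the second group would match everything after `FooID:` until the very end of the string.
I'm having to use a dirty workaround for this:
$msg = explode(end($data[1]), $string);
$data[2][count($data[2])-1] = array_pop($msg);
After some testing, I've come to understand that sometimes one or two of the parameters aren't filled in (for example, `InternalID` can be empty). In that case my expression doesn't fail, but instead results in:
[1] => Array
    (
        [0] => FooID
        [1] => Name
        [2] => When
        [3] => InternalID
    )
[2] => Array
    (
        [0] => 123456
        [1] => Chuck
        [2] => 01/02/2013 01:23:45
        [3] => User Message: Hello,
    )
I've been trying various other expressions, and came up with this:
/^([^:\n\\]++)\s{0,}:(.*+)(?!^[^:\n\\]++\s{0,}:)/m
//or:
/^([^:\n\\]+)\s{0,}:(.*)(?!^[^:\\\n]+\s{0,}:)/m
The second version is slightly slower.
This solves the issue I was having with `InternalID: <void>`, but still leaves me with the last obstacle: `User Message: <multi-line>`. Using the `s` flag doesn't do the trick with my expression ATM.
I can only think of this:
^([^:\n\\]++)\s{0,}:((\n(?![^\n:\\]++\s{0,}:)|.)*+)
which is, to my eye at least, too complex to be the only option. Ideas, suggestions, links... anything would be greatly appreciated.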
Answer 0 (score: 1)
I'm pretty new to PHP, so maybe this is totally off, but maybe you could use something like this:
$data = <<<EOT
FooID: 123456
Name: Chuck
When: 01/02/2013 01:23:45
InternalID: 789654
User Message: Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
And can start with any number of \n's. It can be empty, too
EOT;
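if ($key = preg_match_all('~^[^:\n]+?:~m', $data, $match)) {
    // Replace every key with a marker, then split the values on that marker
    $val = explode('¬', preg_filter('~^[^:\n]+?:~m', '¬', $data));
    array_shift($val); // drop the empty chunk before the first key
    $res = array_combine($match[0], $val);
}
print_r($res);
Output: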
Array
(
[FooID:] => 123456
[Name:] => Chuck
[When:] => 01/02/2013 01:23:45
[InternalID:] => 789654
[User Message:] => Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
And can start with any number of
's. It can be empty, too
)
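The keys in the output above keep their trailing colon and, as far as I can tell, the values keep their surrounding whitespace. A minimal sketch of one way to tidy both, assuming `preg_split` is acceptable in place of the `¬` marker. The sample string is made up, and any message line that contains a colon is still treated as a key:

```php
<?php
// Made-up sample input; inside double quotes "\n" is a real newline.
$data = "FooID: 123456\nName: Chuck\nInternalID:\nUser Message: Hello,\nsecond line";

$pattern = '~^[^:\n]+?:~m'; // same key pattern as in the answer
$res = array();
if (preg_match_all($pattern, $data, $match)) {
    // Everything between two keys is the value of the first of them.
    $val = preg_split($pattern, $data);
    array_shift($val); // drop the empty chunk before the first key
    // Strip the trailing colon from each key and the whitespace around each value.
    $keys = array_map(function ($k) { return rtrim($k, ':'); }, $match[0]);
    $res  = array_combine($keys, array_map('trim', $val));
}
print_r($res);
```

Splitting on the key pattern avoids having to pick a marker character that can never occur in the data.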
Answer 1 (score: 1)
The following regex should work, but I'm not so sure anymore whether it's the right tool for this:
preg_match_all(
    '%^                    # Start of line
    ([^:]*)                # Match anything until a colon, capture in group 1
    :\s*                   # Match a colon plus optional whitespace
    (                      # Match and capture in group 2:
     (?:                   # Start of non-capturing group (used for alternation)
      .*$                  # Either match the rest of the line
      (?=                  # only if one of the following follows here:
       \Z                  # The end of the string
      |                    # or
       \r?\n               # a newline
       [^:\n\\\\]*         # followed by anything except colon, backslash or newline
       :                   # then a colon
      )                    # End of lookahead
     |                     # or match
      (?:                  # Start of non-capturing group (used for alternation/repetition)
       [^:\\\\]            # Either match a character except colon or backslash
      |                    # or
       \\\\.               # match any escaped character
      )*                   # Repeat as needed (end of inner non-capturing group)
     )                     # End of outer non-capturing group
    )                      # End of capturing group 2
    $                      # Match the end of the line%mx',
    $subject, $result, PREG_PATTERN_ORDER);
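For comparison, a shorter pattern built on the same lookahead idea can be sketched as follows. This is not the expression above but a simplified variant, and it rests on assumptions: a value continues onto the next line unless that line starts with an unescaped `key:`, keys contain no backslash, and the sample input is invented.

```php
<?php
// Invented sample; inside double quotes "\\:" is a backslash followed by a colon.
$text = "FooID: 123456\nInternalID:\nUser Message: Hello,\nsecond line\nescaped\\: colon";

// Group 1: the key (no colon, newline or backslash).
// Group 2: the rest of the line, plus every following line that does not
// itself start with an unescaped "key:".
$pattern = '~^([^:\n\\\\]+):[ \t]*(.*(?:\n(?![^:\n\\\\]+:).*)*)~m';

$parsed = array();
if (preg_match_all($pattern, $text, $m)) {
    $parsed = array_combine(array_map('trim', $m[1]), array_map('trim', $m[2]));
}
print_r($parsed);
```

Using `[ \t]*` rather than `\s*` after the colon keeps an empty value from swallowing the following line. A continuation line that begins with plain text followed by an unescaped colon would still be read as a new key.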
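Answer 2 (score: 0)
I think I'd avoid using regex for this task and split it into sub-tasks instead:
1. Split the string on `\n` using `explode`.
2. Split each resulting string on `:`, again using `explode`, with a limit of 2.
This algorithm assumes that there are no keys with escaped colons. Escaped colons in values (i.e. user input) are handled just fine.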
// Note: the "InternalID: " line below must end with a space, because the parser splits on ': '
$str = <<<EOT
FooID: 123456
Name: Chuck
When: 01/02/2013 01:23:45
InternalID: 
User Message: Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
This\: works too. And can start with any number of \\n's. It can be empty, too.
What's worse, though is that this CAN contain colons (but they're _"escaped"_
using `\`) like so `\:`, and even basic markup!
EOT;
$arr = explode("\n", $str);
$prevKey = '';
$split = ': ';
$output = array();
for ($i = 0, $arrlen = sizeof($arr); $i < $arrlen; $i++) {
    $keyValuePair = explode($split, $arr[$i], 2);
    // ?: Is this a valid key/value pair
    if (sizeof($keyValuePair) < 2 && $i > 0) {
        // -> Nope, append the value to the previous key's value
        $output[$prevKey] .= "\n" . $keyValuePair[0];
    }
    else {
        // -> Maybe
        // ?: Did we miss an escaped colon
        if (substr($keyValuePair[0], -1) === '\\') {
            // -> Yep, this means this is a value, not a key/value pair append both key and
            // value (including the split between) to the previous key's value ignoring
            // any colons in the rest of the string (allowing dates to pass through)
            $output[$prevKey] .= "\n" . $keyValuePair[0] . $split . $keyValuePair[1];
        }
        else {
            // -> Nope, create a new key with a value
            $output[$keyValuePair[0]] = $keyValuePair[1];
            $prevKey = $keyValuePair[0];
        }
    }
}
var_dump($output);
array(5) {
["FooID"]=>
string(6) "123456"
["Name"]=>
string(5) "Chuck"
["When"]=>
string(19) "01/02/2013 01:23:45"
["InternalID"]=>
string(0) ""
["User Message"]=>
string(293) "Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
This\: works too. And can start with any number of \n's. It can be empty, too.
What's worse, though is that this CAN contain colons (but they're _"escaped"_
using `\`) like so `\:`, and even basic markup!"
}
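The values above still carry the backslash in front of escaped colons. If plain colons are wanted afterwards, one possible post-processing step is sketched below, assuming `\:` is the only escape sequence in use. The `$output` array here is a made-up stand-in for the parsed result:

```php
<?php
// Made-up stand-in for the parsed result; in single quotes "\:" stays as-is.
$output = array(
    'Name'         => 'Chuck',
    'User Message' => 'This\: works too, like so `\:`',
);

// Assumption: "\:" is the only escape sequence that needs undoing.
$clean = array_map(function ($value) {
    return str_replace('\\:', ':', $value);
}, $output);

print_r($clean);
```

Unescaping only after the split is done keeps the escaped colons from being mistaken for key separators during parsing.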
Answer 3 (score: 0)
So here's what I came up with, using a tricky `preg_replace_callback()`:
$string ='FooID: 123456
Name: Chuck
When: 01/02/2013 01:23:45
InternalID: 789654
User Message: Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
And can start with any number of \n\'s. It can be empty, too
Yellow:cool';
$array = array();
preg_replace_callback('#^(.*?):(.*)|.*$#m', function($m) use (&$array) {
    static $last_key = ''; // We are going to use this as a reference
    if (isset($m[1])) { // If there is a normal match (key : value)
        $array[$m[1]] = $m[2]; // Then add to array
        $last_key = $m[1]; // define the new last key
    } else { // else
        $array[$last_key] .= PHP_EOL . $m[0]; // add the whole line to the last entry
    }
}, $string); // Anonymous function used thus PHP 5.3+ is required
print_r($array); // print
Downside: I'm using `PHP_EOL` to add the line breaks, which is OS-dependent.
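If OS-independent output is preferred, a variant could append a fixed "\n" instead. The sketch below does that and makes two further changes that are assumptions, not part of the code above: the fallback branch is anchored with `^` so that it fires at most once per line, and keys and values are trimmed. The input string is a shortened, invented sample.

```php
<?php
// Invented sample; the empty line inside the message is intentional.
$string = "FooID: 123456\nName: Chuck\nUser Message: Hello,\n\nsecond line";

$array = array();
$lastKey = null;
preg_replace_callback('#^(.*?):(.*)|^.*$#m', function ($m) use (&$array, &$lastKey) {
    if (isset($m[1])) {                  // "key: value" line
        $lastKey = trim($m[1]);
        $array[$lastKey] = $m[2];
    } elseif ($lastKey !== null) {       // continuation line (possibly empty)
        $array[$lastKey] .= "\n" . $m[0];
    }
    return $m[0];                        // leave the subject text unchanged
}, $string);

// Remove the whitespace left around each value.
$array = array_map('trim', $array);
print_r($array);
```

As with the code above, any message line that contains a colon is still treated as a new key.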