Question

I'm trying to parse a log file that contains a large number of traces, some of which span multiple lines.

Example:

[trace-123] <request>This is a log line</request>
[trace-124] <reply>This is another log line

this is part of "[trace-124]" still.</reply>
[trace-125] <request>final log line.</request>

I'm trying to use preg_match_all to get an array of all the traces.

$file = file_get_contents("traces.txt");
$tracePattern = "/(\[trace-[0-9]*+\]+[\s\S]*)(?<=\<\/reply>|\<\/request>)/";

preg_match_all($tracePattern,$file,$lines);

echo "<pre>";print_r($lines);echo "</pre>";

Ideally, I'd like my result to look like this:

Array
(
    [0] => [trace-123] <request>This is a log line</request>
    [1] => [trace-124] <reply>This is another log line

this is part of "[trace-124]" still.</reply>
    [2] => [trace-125] <request>final log line.</request>
)

But when I run it, I get an array with just one element. When I wrote this expression, my goal was basically to look for:

\[trace-[0-9]*\]

and grab everything from that match up to the next match.

I found that

\[trace-[0-9]*+\].*

works fine, but breaks down when there are line breaks.
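A minimal PHP sketch of the goal described above (match from each [trace-N] header up to the next one), assuming every record starts at the beginning of a line with [trace-<digits>] and that the log is already in a string; the sample is inlined as $log, a name chosen for illustration. Instead of relying on the closing tag, it uses a lazy .*? plus a lookahead for either a line break followed by the next header, or the end of the input.

```php
<?php
// Sketch: match each trace from its "[trace-N]" header up to (but not including)
// the line break before the next header, or up to the end of the input.
// Assumption: every record starts at the beginning of a line with "[trace-<digits>]".
$log = <<<'LOG'
[trace-123] <request>This is a log line</request>
[trace-124] <reply>This is another log line

this is part of "[trace-124]" still.</reply>
[trace-125] <request>final log line.</request>
LOG;

// m: "^" matches at the start of every line; s: "." also matches newlines.
// ".*?" is lazy, so each match stops at the first spot where the lookahead sees
// a line break followed by a new header, or the end of the string (\z).
$pattern = '/^\[trace-\d+\].*?(?=\R\[trace-\d+\]|\z)/ms';

preg_match_all($pattern, $log, $matches);
$traces = $matches[0]; // full matches, one per trace

print_r($traces);
```

Because the lookahead only checks for the next header, this variant does not depend on the records ending in </reply> or </request>.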

Answer 1

The following might be a better approach:

$text = file_get_contents("traces.txt"); // the log file from the question
$results = preg_split('/\R(?=\[trace[^\]]*\])/', $text);
print_r($results);

See working demo.

Output

Array
(
    [0] => [trace-123] <request>This is a log line</request>
    [1] => [trace-124] <reply>This is another log line

this is part of "[trace-124]" still.</reply>
    [2] => [trace-125] <request>final log line.</request>
)
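A small self-contained sketch of this split, assuming the log uses Windows (\r\n) line endings; the sample string is built inline for illustration. \R matches any line-break sequence and treats \r\n as a single unit, so no stray \r is left at the end of a record, while the lookahead keeps each [trace... header in the following piece.

```php
<?php
// Sketch: the same split on a log with Windows line endings (assumed input).
$text = "[trace-123] <request>This is a log line</request>\r\n"
      . "[trace-124] <reply>This is another log line\r\n"
      . "\r\n"
      . "this is part of \"[trace-124]\" still.</reply>\r\n"
      . "[trace-125] <request>final log line.</request>";

// Split at a line break only when it is immediately followed by a "[trace...]"
// header; the lookahead (?=...) does not consume the header itself.
$results = preg_split('/\R(?=\[trace[^\]]*\])/', $text);

print_r($results);
```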

Answer 2

Use this:

$file = '[trace-123] <request>This is a log line</request>
[trace-124] <reply>This is another log line

this is part of "[trace-124]" still.</reply>
[trace-125] <request>final log line.</request>';

$tracePattern = "/\[trace-[0-9]*+\]+\s*<(?:reply|request)>.*?<\/(?:reply|request)>/s";

preg_match_all($tracePattern,$file,$lines);

$lines = $lines[0]; // by default, $lines[0] is the array of full matches, so take that

echo "<pre>";print_r($lines);echo "</pre>";

Working demo: http://ideone.com/n8n5r3
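If you also need the individual parts of each trace, a possible extension of this pattern (not part of the original answer) uses named groups. The group names id, tag and body are chosen for illustration; \k<tag> is a PCRE named backreference that requires the closing tag to match the opening one.

```php
<?php
// Sketch: capture trace ID, tag name and body of each trace separately.
$file = <<<'LOG'
[trace-123] <request>This is a log line</request>
[trace-124] <reply>This is another log line

this is part of "[trace-124]" still.</reply>
[trace-125] <request>final log line.</request>
LOG;

// s: let the lazy (?<body>.*?) cross line breaks.
// \k<tag>: the closing tag must be the same word as the opening tag.
$pattern = '/\[trace-(?<id>\d+)\]\s*<(?<tag>reply|request)>(?<body>.*?)<\/\k<tag>>/s';

// PREG_SET_ORDER groups the results per match instead of per capture group.
preg_match_all($pattern, $file, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m['id'], ' ', $m['tag'], ': ', $m['body'], "\n";
}
```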

Answer 3

I'd suggest splitting with preg_split.

Solution

preg_split('/\R+(?=\[trace-\d+])/', $str)

This results in the following:

Array
(
    [0] => [trace-123] <request>This is a log line</request>
    [1] => [trace-124] <reply>This is another log line

this is part of "[trace-124]" still.</reply>
    [2] => [trace-125] <request>final log line.</request>
)
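A quick sketch of what the + in \R+ buys you, using a made-up two-record input with a blank line between the records: \R+ swallows the whole run of line breaks, whereas a single \R leaves the extra newline attached to the end of the previous record.

```php
<?php
// Sketch: \R+ vs. \R when records are separated by a blank line (assumed input).
$str = "[trace-1] <request>a</request>\n\n[trace-2] <request>b</request>";

$withPlus   = preg_split('/\R+(?=\[trace-\d+])/', $str); // eats both "\n"
$withSingle = preg_split('/\R(?=\[trace-\d+])/', $str);  // eats only the last "\n"

var_dump($withPlus, $withSingle);
```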

Answer 4

This works in MULTI_LINE mode. It trims leading whitespace and trailing newlines.

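Edit: This assumes a [trace- ] anchor at the beginning of a line, or at the beginning
of a line plus non-newline whitespace up to 'trace'. That is the
only discernible record separator.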

 #  ^[^\S\n]*(\[trace-[^]]*\][^\n]*(?:(?!\s+\[trace-[^]]*\])\n[^\n]*)*)

 ^ [^\S\n]* 
 (
      \[trace- [^]]* \] [^\n]* 

      (?:
           (?! \s+ \[trace- [^]]* \] )
           \n [^\n]* 
      )*
 )

Output (in single quotes)

 '[trace-123] <request>This is a log line</request>'
 '[trace-124] <reply>This is another log line

 this is part of "[trace-124]" still.</reply>'
 '[trace-125] <request>final log line.</request>'
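A sketch of running this pattern from PHP, assuming the MULTI_LINE mode mentioned above corresponds to PHP's m modifier (so ^ matches at the start of every line). The compact form of the regex goes into a single-quoted string, and each trace comes back in capture group 1, without the leading horizontal whitespace matched by ^[^\S\n]*.

```php
<?php
// Sketch: the compact pattern with the "m" (multiline) modifier.
$log = <<<'LOG'
[trace-123] <request>This is a log line</request>
[trace-124] <reply>This is another log line

this is part of "[trace-124]" still.</reply>
[trace-125] <request>final log line.</request>
LOG;

// [^]] is valid PCRE: a "]" right after "^" is taken literally.
// The negative lookahead stops a record before any line that starts a new trace.
$pattern = '/^[^\S\n]*(\[trace-[^]]*\][^\n]*(?:(?!\s+\[trace-[^]]*\])\n[^\n]*)*)/m';

preg_match_all($pattern, $log, $m);
$traces = $m[1]; // capture group 1 holds each trace

print_r($traces);
```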

Answer 5

The . symbol matches any character except the newline \n, so you can try replacing it with (.|\s):

#\[trace-[0-9]*+\](.|\s)*#

Note: you can use a non-capturing group (?: ) instead.

Easier: add the "s" flag:

#\[trace-[0-9]*+\].*#s
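A minimal sketch of the difference the s modifier makes, on sample strings made up for illustration. One caveat: with s set, a greedy .* runs to the end of the input, so on a log holding several traces this pattern returns a single match; a lazy quantifier or a lookahead is needed to stop at each record.

```php
<?php
// Sketch: "." stops at "\n" unless the "s" (DOTALL) modifier is set.
$text = "[trace-124] <reply>This is another log line\nthis is part of it still.</reply>";

preg_match('#\[trace-[0-9]*+\].*#', $text, $withoutS); // stops at the first "\n"
preg_match('#\[trace-[0-9]*+\].*#s', $text, $withS);   // runs to the end of $text

// Greedy ".*" with "s" also swallows any following traces: one match in total.
$two   = "[trace-1] a\n[trace-2] b";
$count = preg_match_all('#\[trace-[0-9]*+\].*#s', $two, $all);

echo $withoutS[0], "\n---\n", $withS[0], "\n---\n", $count, "\n";
```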

Answer 6

You should use a reluctant (lazy) quantifier (??, +? or *?).

I believe this regex /(\[trace-[0-9]*\]\s*(?m:.*?)<\/(?:reply|request)>)/ should do it... The (?m:.*?) part is the secret. :)
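A sketch comparing greedy and lazy quantifiers on the question's sample. One adjustment: in PCRE, which PHP uses, the inline modifier that lets . match newlines is (?s); (?m) only changes how ^ and $ behave. The sketch therefore uses (?s:...) where the answer has (?m:...).

```php
<?php
// Sketch: greedy vs. reluctant (lazy) quantifier, with an inline (?s:...) group.
$log = <<<'LOG'
[trace-123] <request>This is a log line</request>
[trace-124] <reply>This is another log line

this is part of "[trace-124]" still.</reply>
[trace-125] <request>final log line.</request>
LOG;

$greedy = '/(\[trace-[0-9]*\]\s*(?s:.*)<\/(?:reply|request)>)/';
$lazy   = '/(\[trace-[0-9]*\]\s*(?s:.*?)<\/(?:reply|request)>)/';

$greedyCount = preg_match_all($greedy, $log, $g); // ".*" runs to the LAST closing tag
$lazyCount   = preg_match_all($lazy, $log, $l);   // ".*?" stops at the FIRST closing tag

print_r($l[1]);
```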

Answer 7

This should work with the s flag turned on:

(\[trace-[0-9]+\].*?<\/(?:reply|request)>)

Live DEMO
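A sketch wrapping this pattern in a PHP helper; extractTraces is a hypothetical name, and the input is a made-up two-record string. It also shows that [0-9]+, unlike [0-9]*, skips a header that has no digits.

```php
<?php
// Sketch: the pattern above with the "s" flag, wrapped in a hypothetical helper.
function extractTraces(string $log): array
{
    // "s" lets the lazy ".*?" cross line breaks; it stops at the first closing tag.
    preg_match_all('/(\[trace-[0-9]+\].*?<\/(?:reply|request)>)/s', $log, $m);
    return $m[1];
}

$log = "[trace-] <request>no id</request>\n"
     . "[trace-7] <reply>first line\nsecond line</reply>";

print_r(extractTraces($log)); // "[trace-]" is skipped because [0-9]+ needs a digit
```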

PHP regex: match between two patterns
