Creating an array from unformatted data with PHP

Asked: 2011-09-04 19:25:14

Tags: php regex

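Our application receives log files by email, so the lines usually get broken up by the email client. Once I have read the body of the email, I have a string variable $log in the following format.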

Fri Aug 26 11:52:30 2011 OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2] 
PKCS11] built Fri Aug 26 11:52:30 2011 NOTE: OpenVPN 2.1 requires '--script-security 2' 
or higher to call user-defined scripts or executables Fri Aug 26 11:52:30 2011 
Control Channel Authentication: using 'ta.key' as a OpenVPN static key file 
Fri Aug 26 11:52:30 2011 Outgoing Control Channel Authentication: Using 160 
bit message hash 'SHA1' for HMAC authentication Fri Aug 26 11:52:30 
2011 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1'
for HMAC authentication Fri Aug 26 11:52:30 2011 LZO compression initialized 
Fri Aug 26 11:52:30 2011 Control Channel MTU parms [ L:1558 D:166 EF:66 EB:0 
ET:0 EL:0 ] Fri Aug 26 11:52:30 2011 Socket Buffers: R=[8192->8192] S=[8192->8192]

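As shown above, the date does not always start on a new line. I want to produce an array containing the date and the log message, so that I can output a table with each of these fields in its own column. I know I need a regular expression to match the date field, but how do I build the array?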

4 answers:

Answer 0 (score: 1)

I'm no regex pro, and I'm sure there is an easier way, but this works:

$input = "Wed Aug 03 13:56:31 2011 OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2]
[PKCS11] built on Mar 12 2011
Wed Aug 03 13:56:31 2011 NOTE: OpenVPN 2.1 requires '--script-security
2' or higher to call user-defined scripts or executables
Wed Aug 03 13:56:31 2011 Control Channel Authentication: using 'ta.key'
as a OpenVPN static key file";

preg_match_all('/([\w]{3} [\w]{3} [0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{4}) (.*)/', $input, $matches, PREG_SET_ORDER);

var_dump($matches);

This results in:

array(3) {
    [0] =>
    array(3) {
        [0] =>
        string(67) "Wed Aug 03 13:56:31 2011 OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2]"
        [1] =>
        string(24) "Wed Aug 03 13:56:31 2011"
        [2] =>
        string(42) "OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2]"
    }
    [1] =>
    array(3) {
        [0] =>
        string(70) "Wed Aug 03 13:56:31 2011 NOTE: OpenVPN 2.1 requires '--script-security"
        [1] =>
        string(24) "Wed Aug 03 13:56:31 2011"
        [2] =>
        string(45) "NOTE: OpenVPN 2.1 requires '--script-security"
    }
    [2] =>
    array(3) {
        [0] =>
        string(71) "Wed Aug 03 13:56:31 2011 Control Channel Authentication: using 'ta.key'"
        [1] =>
        string(24) "Wed Aug 03 13:56:31 2011"
        [2] =>
        string(46) "Control Channel Authentication: using 'ta.key'"
    }
}
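
Note that the continuation lines ("[PKCS11] built on Mar 12 2011", "2' or higher to call ...") are missing from the result above: "." does not match a newline by default, so "(.*)" stops at the end of the first line. A minimal sketch of one way around this, assuming the date itself is never split across two lines. The function name parseWrappedLog is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper, not from the original answer.
function parseWrappedLog(string $input): array
{
    // Same date shape as in the answer above, e.g. "Wed Aug 03 13:56:31 2011".
    $date = '\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}';

    // The "s" modifier lets "." match newlines; the lazy (.*?) stops
    // right before the next date or at the end of the input.
    $pattern = '/(' . $date . ') (.*?)(?=' . $date . '|\z)/s';
    preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

    $rows = [];
    foreach ($matches as $m) {
        $rows[] = [
            'date'    => $m[1],
            // Collapse the line breaks added by the mail client into single spaces.
            'message' => trim(preg_replace('/\s+/', ' ', $m[2])),
        ];
    }
    return $rows;
}

$input = "Wed Aug 03 13:56:31 2011 OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2]
[PKCS11] built on Mar 12 2011
Wed Aug 03 13:56:31 2011 NOTE: OpenVPN 2.1 requires '--script-security
2' or higher to call user-defined scripts or executables";

print_r(parseWrappedLog($input));
```

If the date can be wrapped as well (as in the sample from the question), the line breaks would have to be normalized before matching.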

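Answer 1 (score: 1)

I'm completely replacing my answer with a new version, because the sample log file has changed a lot. Since the log seems to be broken at almost arbitrary places, this approach now includes some regular expressions: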

$log="Fri Aug 26 11:52:30 2011 OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2]  
PKCS11] built Fri Aug 26 11:52:30 2011 NOTE: OpenVPN 2.1 requires '--script-security 2'  
or higher to call user-defined scripts or executables Fri Aug 26 11:52:30 2011  
Control Channel Authentication: using 'ta.key' as a OpenVPN static key file  
Fri Aug 26 11:52:30 2011 Outgoing Control Channel Authentication: Using 160  
bit message hash 'SHA1' for HMAC authentication Fri Aug 26 11:52:30  
2011 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' 
for HMAC authentication Fri Aug 26 11:52:30 2011 LZO compression initialized  
Fri Aug 26 11:52:30 2011 Control Channel MTU parms [ L:1558 D:166 EF:66 EB:0  
ET:0 EL:0 ] Fri Aug 26 11:52:30 2011 Socket Buffers: R=[8192->8192] S=[8192->8192] 
";
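// Join the wrapped lines into one long string, dropping trailing spaces before each line break.
$str = implode(' ',preg_split("/[ ]*[\r\n]+/", $log));
// Split on every date; PREG_SPLIT_DELIM_CAPTURE keeps the captured dates in the result.
$arrLogLines=preg_split('/[ ]*([\w]{3} [\w]{3} [0-9]{2} [\d:]+ \d{4}) /',$str,-1,PREG_SPLIT_DELIM_CAPTURE); // Cred to Herbert for the regexp, seems to work fine..
// The log starts with a date, so the first element is an empty string; drop it.
array_shift($arrLogLines);
$arrLogMessages=array();
// Even indexes now hold dates, odd indexes hold the message that follows each date.
for ($i=0;$i<count($arrLogLines);$i++) {
    if ($i%2==0) {
        $offset=0;
        $strArrIdx='date';
    } else {
        $offset=1;
        $strArrIdx='message';
    }
    $arrLogMessages[($i-$offset)/2][$strArrIdx]=$arrLogLines[$i];
}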
var_dump($arrLogMessages);

It produces the expected output:

array(8) {
  [0]=>
  array(2) {
    ["date"]=>
    string(24) "Fri Aug 26 11:52:30 2011"
    ["message"]=>
    string(56) "OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2] PKCS11] built"
  }
  [1]=>
  array(2) {
    ["date"]=>
    string(24) "Fri Aug 26 11:52:30 2011"
    ["message"]=>
    string(102) "NOTE: OpenVPN 2.1 requires '--script-security 2' or higher to call user-defined scripts or executables"
  }
  [2]=>
  array(2) {
    ["date"]=>
    string(24) "Fri Aug 26 11:52:30 2011"
    ["message"]=>
    string(75) "Control Channel Authentication: using 'ta.key' as a OpenVPN static key file"
  }
  [3]=>
  array(2) {
    ["date"]=>
    string(24) "Fri Aug 26 11:52:30 2011"
    ["message"]=>
    string(98) "Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication"
  }
  [4]=>
  array(2) {
    ["date"]=>
    string(24) "Fri Aug 26 11:52:30 2011"
    ["message"]=>
    string(98) "Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication"
  }
  [5]=>
  array(2) {
    ["date"]=>
    string(24) "Fri Aug 26 11:52:30 2011"
    ["message"]=>
    string(27) "LZO compression initialized"
  }
  [6]=>
  array(2) {
    ["date"]=>
    string(24) "Fri Aug 26 11:52:30 2011"
    ["message"]=>
    string(63) "Control Channel MTU parms [ L:1558 D:166 EF:66 EB:0 ET:0 EL:0 ]"
  }
  [7]=>
  array(2) {
    ["date"]=>
    string(24) "Fri Aug 26 11:52:30 2011"
    ["message"]=>
    string(46) "Socket Buffers: R=[8192->8192] S=[8192->8192] "
  }
}
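
The even/odd bookkeeping in the loop above could also be expressed with array_chunk, which groups the alternating date and message elements into pairs. This is only a sketch of that alternative, using a shortened two-entry sample string and a slightly simplified version of the same pattern; the variable names $parts and $messages are made up for illustration.

```php
<?php
// Shortened sample, already joined into a single line.
$str = "Fri Aug 26 11:52:30 2011 LZO compression initialized "
     . "Fri Aug 26 11:52:30 2011 Socket Buffers: R=[8192->8192] S=[8192->8192]";

// Split on each date and keep the captured dates in the result.
$parts = preg_split('/ *(\w{3} \w{3} \d{2} [\d:]+ \d{4}) /', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

// The string starts with a date, so the first element is empty.
array_shift($parts);

// Every chunk is now array(date, message).
$messages = [];
foreach (array_chunk($parts, 2) as $pair) {
    $messages[] = [
        'date'    => $pair[0],
        'message' => isset($pair[1]) ? $pair[1] : '',
    ];
}

print_r($messages);
```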

Answer 2 (score: 1)

I believe this is what you're looking for:

<?php

$log = <<<LOG
Wed Aug 03 13:56:31 2011 OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2] 
[PKCS11] built on Mar 12 2011
Wed Aug 03 13:56:31 2011 NOTE: OpenVPN 2.1 requires '--script-security 
2' or higher to call user-defined scripts or executables
Wed Aug 03 13:56:31 2011 Control Channel Authentication: using 'ta.key' 
as a OpenVPN static key file
LOG;


function splitLog($log)
{
    $log = str_replace("\n",'~',$log);
    $log = str_replace("\r",'',$log);
    $log .= '~';
    preg_match_all('/([\w]{3} [\w]{3} [0-9]{2} [\d:]+ \d{4})((?:.*?~){2})/', $log, $m);

    $logArray = array();

    foreach($m[0] as $k=>$v)
    {
        $a['date'] = $m[1][$k];
        $a['message'] = trim(str_replace('~', '', $m[2][$k]));
        array_push($logArray, $a);
    }

    return $logArray;
}

$logArray = splitLog($log);
var_dump($logArray);

?>

Output:

array
  0 => 
    array
      'date' => string 'Wed Aug 03 13:56:31 2011' (length=24)
      'message' => string 'OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2] [PKCS11] built on Mar 12 2011' (length=72)
  1 => 
    array
      'date' => string 'Wed Aug 03 13:56:31 2011' (length=24)
      'message' => string 'NOTE: OpenVPN 2.1 requires '--script-security 2' or higher to call user-defined scripts or executables' (length=102)
  2 => 
    array
      'date' => string 'Wed Aug 03 13:56:31 2011' (length=24)
      'message' => string 'Control Channel Authentication: using 'ta.key' as a OpenVPN static key file' (length=75)
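
The question asks for a table with the date and the message in separate columns. Assuming an array shaped like the output above, a rough sketch of that last step might look like this; renderLogTable is a hypothetical name, and the markup is kept deliberately bare.

```php
<?php
// Hypothetical helper: turns rows of ['date' => ..., 'message' => ...] into an HTML table.
function renderLogTable(array $logArray): string
{
    $html = "<table>\n<tr><th>Date</th><th>Message</th></tr>\n";
    foreach ($logArray as $row) {
        // Escape both fields, since log text may contain <, > or quotes.
        $html .= sprintf(
            "<tr><td>%s</td><td>%s</td></tr>\n",
            htmlspecialchars($row['date'], ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($row['message'], ENT_QUOTES, 'UTF-8')
        );
    }
    return $html . "</table>\n";
}

$logArray = [
    ['date' => 'Wed Aug 03 13:56:31 2011', 'message' => "Control Channel Authentication: using 'ta.key' as a OpenVPN static key file"],
    ['date' => 'Fri Aug 26 11:52:30 2011', 'message' => 'Socket Buffers: R=[8192->8192] S=[8192->8192]'],
];

echo renderLogTable($logArray);
```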

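Answer 3 (score: 0)

If every line starts with a date like this, you can use substr. The date is present on every line and always has the same length. Well, the first line also ends with a date, but that one has a different meaning and a different notation. A regular expression won't help you with that either.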
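
A minimal sketch of the substr idea, assuming the entries have already been normalized so that each one sits on its own line and starts with the 24-character date. The sample lines and the DATE_LENGTH constant are illustrative only.

```php
<?php
// Length of a date such as "Wed Aug 03 13:56:31 2011".
const DATE_LENGTH = 24;

// Assumed input: one complete log entry per element, date first.
$lines = [
    'Wed Aug 03 13:56:31 2011 OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2] [PKCS11] built on Mar 12 2011',
    "Wed Aug 03 13:56:31 2011 Control Channel Authentication: using 'ta.key' as a OpenVPN static key file",
];

$rows = [];
foreach ($lines as $line) {
    $rows[] = [
        'date'    => substr($line, 0, DATE_LENGTH),
        // Skip the single space that separates the date from the message.
        'message' => substr($line, DATE_LENGTH + 1),
    ];
}

print_r($rows);
```

This only works once the wrapped lines have been joined back together, which is the part the other answers handle with regular expressions.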