Regex to match all lines except duplicates

Time: 2019-08-12 18:58:23

Tags: php regex pcre preg-match-all

I have this text:

156.48.459.20 - - [11/Aug/2019
156.48.459.20 - - [11/Aug/2019
235.145.41.12 - - [11/Aug/2019
235.145.41.12 - - [11/Aug/2019
66.23.114.251 - - [11/Aug/2019

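I want to match all lines for that day, so I wrote this simple regex: '/.*11\/Aug\/2019.*/'

As you can see, two IPs are duplicated in the text, and I don't want to match the duplicate lines. I searched and found this regex: (.).*\1 (demo). It looks a bit odd to me, but I tried to apply it to my current regex like this: (.*11\/Aug\/2019.*)\1, and it did not work. Can anyone help?

This is the result I want: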

156.48.459.20 - - [11/Aug/2019
235.145.41.12 - - [11/Aug/2019
66.23.114.251 - - [11/Aug/2019

Note: I am using the function preg_match_all()

preg_match_all('/(.*11\/Aug\/2019.*)\1/', $input_lines, $output_array);
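
A minimal check of why this attempt finds nothing, assuming the lines are separated by "\n" (the shortened input here is only for illustration): the backreference \1 has to repeat the captured text immediately after the capture, but the capture stops at the end of a line, where the next character is a newline.

```php
<?php
$input_lines = "156.48.459.20 - - [11/Aug/2019\n"
             . "156.48.459.20 - - [11/Aug/2019\n"
             . "235.145.41.12 - - [11/Aug/2019";

// "." does not match "\n", so group 1 never spans more than one line,
// and \1 would need the same text again on that very same line.
$count = preg_match_all('/(.*11\/Aug\/2019.*)\1/', $input_lines, $output_array);

var_dump($count); // int(0)
```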

3 Answers:

Answer 0: (score: 4)

Does it have to be pure regex?

You can get the unique lines with PHP:

<?php
$input_lines = '156.48.459.20 - - [11/Aug/2019
156.48.459.20 - - [11/Aug/2019
235.145.41.12 - - [11/Aug/2019
235.145.41.12 - - [11/Aug/2019
66.23.114.251 - - [11/Aug/2019';

preg_match_all( '/.*11\/Aug\/2019/m', $input_lines, $output_array );

// PHP associative array abuse incoming:
// flip the array so that the values become keys (duplicates collapse into one key),
// then read the keys back as a fresh, re-indexed list of unique lines
$output_array[ 0 ] = array_keys( array_flip( $output_array[ 0 ] ) );

var_dump( $output_array );

Output:

array(1) {
  [0]=>
  array(3) {
    [0]=>
    string(30) "156.48.459.20 - - [11/Aug/2019"
    [1]=>
    string(30) "235.145.41.12 - - [11/Aug/2019"
    [2]=>
    string(30) "66.23.114.251 - - [11/Aug/2019"
  }
}
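
As an alternative sketch of the same idea (not part of the answer's code), the built-in array_unique() can do the de-duplication, with array_values() to re-index the result. The variable name $unique is only illustrative.

```php
<?php
$input_lines = "156.48.459.20 - - [11/Aug/2019\n"
             . "156.48.459.20 - - [11/Aug/2019\n"
             . "235.145.41.12 - - [11/Aug/2019\n"
             . "235.145.41.12 - - [11/Aug/2019\n"
             . "66.23.114.251 - - [11/Aug/2019";

preg_match_all('/.*11\/Aug\/2019/', $input_lines, $output_array);

// array_unique() keeps the first occurrence of each value (and its key);
// array_values() then renumbers the keys from 0.
$unique = array_values(array_unique($output_array[0]));

print_r($unique);
```

Both approaches compare whole matched lines, so two lines with the same IP but different trailing text would still count as distinct.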

Answer 1: (score: 2)

Almost a one-liner:

'~(?m)^(?:([\d.]*[- ]*\[11/Aug/2019.*)\R*(?=[\S\s]*?\1)|(?!.*\[11/Aug/2019).*\R*)~'

PHP sample:

 $target = <<<'EOS'
 156.48.459.20 - - [11/Aug/2019
 156.48.459.20 - - [11/Aug/2019
 235.145.41.12 - - [11/Aug/2019
 235.145.41.12 - - [11/Aug/2019
 66.23.114.251 - - [11/Aug/2019
 66.23.114.251 - - [09/Aug/2019
 156.48.459.20 - - [11/Aug/2019
 235.145.41.12 - - [11/Aug/2019
 66.23.114.251 - - [01/Aug/2019
 66.23.114.251 - - [11/Aug/2019
 235.145.41.12 - - [11/Aug/2019
 EOS;


 $res = preg_replace ( '~(?m)^(?:([\d.]*[- ]*\[11/Aug/2019.*)\R*(?=[\S\s]*?\1)|(?!.*\[11/Aug/2019).*\R*)~', '', $target );

 echo $res."\n";

Output:

156.48.459.20 - - [11/Aug/2019
66.23.114.251 - - [11/Aug/2019
235.145.41.12 - - [11/Aug/2019

A more readable view of the pattern:

 (?m)
 ^ 
 (?:
      ( [\d.]* [- ]* \[ 11/Aug/2019 .* )  # (1)
      \R* 
      (?= [\S\s]*? \1 )
   |  
      (?! .* \[ 11/Aug/2019 )
      .*  \R* 
 )
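
The same "no identical line later on" idea can also be sketched for preg_match_all(), which is the function the question uses. This is an illustrative variant, not the pattern from this answer: it uses a negative lookahead instead of removing text, and it assumes "\n" line endings and that duplicates are whole identical lines.

```php
<?php
$target = "156.48.459.20 - - [11/Aug/2019\n"
        . "156.48.459.20 - - [11/Aug/2019\n"
        . "235.145.41.12 - - [11/Aug/2019\n"
        . "66.23.114.251 - - [09/Aug/2019\n"
        . "235.145.41.12 - - [11/Aug/2019\n"
        . "66.23.114.251 - - [11/Aug/2019";

// Match a full line for the date, but only if no identical full line
// (^\1$ in multiline mode) appears anywhere after it.
$pattern = '~^(.*\[11/Aug/2019.*)$(?![\s\S]*^\1$)~m';

preg_match_all($pattern, $target, $matches);

print_r($matches[0]);
```

As with the replacement pattern, this keeps the last occurrence of each line. The lookahead rescans the rest of the input for every candidate line, so on large log files a line-by-line approach is likely to be faster.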

Answer 2: (score: 0)

$txt = <<<'EOD'
156.48.459.20 - - [11/Aug/2019
156.48.459.20 - - [11/Aug/2019
235.145.41.12 - - [11/Aug/2019
235.145.41.12 - - [11/Aug/2019
66.23.114.251 - - [11/Aug/2019
EOD;

$url = 'data:text/plain;base64,' . base64_encode($txt);
// replace this line with the path to your log file: $url = '/path/to/file.log';

$result = [];

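if ( false !== $handle = fopen($url, 'r') ) {
    while ( false !== $data = fgetcsv($handle, 1000, ' ') ) {
        // skip blank or short lines that have no date field
        if ( isset($data[3]) && $data[3] === '[11/Aug/2019' )
            $result[$data[0]] = 1; // IP as array key, so duplicates collapse
    }
    fclose($handle);
}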

$result = array_keys($result);

print_r($result);
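
The key-as-set technique above can also be sketched as a small reusable function that works on a string instead of a file handle. The name uniqueIpsForDate and its parameters are hypothetical, and the sketch assumes the same space-separated layout as the sample, with the date as the fourth field.

```php
<?php
// Hypothetical helper: returns the distinct IPs that have an entry for $date.
function uniqueIpsForDate(string $log, string $date): array
{
    $seen = [];
    // Split on any line break and ignore empty lines.
    foreach (preg_split('/\R/', $log, -1, PREG_SPLIT_NO_EMPTY) as $line) {
        $fields = explode(' ', $line);
        if (isset($fields[3]) && $fields[3] === '[' . $date) {
            $seen[$fields[0]] = true; // array keys are unique, so repeats collapse
        }
    }
    return array_keys($seen);
}

$log = "156.48.459.20 - - [11/Aug/2019\n"
     . "156.48.459.20 - - [11/Aug/2019\n"
     . "66.23.114.251 - - [09/Aug/2019\n"
     . "235.145.41.12 - - [11/Aug/2019";

$ips = uniqueIpsForDate($log, '11/Aug/2019');
print_r($ips);
```

Unlike the regex answers, this returns only the IP addresses rather than the full lines, and it keeps them in order of first appearance.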