Problem with regular expression syntax

Date: 2011-07-23 11:44:48

Tags: php regex

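I have a txt file with many lines, and I want to search it for the version and date. What regular expression would be suitable for getting values like v 1.31.6.7 2008/03/07 into an array

from many txt files like this one:

This file may contain proprietary rules that were created, tested and certified by Sourcefire, Inc. (the "VRT Certified Rules") as well as rules that were created by Sourcefire and other third parties $Id: ddos.rules,v 1.31.6.7 2008/03/07 20:53:40 vrtbuild Exp DDOS RULES

The version can differ: v 1.48.6.12

in a format like this.

The date differs as well.

Suppose I have many repeated lines: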

$Id: ddos.rules,v 1.31.6.7 2008/03/07 20:53:40 vrtbuild Exp

$Id: exploit.rules,v 1.116.6.53 2008/11/18 16:36:27 vrtbuild Exp $

$Id: misc.rules,v 1.77.6.20 2008/10/17 19:36:59 vrtbuild Exp $

$Id: smtp.rules,v 1.77.6.19 2008/10/17 19:37:00 vrtbuild Exp $

$Id: tftp.rules,v 1.28.6.6 2008/07/22 17:59:06 vrtbuild Exp $

$Id: web-iis.rules,v 1.110.6.11 2008/07/22 17:59:06 vrtbuild Exp $

$Id: web-attacks.rules,v 1.23 2005/05/16 22:18:17 mwatchinski Exp $

each with a different date and v (version) value.

I found a date pattern like this:

^(((0[1-9]|[12]\d|3[01])\/(0[13578]|1[02])\/((19|[2-9]\d)\d{2}))|((0[1-9]|[12]\d|30)\/(0[13456789]|1[012])\/((19|[2-9]\d)\d{2}))|((0[1-9]|1\d|2[0-8])\/02\/((19|[2-9]\d)\d{2}))|(29\/02\/((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))))$

Can someone explain it?

2 answers:

Answer 0: (score: 1)

Your date regex:

^(((0[1-9]|[12]\d|3[01])\/(0[13578]|1[02])\/((19|[2-9]\d)\d{2}))|((0[1-9]|[12]\d|30)\/(0[13456789]|1[012])\/((19|[2-9]\d)\d{2}))|((0[1-9]|1\d|2[0-8])\/02\/((19|[2-9]\d)\d{2}))|(29\/02\/((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))))$

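...is very interesting. I decided to analyze it to see exactly what it matches. It turns out that this regex matches all valid dates from 1900 to 9999 in DD/MM/YYYY format. Interestingly, it also correctly matches all valid leap days from 1600 to 9999 (the earliest year its leap-day branch accepts is 1600). The regex knows the valid number of days in each month: it knows that May has 31 days while June has only 30. It also knows that February has 28 days, except in leap years, when it has 29. Here it is broken down so that it can be read by mere mortals:

$re_date = '%
    # Match all valid DD/MM/YYYY dates from 1900 to 9999 and
    #   all leap days from year 1600 to 9999.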
    ^                           # Anchor to start of string.
    ( # $1:
      (  # $2: Date format alternative 1: (months having 31 days)
        ( 0[1-9]|[12]\d|3[01])  # $3: Day: 01-09,10-19,20-29,30,31
        \/
        (0[13578]|1[02])        # $4: Month: 01,03,05,07,08,10,12
        \/
        ((19|[2-9]\d)\d{2})     # $5,$6: Year: 1900-9999
      )                         # End $2:
    | (  # $7: Date format alternative 2: (days 01-30 of every month except February)
        (0[1-9]|[12]\d|30)      # $8: Day: 01-09,10-19,20-29,30
        \/
        (0[13456789]|1[012])    # $9: Month: 01,03-09,10-12
        \/
        ((19|[2-9]\d)\d{2})     # $10,$11: Year: 1900-9999
      )                         # End $7:
    | (  # $12: Date format alternative 3: (month having 28 days)
        (0[1-9]|1\d|2[0-8])     # $13: Day 01-09,10-19,20-28
        \/
        02                      # Month: 02
        \/
        ((19|[2-9]\d)\d{2})     # $14,$15: Year: 1900-9999
      )                         # End $12:
    | (  # $16: Date format alternative 4: (leap days)
        29                      # Day: 29
        \/
        02                      # Month: 02
        \/ # Match all valid leap day dates from year 1600 to 9999.
        (                       # $17: Year alt 1 (divisible by 4 but not 100)
          (1[6-9]|[2-9]\d)      # $18: Century part: 16-19,20-99
          ( 0[48]               # $19: Year part: Either 04 or 08
          | [2468][048]         # or 20,24,28,40,44,48,60,64,68,80,84,88
          | [13579][26]         # or 12,16,32,36,52,56,72,76,92,96,
          )                     # End $19:
        | (                     # or $20: Year alternative 2 (divisible by 400)
            ( 16                # $21: Century part: Either 16
            | [2468][048]       # or 20,24,28,40,44,48,60,64,68,80,84,88
            | [3579][26]        # or 32,36,52,56,72,76,92,96
            )                   # End $21:
            00                  # Year part: 00
          )                     # End $20:
        )                       # End $17:
      )                         # End $16:
    )                           # End $1:
    $                           # Anchor to end of string.
    %x';
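
As a quick sanity check of the breakdown above, here is a minimal sketch that runs the compact one-line form of the pattern against a few hand-picked strings. The sample values and the `$results` array are my own illustration, not part of the question. Note the last sample: the dates in the rules files are written YYYY/MM/DD, so this DD/MM/YYYY pattern does not match them at all.

```php
<?php
// The compact pattern from the question, wrapped in % delimiters.
$re_date = '%^(((0[1-9]|[12]\d|3[01])\/(0[13578]|1[02])\/((19|[2-9]\d)\d{2}))|((0[1-9]|[12]\d|30)\/(0[13456789]|1[012])\/((19|[2-9]\d)\d{2}))|((0[1-9]|1\d|2[0-8])\/02\/((19|[2-9]\d)\d{2}))|(29\/02\/((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))))$%';

// Illustrative samples (assumptions chosen for this sketch).
$samples = [
    '31/05/2008',  // May has 31 days
    '31/06/2008',  // June has only 30 days
    '29/02/2008',  // leap year (divisible by 4)
    '29/02/2007',  // not a leap year
    '29/02/1900',  // divisible by 100 but not by 400
    '29/02/2000',  // divisible by 400
    '2008/03/07',  // YYYY/MM/DD, the format used in the rules files
];

$results = [];
foreach ($samples as $s) {
    $results[$s] = preg_match($re_date, $s) === 1;
    printf("%-12s %s\n", $s, $results[$s] ? 'match' : 'no match');
}
```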

To solve your immediate problem, here is a more precise regex:

$count = preg_match_all('%
    # Match version/date sub-string
    \b          # Anchor to word boundary.
    (           # $1: Version number.
      [Vv]      # Version identifier (allow V or v).
      [ ]+      # One or more spaces.
      [0-9]+    # Major version number is one or more digits.
      (?:       # Group minor version numbers.
        \.      # Minor versions separated by dot.
        [0-9]+  # Minor version is one or more digits.
      )*        # Zero or more minor versions.
    )           # End $1: Version number.
    [ ]+        # One or more spaces.
    (           # $2: Date.
      [0-9]{4}  # Year is four digits.
      /         # / Separator.
      [0-9]{2}  # Month is two digits.
      /         # / Separator.
      [0-9]{2}  # Day is two digits.
    )           # End $2: Date.
    %x', $text, $matches);
if ($count > 0) {
    $versions = $matches[1];
    $dates    = $matches[2];
    printf("Found %d matches:\n", $count);
    for ($i = 0; $i < $count; ++$i) {
        printf("  Match%3d:  Version: %-15s  Date: %s\n",
            $i + 1, $versions[$i], $dates[$i]);
    }
} else {
    echo("No matches found.\n");
}
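
The snippet above assumes `$text` already holds the file contents. Since the question asks for the results in an array, here is a rough sketch of one way to package the same pattern (written compactly) so that it returns version/date pairs. The function name `extract_versions` and the array keys are hypothetical, and the sample text is copied from the question.

```php
<?php
// Hypothetical helper (not from the original answer): returns one
// ['version' => ..., 'date' => ...] entry per match found in $text.
function extract_versions(string $text): array
{
    // Same pattern as above, without free-spacing mode.
    $pattern = '%\b([Vv][ ]+[0-9]+(?:\.[0-9]+)*)[ ]+([0-9]{4}/[0-9]{2}/[0-9]{2})%';
    // PREG_SET_ORDER groups the captures per match instead of per group.
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $m) {
        $result[] = ['version' => $m[1], 'date' => $m[2]];
    }
    return $result;
}

// Sample lines taken from the question (nowdoc, so "$" is not interpolated).
$text = <<<'EOT'
$Id: ddos.rules,v 1.31.6.7 2008/03/07 20:53:40 vrtbuild Exp
$Id: exploit.rules,v 1.116.6.53 2008/11/18 16:36:27 vrtbuild Exp $
$Id: web-attacks.rules,v 1.23 2005/05/16 22:18:17 mwatchinski Exp $
EOT;

$found = extract_versions($text);
print_r($found);
```

For a real file you would presumably fill `$text` with `file_get_contents()` on each rules file and merge the returned arrays.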

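Note: when dealing with non-trivial regexes like these, it is best to write them in 'x' free-spacing mode. This allows plenty of comments and indentation, which makes them much easier to read.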
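
One thing to watch for in free-spacing mode is that bare whitespace in the pattern is ignored, so a literal space has to be written as `[ ]` (or escaped). The small sketch below illustrates this with a simplified version-only pattern; the variable names and the test string are assumptions made for the example.

```php
<?php
// Compact form: the space after "v" is a literal space.
$compact = '/^v (\d+(?:\.\d+)*)$/';

// The same pattern in free-spacing mode. Whitespace and # comments are
// ignored, so the literal space is written as a character class.
$spaced = '/
    ^ v [ ]                 # literal v, then one space in a character class
    ( \d+ (?: \. \d+ )* )   # $1: dotted version number
    $                       # end of string
    /x';

// Pitfall: adding the x flag to the compact form silently drops the space.
$broken = '/^v (\d+(?:\.\d+)*)$/x';

$compact_ok = preg_match($compact, 'v 1.23', $m1);
$spaced_ok  = preg_match($spaced, 'v 1.23', $m2);
$broken_ok  = preg_match($broken, 'v 1.23');

printf("compact: %d, spaced: %d, broken: %d\n", $compact_ok, $spaced_ok, $broken_ok);
```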

Answer 1: (score: 0)

foreach ($lines as $line){
    if (preg_match("|v (.*?) (.*?) |", $line, $match)){
        echo "found version ".$match[1]." date ".$match[2];
    }
}

Is this exactly what you want?
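
A caveat on the pattern in this answer: `(.*?)` accepts any characters, so any "v " followed by two space-separated words will match. The sketch below compares it with a tighter pattern on two lines; `$tight`, the second sample line, and the result arrays are my own illustration, not part of the answer.

```php
<?php
// The pattern from the answer above.
$loose = "|v (.*?) (.*?) |";
// A tighter variant (an assumption for this sketch): dotted digits for the
// version and YYYY/MM/DD for the date, with word boundaries around them.
$tight = '~\bv ([0-9]+(?:\.[0-9]+)*) ([0-9]{4}/[0-9]{2}/[0-9]{2})\b~';

$lines = [
    '$Id: tftp.rules,v 1.28.6.6 2008/07/22 17:59:06 vrtbuild Exp $',
    'rev 2 of the doc',  // not a version line at all
];

$loose_hits = [];
$tight_hits = [];
foreach ($lines as $line) {
    if (preg_match($loose, $line, $match)) {
        $loose_hits[] = [$match[1], $match[2]];
    }
    if (preg_match($tight, $line, $match)) {
        $tight_hits[] = [$match[1], $match[2]];
    }
}

print_r($loose_hits);  // includes a false positive from the second line
print_r($tight_hits);  // only the real version line
```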