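I have a txt file with many lines, and I want to search it for the version and the date.
Which regular expression would get values such as v 1.31.6.7 2008/03/07
into an array
from many txt files like this one:
This file may contain proprietary rules that were created, tested and certified by Sourcefire, Inc. (the "VRT Certified Rules") as well as rules that were created by Sourcefire and other third parties $Id: ddos.rules,v 1.31.6.7 2008/03/07 20:53:40 vrtbuild Exp DDOS RULES
The version can differ, for example: v 1.48.6.12
and the date, in the format shown, differs as well.
Suppose I have many lines like these: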
$Id: ddos.rules,v 1.31.6.7 2008/03/07 20:53:40 vrtbuild Exp
$Id: exploit.rules,v 1.116.6.53 2008/11/18 16:36:27 vrtbuild Exp $
$Id: misc.rules,v 1.77.6.20 2008/10/17 19:36:59 vrtbuild Exp $
$Id: smtp.rules,v 1.77.6.19 2008/10/17 19:37:00 vrtbuild Exp $
$Id: tftp.rules,v 1.28.6.6 2008/07/22 17:59:06 vrtbuild Exp $
$Id: web-iis.rules,v 1.110.6.11 2008/07/22 17:59:06 vrtbuild Exp $
$Id: web-attacks.rules,v 1.23 2005/05/16 22:18:17 mwatchinski Exp $
each with a different date and a different v (version) value.
I found this pattern for dates:
^(((0[1-9]|[12]\d|3[01])\/(0[13578]|1[02])\/((19|[2-9]\d)\d{2}))|((0[1-9]|[12]\d|30)\/(0[13456789]|1[012])\/((19|[2-9]\d)\d{2}))|((0[1-9]|1\d|2[0-8])\/02\/((19|[2-9]\d)\d{2}))|(29\/02\/((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))))$
Can someone explain it?
Answer 0 (score: 1)
Your date regex:
^(((0[1-9]|[12]\d|3[01])\/(0[13578]|1[02])\/((19|[2-9]\d)\d{2}))|((0[1-9]|[12]\d|30)\/(0[13456789]|1[012])\/((19|[2-9]\d)\d{2}))|((0[1-9]|1\d|2[0-8])\/02\/((19|[2-9]\d)\d{2}))|(29\/02\/((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))))$
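...is quite interesting. I decided to analyze it to find out exactly what it matches. It turns out that this regex matches all valid dates from 1900 to 9999 in DD/MM/YYYY format. Interestingly, it also correctly matches all valid leap days from 1597 to 9999 (the first one being 29/02/1600), even though it rejects ordinary dates before 1900. The regex knows the valid number of days in each month: it knows that May has 31 days and June only 30, and that February has 28 days except in leap years, when it has 29. Here it is broken down so that it can be read by mere mortals: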
$re_date = '%
    # Match all valid DD/MM/YYYY dates from 1900 to 9999 and
    # all leap days from year 1597 to 9999.
    ^                               # Anchor to start of string.
    (                               # $1:
      (                             # $2: Date format alternative 1: (months having 31 days)
        (0[1-9]|[12]\d|3[01])       # $3: Day: 01-09,10-19,20-29,30,31
        \/
        (0[13578]|1[02])            # $4: Month: 01,03,05,07,08,10,12
        \/
        ((19|[2-9]\d)\d{2})         # $5,$6: Year: 1900-9999
      )                             # End $2:
    | (                             # $7: Date format alternative 2: (days 01-30, any month but February)
        (0[1-9]|[12]\d|30)          # $8: Day: 01-09,10-19,20-29,30
        \/
        (0[13456789]|1[012])        # $9: Month: 01,03-09,10-12
        \/
        ((19|[2-9]\d)\d{2})         # $10,$11: Year: 1900-9999
      )                             # End $7:
    | (                             # $12: Date format alternative 3: (February, days 01-28)
        (0[1-9]|1\d|2[0-8])         # $13: Day: 01-09,10-19,20-28
        \/
        02                          # Month: 02
        \/
        ((19|[2-9]\d)\d{2})         # $14,$15: Year: 1900-9999
      )                             # End $12:
    | (                             # $16: Date format alternative 4: (leap days)
        29                          # Day: 29
        \/
        02                          # Month: 02
        \/                          # Match all valid leap day dates from year 1597 to 9999.
        (                           # $17: Year alt 1 (divisible by 4 but not 100)
          (1[6-9]|[2-9]\d)          # $18: Century part: 16-19,20-99
          ( 0[48]                   # $19: Year part: Either 04 or 08
          | [2468][048]             #   or 20,24,28,40,44,48,60,64,68,80,84,88
          | [13579][26]             #   or 12,16,32,36,52,56,72,76,92,96
          )                         # End $19:
        | (                         # or $20: Year alternative 2 (divisible by 400)
            ( 16                    # $21: Century part: Either 16
            | [2468][048]           #   or 20,24,28,40,44,48,60,64,68,80,84,88
            | [3579][26]            #   or 32,36,52,56,72,76,92,96
            )                       # End $21:
            00                      # Year part: 00
          )                         # End $20:
        )                           # End $17:
      )                             # End $16:
    )                               # End $1:
    $                               # Anchor to end of string.
    %x';
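One way to sanity-check the analysis above is to compare the pattern with PHP's built-in `checkdate()` over a range of years. The following is only a sketch: the helper names `is_valid_ddmmyyyy()` and `count_mismatches()` are made up for this illustration, and the year range 1900 to 2100 is arbitrary.

```php
<?php
// Compact form of the date regex from the question (DD/MM/YYYY, anchored).
const RE_DATE = '~^(((0[1-9]|[12]\d|3[01])\/(0[13578]|1[02])\/((19|[2-9]\d)\d{2}))|((0[1-9]|[12]\d|30)\/(0[13456789]|1[012])\/((19|[2-9]\d)\d{2}))|((0[1-9]|1\d|2[0-8])\/02\/((19|[2-9]\d)\d{2}))|(29\/02\/((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))))$~';

// Hypothetical helper: true if $s is accepted by the regex.
function is_valid_ddmmyyyy(string $s): bool
{
    return preg_match(RE_DATE, $s) === 1;
}

// Hypothetical helper: count the day/month/year combinations for which
// the regex and checkdate() disagree.
function count_mismatches(int $fromYear, int $toYear): int
{
    $mismatches = 0;
    for ($y = $fromYear; $y <= $toYear; ++$y) {
        for ($m = 1; $m <= 12; ++$m) {
            for ($d = 1; $d <= 31; ++$d) {
                $s = sprintf('%02d/%02d/%04d', $d, $m, $y);
                if (is_valid_ddmmyyyy($s) !== checkdate($m, $d, $y)) {
                    ++$mismatches;
                }
            }
        }
    }
    return $mismatches;
}

var_dump(count_mismatches(1900, 2100));     // regex compared with checkdate()
var_dump(is_valid_ddmmyyyy('29/02/2008'));  // a leap day
var_dump(is_valid_ddmmyyyy('29/02/1900'));  // 1900 was not a leap year
var_dump(is_valid_ddmmyyyy('2008/03/07'));  // YYYY/MM/DD, as in the rules files
```

The last call is the one that matters for the question: the pattern expects the day first, so a date written as YYYY/MM/DD is rejected.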
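Note that this pattern expects DD/MM/YYYY, while your files use YYYY/MM/DD, so it will not match your data as it stands. To solve your immediate problem, here is a more precise regex: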
// $text holds the contents of one rules file (for example, read with file_get_contents()).
$count = preg_match_all('%
    # Match version/date sub-string
    \b              # Anchor to word boundary.
    (               # $1: Version number.
      [Vv]          # Version identifier (allow V or v).
      [ ]+          # One or more spaces.
      [0-9]+        # Major version number is one or more digits.
      (?:           # Group minor version numbers.
        \.          # Minor versions separated by dot.
        [0-9]+      # Minor version is one or more digits.
      )*            # Zero or more minor versions.
    )               # End $1: Version number.
    [ ]+            # One or more spaces.
    (               # $2: Date.
      [0-9]{4}      # Year is four digits.
      /             # / Separator.
      [0-9]{2}      # Month is two digits.
      /             # / Separator.
      [0-9]{2}      # Day is two digits.
    )               # End $2: Date.
    %x', $text, $matches);
if ($count > 0) {
    $versions = $matches[1];
    $dates = $matches[2];
    printf("Found %d matches:\n", $count);
    for ($i = 0; $i < $count; ++$i) {
        printf(" Match%3d: Version: %-15s Date: %s\n",
            $i + 1, $versions[$i], $dates[$i]);
    }
} else {
    echo("No matches found.\n");
}
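As a rough usage sketch, the same pattern can be wrapped in a function and run against a few of the `$Id:` lines from the question. The function name `extract_versions()` and the use of `PREG_SET_ORDER` are assumptions made for this illustration and are not part of the answer's code.

```php
<?php
// Hypothetical helper: return a list of [version, date] pairs found in $text.
function extract_versions(string $text): array
{
    $re = '%
        \b                              # Anchor to word boundary.
        ([Vv][ ]+[0-9]+(?:\.[0-9]+)*)   # $1: Version, e.g. "v 1.31.6.7".
        [ ]+                            # One or more spaces.
        ([0-9]{4}/[0-9]{2}/[0-9]{2})    # $2: Date as YYYY/MM/DD.
    %x';
    // PREG_SET_ORDER groups the results per match rather than per capture group.
    preg_match_all($re, $text, $matches, PREG_SET_ORDER);
    $pairs = [];
    foreach ($matches as $m) {
        $pairs[] = [$m[1], $m[2]];
    }
    return $pairs;
}

// Sample input copied from the question.
$text = <<<'TXT'
$Id: ddos.rules,v 1.31.6.7 2008/03/07 20:53:40 vrtbuild Exp
$Id: exploit.rules,v 1.116.6.53 2008/11/18 16:36:27 vrtbuild Exp $
$Id: web-attacks.rules,v 1.23 2005/05/16 22:18:17 mwatchinski Exp $
TXT;

foreach (extract_versions($text) as [$version, $date]) {
    printf("%-15s %s\n", $version, $date);
}
```

Because group 1 includes the leading `v`, each version comes back as, for example, `v 1.23`; moving the capturing parenthesis to after `[ ]+` would drop that prefix.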
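Note: when dealing with non-trivial regexes such as these, it is best to write them in 'x'
free-spacing mode. This allows plenty of comments and indentation, which makes them much easier to read.
Answer 1 (score: 0)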
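One consequence of free-spacing mode is that a bare space in the pattern is ignored, which is presumably why the pattern above writes a literal space as `[ ]`. A small sketch of the difference (the subject string is just an example taken from the question's data):

```php
<?php
$subject = 'ddos.rules,v 1.31.6.7';

// With the x modifier the bare space is dropped, so this is really /v+\d/
// and cannot match "v 1".
$bare = preg_match('/v +\d/x', $subject);

// A space inside a character class is still significant under x.
$in_class = preg_match('/v[ ]+\d/x', $subject);

// An escaped space is significant as well.
$escaped = preg_match('/v\ +\d/x', $subject);

var_dump($bare, $in_class, $escaped);
```

The same applies to `#`, which starts a comment in this mode unless it is escaped or placed inside a character class.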
foreach ($lines as $line) {
    if (preg_match("|v (.*?) (.*?) |", $line, $match)) {
        echo "found version ".$match[1]." date ".$match[2];
    }
}
Is this exactly what you want?
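The pattern above is quite permissive: `(.*?)` accepts any text, so a line that merely contains `v ` followed by two words also matches. A stricter variant, offered here as a sketch rather than as part of the answer, requires digits and dots for the version and a YYYY/MM/DD date. The function name `parse_id_line()` and the second sample line are hypothetical.

```php
<?php
// Hypothetical helper: return [version, date] for an $Id: line, or null.
function parse_id_line(string $line): ?array
{
    // ",v " must be followed by a dotted version and a YYYY/MM/DD date.
    if (preg_match('~,v (\d+(?:\.\d+)*) (\d{4}/\d{2}/\d{2})\b~', $line, $m)) {
        return [$m[1], $m[2]];
    }
    return null;
}

$lines = [
    '$Id: smtp.rules,v 1.77.6.19 2008/10/17 19:37:00 vrtbuild Exp $',
    'see rev 2 of the file for details', // made-up line without version info
];

foreach ($lines as $line) {
    $loose  = preg_match('|v (.*?) (.*?) |', $line, $match) === 1;
    $strict = parse_id_line($line);
    printf("loose: %s, strict: %s\n",
        $loose ? "{$match[1]} / {$match[2]}" : 'no match',
        $strict ? "{$strict[0]} / {$strict[1]}" : 'no match');
}
```

On the second line the loose pattern still reports a "version" and a "date", while the stricter one returns null.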