my $genlog_line_1= qr{
\A
(?:(\d{6}\s+\d{1,2}:\d\d:\d\d|\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+(?:Z|-?\d\d:\d\d)?))? # Timestamp
\s+
(?:\s*(\d+)) # Thread ID
\s
(\w+) # Command
\s+
(.*) # Argument
\Z
}xs;
my $line = "2018-12-14T17:32:52.236100+08:00 477637459 Query SELECT dv.mandatory,dv.optional FROM dbversion dv";
my ($ts, $thread_id, $cmd, $arg) = $line =~ m/$genlog_line_1/;
print $ts, $thread_id, $cmd, $arg;
Why doesn't the regular expression match? This is what I expect:
Timestamp 2018-12-14T17:32:52.236100
thread_id 477637459
cmd Query
arg SELECT dv.mandatory,dv.optional FROM dbversion dv
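Answer 0 (score: 4)
Your input contains +08:00, but the -? in (?:Z|-?\d\d:\d\d)? only allows for a negative or unsigned offset. Nothing else in the pattern can consume the + that follows the fractional seconds, so the whole match fails.
So, on the first line of the regex, replace -? with [+-]? to match an optional - or +. Also, since the +08:00 part should not be part of group 1, I suggest using a branch reset group, (?|...|...), so that the captures in the different alternatives all land in the same group, group 1: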
(?|(\d{6}\s+\d{1,2}:\d\d:\d\d)|(\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+)(?:Z|[-+]?\d\d:\d\d)?)?
^^^                          ^ ^                                         ^     ^^^^
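The effect of the branch reset group can be checked in isolation. The following is a minimal sketch, assuming Perl 5.10 or later (where (?|...) is available); the variable names and the old-style timestamp string are made up for illustration, and only the timestamp part of the pattern is used.

```perl
use strict;
use warnings;

# Ordinary alternation: each alternative keeps its own group number,
# so an ISO-style timestamp ends up in group 2 and group 1 stays undef.
my $plain = qr/\A(?:(\d{6}\s+\d{1,2}:\d\d:\d\d)|(\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+))/;

# Branch reset: group numbering restarts in every alternative,
# so whichever branch matches fills group 1.
my $reset = qr/\A(?|(\d{6}\s+\d{1,2}:\d\d:\d\d)|(\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+))/;

my $iso       = '2018-12-14T17:32:52.236100+08:00';  # timestamp from the question
my $old_style = '181214 17:32:52';                   # hypothetical old-style timestamp

my @plain_groups = $iso       =~ $plain;
my ($reset_iso)  = $iso       =~ $reset;
my ($reset_old)  = $old_style =~ $reset;

printf "plain: group1=%s group2=%s\n", map { $_ // 'undef' } @plain_groups;
print  "reset: group1=$reset_iso\n";
print  "reset: group1=$reset_old\n";
```

With the branch reset group, the calling code can always read the timestamp from the first capture, whichever format the log line uses.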
The fixed pattern:
my $genlog_line_1= qr{
\A
(?|(\d{6}\s+\d{1,2}:\d\d:\d\d)|(\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+)(?:Z|[-+]?\d\d:\d\d)?)? # Timestamp
\s+
(?:\s*(\d+)) # Thread ID
\s
(\w+) # Command
\s+
(.*) # Argument
\Z
}xs;
See the regex demo.
Note that if the timestamp is always present in the input, the ? after the branch reset group may not be necessary.
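To try the fixed pattern outside a regex tester, a small script along these lines may help. It is only a sketch: the first line is the one from the question, while the other two lines (a Z-suffixed timestamp and a line with no timestamp) are hypothetical inputs chosen to exercise the optional parts, and the names $genlog_line, @lines and @parsed are illustrative.

```perl
use strict;
use warnings;

my $genlog_line = qr{
    \A
    (?|(\d{6}\s+\d{1,2}:\d\d:\d\d)|(\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+)(?:Z|[-+]?\d\d:\d\d)?)? # Timestamp
    \s+
    (?:\s*(\d+))    # Thread ID
    \s
    (\w+)           # Command
    \s+
    (.*)            # Argument
    \Z
}xs;

my @lines = (
    # Line from the question.
    '2018-12-14T17:32:52.236100+08:00 477637459 Query SELECT dv.mandatory,dv.optional FROM dbversion dv',
    # Hypothetical line with a Z suffix instead of a numeric offset.
    '2018-12-14T09:32:52.236100Z 12 Connect root@localhost on test',
    # Hypothetical line without a timestamp (leading whitespace only).
    "\t\t477637459 Query\tSELECT 1",
);

my @parsed;
for my $line (@lines) {
    # List assignment returns the number of captures, 0 when the match fails.
    my @fields = $line =~ $genlog_line
        or die "no match: $line";
    push @parsed, \@fields;
    printf "ts=%s thread=%s cmd=%s arg=%s\n", map { $_ // '(none)' } @fields;
}
```

The third line matches only because the trailing ? makes the timestamp group optional; its timestamp field comes back undefined.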
Answer 1 (score: 0)
The main problem with your regex is that it does not account for the +08:00 present in $line.
Change it to:
\A(?:(\d{6}\s+\d{1,2}:\d\d:\d\d|\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+(?:Z|-?\d\d:\d\d)?))?(?:\+\d\d:\d\d)?\s+(?:\s*(\d+))\s+(\w+)\s+(.*)\Z
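A quick way to see how this pattern behaves is to run it against the line from the question and against a line with a negative offset. This is a sketch only: the second input line and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Pattern from this answer, unchanged (no /x flag, so it stays on one line).
my $re = qr/\A(?:(\d{6}\s+\d{1,2}:\d\d:\d\d|\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+(?:Z|-?\d\d:\d\d)?))?(?:\+\d\d:\d\d)?\s+(?:\s*(\d+))\s+(\w+)\s+(.*)\Z/;

my $plus  = '2018-12-14T17:32:52.236100+08:00 477637459 Query SELECT dv.mandatory,dv.optional FROM dbversion dv';
my $minus = '2018-12-14T17:32:52.236100-05:00 1 Query SELECT 1';   # hypothetical line

my ($ts_plus, $thread_plus, $cmd_plus, $arg_plus) = $plus  =~ $re;
my ($ts_minus)                                    = $minus =~ $re;

print "ts (+08:00 line): $ts_plus\n";
print "ts (-05:00 line): $ts_minus\n";
```

Both lines match, but the captured timestamp differs in shape: the + offset is consumed outside group 1, while a - offset is still matched by the inner (?:Z|-?\d\d:\d\d)? and therefore stays inside the capture. If the offset should be excluded in both cases, the branch reset variant from the other answer handles that.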