Extract numbers from log lines

Posted: 2016-01-11 06:26:46

Tags: regex powershell

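I'm trying to extract numbers from a live log file using a regex in PowerShell. My regex is supposed to return only the number to the left of the letter A, but for some reason it returns the whole line instead of just the number.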

I'm trying to turn log lines like these:

1/11/2016 3:26:12 PM 1/11/2016 3:27:00 PM 86.4 A
1/11/2016 3:26:12 PM 1/11/2016 3:28:00 PM 86.3 A
1/11/2016 3:26:12 PM 1/11/2016 3:29:00 PM 86.8 A
1/11/2016 3:26:12 PM 1/11/2016 3:29:16 PM 86.7 A

into:

86.4
86.3
86.8
86.7

So far, the pattern I'm using is .*\d\s+A.


1 answer:

Answer 0 (score: 1)

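The regex itself is a bit odd. .*\d\s+A means: "anything at all, then a digit, then at least one whitespace character, and finally the letter A". That covers far more cases than you are interested in; for example, it would also match a line consisting of nothing but "94.9 A". Note also that the leading .* starts matching at the very beginning of the line, so the matched text itself runs from the first character up to the A instead of covering just the number.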
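A quick check of both problems with the loose pattern, using PowerShell's -match operator and the automatic $Matches variable. The sample line is taken from the question; the variable names are only illustrative.

```powershell
# The asker's pattern: anything, a digit, whitespace, then the letter A.
$loose = '.*\d\s+A'

$line = '1/11/2016 3:26:12 PM 1/11/2016 3:27:00 PM 86.4 A'
if ($line -match $loose) {
    # $Matches[0] holds the full matched text. Because the leading .* starts
    # at the beginning of the line, this is the entire line, not just 86.4.
    $Matches[0]
}

# A bare reading with no timestamps is accepted as well.
'94.9 A' -match $loose   # True
```

Nothing in the pattern marks which part is the number, so whatever reads the result can only get the whole match back.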

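Depending on the structure of the log file and which false positives you need to rule out, a stricter pattern and/or capture groups help. Something like this: (?:PM\s+)(\d+\.\d+)(?:\s+A)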

(?:PM\s+)   := non-capturing: the letters PM followed by at least one whitespace character
(\d+\.\d+)  := capture group 1: at least one digit, a dot, then at least one digit
(?:\s+A)    := non-capturing: at least one whitespace character followed by the letter A
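If the lines are read with Select-String (an assumption, since the question does not show how the file is read), each result is a MatchInfo object whose default display is the whole line. That alone would explain getting the full line back. A minimal sketch that pulls capture group 1 out of the Matches property instead:

```powershell
# Stricter pattern from the answer: group 1 captures the reading that sits
# after "PM " and before the trailing " A".
$pattern = '(?:PM\s+)(\d+\.\d+)(?:\s+A)'

$lines = @(
    '1/11/2016 3:26:12 PM 1/11/2016 3:27:00 PM 86.4 A',
    '1/11/2016 3:26:12 PM 1/11/2016 3:28:00 PM 86.3 A'
)

# Select-String emits MatchInfo objects (displayed as the whole line);
# the captured number is in .Matches[0].Groups[1].Value.
$values = $lines |
    Select-String -Pattern $pattern |
    ForEach-Object { $_.Matches[0].Groups[1].Value }

$values   # 86.4 and 86.3, one per line
```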

For example:

[regex]$regex = '(?:PM\s+)(\d+\.\d+)(?:\s+A)'

$s = @("1/11/2016 3:26:12 PM 1/11/2016 3:27:00 PM 86.4 A",
"1/11/2016 3:26:12 PM 1/11/2016 3:28:00 PM 86.3 A",
"1/11/2016 3:26:12 PM 1/11/2016 3:29:00 PM 86.8 A",
"1/11/2016 3:26:12 PM 1/11/2016 3:29:16 PM 86.7 A",
"foobarline shouldn't match",
"94.9 A",
"PM 84.8 A")

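# The two non-matching rows ("foobarline shouldn't match" and "94.9 A") are skipped;
# "PM 84.8 A" still fits the pattern, so 84.8 is returned as well.
$s | % { $regex.Matches($_) | % { $_.Groups[1].Value } }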
86.4
86.3
86.8
86.7
84.8
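Since the question is about a live log file, the same regex can be wrapped in a small pipeline function and fed from Get-Content. The function name Get-LogReading is hypothetical, and the cast to [double] is an optional extra so the readings can be compared or averaged.

```powershell
# Hypothetical helper: emits the reading captured by the answer's pattern
# for every input line that matches, and silently skips all other lines.
function Get-LogReading {
    param(
        [Parameter(ValueFromPipeline = $true)]
        [string]$Line
    )
    begin {
        # Compile the pattern once, not once per line.
        $regex = [regex]'(?:PM\s+)(\d+\.\d+)(?:\s+A)'
    }
    process {
        $m = $regex.Match($Line)
        if ($m.Success) {
            # PowerShell converts strings to numbers with the invariant
            # culture, so "86.4" parses the same on any locale.
            [double]$m.Groups[1].Value
        }
    }
}
```

Against a live file this might look like Get-Content -Path .\sensor.log -Wait | Get-LogReading, where the path is a placeholder. With -Wait, Get-Content keeps the pipeline open and passes on new lines as they are appended.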