Regex: exclude a character from the end of a group

Time: 2017-02-17 16:46:29

Tags: regex regex-group

I am trying to parse syslog lines like this one:

pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>

My goal is to break this data into key/value pairs. It needs to be a Perl regex (this happens to be for Solaris logs going into Splunk, in case anyone is curious what it is for).

So far I have this:

[\>\:]*\s+(.*?)\<(.+?)\>

It extracts my data fine, but whenever a word ends with a colon, the colon is included in the first group.

Expected result:

Authentication = succeeded
for = active directory
user = bobtheperson
account = bobtheperson@com.com
reason = N/A
Access cont(upn) = bob

Actual result: the same pairs, except that every key followed by a colon keeps it in the first group (for example "user:" rather than "user").

Link to the code on regexr.com: http://regexr.com/3fasr. A lot of trial and error got me this far; I just want to figure out how to drop that trailing punctuation.

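1 answer:

Answer 0: (score: 0)

This regex seems to work for your case. It adds an optional colon and a single whitespace character after the first group, so both are matched outside the captured key: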

[\>\:]*\s+(.*?)\:?\s\<(.+?)\> 

As you can see here: http://regexr.com/3fatg
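To check the difference between the two patterns outside of regexr, here is a minimal sketch in Python. It assumes that Python's `re` module treats this particular pattern the same way a Perl-style engine does; the thread itself only targets Perl regex in Splunk. The names `LINE`, `ORIGINAL`, `FIXED`, and `extract_pairs` are made up for illustration.

```python
import re

# Sample line from the question.
LINE = ("pam_vas: Authentication <succeeded> for <active directory> "
        "user: <bobtheperson> account: <bobtheperson@com.com> "
        "reason: <N/A> Access cont(upn): <bob>")

# Pattern from the question: the lazy first group runs right up to "<",
# so it keeps any trailing colon and space.
ORIGINAL = re.compile(r"[\>\:]*\s+(.*?)\<(.+?)\>")

# Pattern from the answer: an optional colon and one whitespace character
# are matched outside the first group.
FIXED = re.compile(r"[\>\:]*\s+(.*?)\:?\s\<(.+?)\>")


def extract_pairs(pattern, line):
    """Return the (key, value) captures of every match as a dict."""
    return dict(pattern.findall(line))


original_pairs = extract_pairs(ORIGINAL, LINE)
fixed_pairs = extract_pairs(FIXED, LINE)

for key, value in fixed_pairs.items():
    print(f"{key} = {value}")
```

Note that the answer's pattern expects exactly one whitespace character between the key (or its colon) and the opening `<`. If the logs can contain more, `\s+` in that position may be a safer choice, though that variant is not tested in this thread.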
