I am trying to parse syslog lines.
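Here is a sample line:
pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>
My goal is to break this data into key/value pairs. It needs to be a Perl regex (this happens to be for Solaris logs going into Splunk, in case anyone is curious what it is for).
So far, I have this:
[\>\:]*\s+(.*?)\<(.+?)\>
It extracts my data fine, but whenever a word ends with a colon, the colon is included in the first group.
Expected result:
Authentication = succeeded
for = active directory
user = bobtheperson
account = bobtheperson@com.com
reason = N/A
Access cont(upn) = bob
Actual result (note the colons): the keys that end with a colon keep it, for example "user:" instead of "user".
Link to the code on regexr.com: http://regexr.com/3fasr
A lot of trial and error got me this far. I am just trying to figure out how to drop that trailing punctuation.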
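To make the behavior described in the question easy to reproduce, here is a minimal sketch in Python. It assumes Python's re module behaves like a Perl-compatible engine for this particular pattern, which seems reasonable because only character classes, lazy quantifiers, and capture groups are involved. ORIGINAL is the regex from the question; ADJUSTED is a hypothetical variant, not taken from the post, that ends the key before an optional colon.

```python
import re

# Sample line copied from the question.
LINE = (
    "pam_vas: Authentication <succeeded> for <active directory> "
    "user: <bobtheperson> account: <bobtheperson@com.com> "
    "reason: <N/A> Access cont(upn): <bob>"
)

# Regex from the question, unchanged.
ORIGINAL = re.compile(r"[\>\:]*\s+(.*?)\<(.+?)\>")

# Hypothetical adjustment (an assumption, not the author's regex):
# the key may not contain '<', '>' or ':', and an optional colon plus
# any whitespace is consumed outside the capture group.
ADJUSTED = re.compile(r"\s+([^<>:]+?):?\s*<([^>]+)>")


def pairs(pattern, line):
    """Return the raw (key, value) capture groups for every match."""
    return pattern.findall(line)


if __name__ == "__main__":
    for name, pattern in (("original", ORIGINAL), ("adjusted", ADJUSTED)):
        print(name)
        for key, value in pairs(pattern, LINE):
            # repr() makes trailing colons and spaces in the key visible.
            print(f"  {key!r} = {value!r}")
```

With the original pattern, the lazy group 1 runs right up to the "<", so it picks up both the colon and the space in front of the bracket. ADJUSTED uses only syntax that Perl-compatible engines also accept, but it has not been tried in Splunk itself.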
Answer 0 (score: 0)