I want to load log events into a data.table. Each event is identified by a timestamp, and some events span multiple lines. I have the following .txt file:
2016-07-19 00:00:01,421 WARNING Exception happened while transfering for command
at java.lang.NumberFormatException
at java.lang.Integer.parseInt
at java.util.concurrent.Task
2016-07-19 00:01:01,525 DEBUG Upload all environments
2016-07-19 00:01:01,720 DEBUG Upload all environments
2016-07-19 00:02:00,520 WARNING Excpetion happened while transfering for command
at java.lang.NumberFormatException
I would like to get the following data.table:
log
1 2016-07-19 00:00:01,421 WARNING Exception happened while transfering for command at java.lang.NumberFormatException at java.lang.Integer.parseInt at java.util.concurrent.Task
2 2016-07-19 00:01:01,525 DEBUG Upload all environments
3 2016-07-19 00:01:01,720 DEBUG Upload all environments
4 2016-07-19 00:02:00,520 WARNING Excpetion happened while transfering for command at java.lang.NumberFormatException
That is, each log event should be loaded into a single row. I tried using \n as the separator:
docs <- read.table("log2.txt",header=FALSE,sep="\n",col.names="log",nrows=1000)
Answer 0 (score: 0)
Use readLines, then merge the lines in data.table:
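library(data.table)
# Read the file, one row per physical line
raw = data.table(s = readLines('log.txt'))
# Drop empty lines and trim surrounding whitespace
raw = raw[s != '']
raw[, s := stringr::str_trim(s)]
# A line starting with a four-digit year begins a new event; cumsum numbers the events
raw[, idx := cumsum(s %like% '^[0-9]{4}')]
# Collapse the lines of each event into a single row
raw[, list(s = paste(s, collapse = ' ')), by = idx]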
Edit: changed the regex that matches the year, thanks for the comment.
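The grouping step can be tried without a file. The sketch below is not part of the original answer: it passes the sample lines from the question as a character vector, and it assumes a stricter, hypothetical timestamp pattern (ts_pattern, matching "YYYY-MM-DD HH:MM:SS,mmm") so that a continuation line that happens to begin with four digits is not mistaken for a new event. It also uses base trimws and grepl in place of stringr::str_trim and %like%, and names the result column log as in the question.

```r
library(data.table)

# Sample lines copied from the question; in practice use readLines("log.txt").
lines <- c(
  "2016-07-19 00:00:01,421 WARNING Exception happened while transfering for command",
  "at java.lang.NumberFormatException",
  "at java.lang.Integer.parseInt",
  "at java.util.concurrent.Task",
  "2016-07-19 00:01:01,525 DEBUG Upload all environments",
  "2016-07-19 00:01:01,720 DEBUG Upload all environments",
  "2016-07-19 00:02:00,520 WARNING Excpetion happened while transfering for command",
  "at java.lang.NumberFormatException"
)

# Assumption: every event starts with a timestamp in this exact layout.
ts_pattern <- "^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}"

raw <- data.table(s = trimws(lines))
raw <- raw[s != ""]

# TRUE marks the first line of an event; cumsum turns the marks into event ids.
raw[, idx := cumsum(grepl(ts_pattern, s))]

# Collapse all lines sharing an id into one row, then drop the helper column.
logs <- raw[, .(log = paste(s, collapse = " ")), by = idx]
logs[, idx := NULL]

print(logs)
```

With the sample input this should give four rows, one per event. Any lines that appear before the first timestamp would end up in a group with idx 0, so it may be worth checking for that group on real log files.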