Importing text with a specific delimiter into R

Time: 2016-07-26 11:30:56

Tags: r

I want to load log events into a data.table. Each log event starts with a timestamp, and some events span multiple lines.

I have the following .txt file:

2016-07-19 00:00:01,421 WARNING Exception happened while transfering for command
                                at java.lang.NumberFormatException
                                at java.lang.Integer.parseInt
                                at java.util.concurrent.Task

2016-07-19 00:01:01,525 DEBUG Upload all environments
2016-07-19 00:01:01,720 DEBUG Upload all environments
2016-07-19 00:02:00,520 WARNING Excpetion happened while transfering for command
                                at java.lang.NumberFormatException

I would like to get the following data.table:

      log
1 2016-07-19 00:00:01,421 WARNING Exception happened while transfering for command at java.lang.NumberFormatException at java.lang.Integer.parseInt at java.util.concurrent.Task  
2 2016-07-19 00:01:01,525 DEBUG Upload all environments
3 2016-07-19 00:01:01,720 DEBUG Upload all environments
4 2016-07-19 00:02:00,520 WARNING Excpetion happened while transfering for command at java.lang.NumberFormatException

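I want each log event to end up in a single row. I tried using \n as the separator, but that produces one row per physical line, so the continuation lines are not merged into their event: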

docs <- read.table("log2.txt",header=FALSE,sep="\n",col.names="log",nrows=1000)

1 Answer:

Answer 0 (score: 0)

Read the file with readLines, then merge the lines that belong to the same event in data.table:

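library(data.table)

# Read the file, one physical line per row
raw = data.table(s = readLines('log.txt'))
# Trim surrounding whitespace first, then drop empty lines
raw[, s := stringr::str_trim(s)]
raw = raw[s != '']
# A line starting with a four-digit year opens a new event
raw[, idx := cumsum(s %like% '^[0-9]{4}')]
# Paste all lines of an event into a single row
raw[, list(s = paste(s, collapse = ' ')), by = idx]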
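To check the grouping idea without a file on disk, here is a minimal sketch of the same steps in base R, applied to the sample lines from the question. It swaps data.table and stringr for base functions (trimws, grepl, split), so it is a variant of the answer rather than the answer's own code; the variable names are made up for illustration.

```r
# Sample log lines copied from the question (blank line included).
lines <- c(
  "2016-07-19 00:00:01,421 WARNING Exception happened while transfering for command",
  "                                at java.lang.NumberFormatException",
  "                                at java.lang.Integer.parseInt",
  "                                at java.util.concurrent.Task",
  "",
  "2016-07-19 00:01:01,525 DEBUG Upload all environments",
  "2016-07-19 00:01:01,720 DEBUG Upload all environments",
  "2016-07-19 00:02:00,520 WARNING Excpetion happened while transfering for command",
  "                                at java.lang.NumberFormatException"
)

# Trim whitespace first, then drop lines that are empty.
lines <- trimws(lines)
lines <- lines[nzchar(lines)]

# A line starting with four digits (the year) opens a new event;
# cumsum() turns those start markers into a running event id.
idx <- cumsum(grepl("^[0-9]{4}", lines))

# Collapse all lines sharing an id into one string per event.
events <- data.frame(
  log = unname(vapply(split(lines, idx), paste, character(1), collapse = " ")),
  stringsAsFactors = FALSE
)

print(nrow(events))  # 4 events
```

The resulting data.frame has one row per event and can be passed to data.table::as.data.table() if a data.table is required.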

Edit: changed the regex that matches the year, thanks for the comment.
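The year-only pattern treats any line that begins with four digits as the start of an event. If continuation lines could also begin with digits, a stricter pattern may be safer. The sketch below assumes every event starts with a full "YYYY-MM-DD HH:MM:SS,mmm" timestamp, as in the sample; the pattern name and the second input line are invented for illustration and do not come from the original log.

```r
# Hypothetical stricter pattern: full "YYYY-MM-DD HH:MM:SS,mmm" prefix.
ts_pattern <- "^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}"

lines <- c(
  "2016-07-19 00:01:01,525 DEBUG Upload all environments",
  "2016 rows were rejected",  # made-up continuation line starting with digits
  "2016-07-19 00:02:00,520 WARNING Excpetion happened while transfering for command"
)

loose  <- grepl("^[0-9]{4}", lines)  # year-only check, as in the answer
strict <- grepl(ts_pattern, lines)   # full timestamp check

print(cumsum(loose))   # 1 2 3: the continuation line is split off as its own event
print(cumsum(strict))  # 1 1 2: the continuation line stays with the first event
```

Using the stricter pattern in the idx step keeps such lines attached to the event they belong to.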