我导入了一些没有列名的数据,所以现在我有超过一百万行和一列(而不是5列)。
每一行的格式如下:
x <- "2012-10-19T16:59:01-07:00 192.101.136.140 <190>Oct 19 2012 23:59:01: %FWSM-6-305011: Built dynamic tcp translation from Inside:10.2.45.62/56455 to outside:192.101.136.224/9874"
strsplit( x , split = c(" ", " ", "%", " "))
得到了
[[1]]
[1] "2012-10-19T16:59:01-07:00" "192.101.136.140"
[3] "<190>Oct" "19"
[5] "2012" "23:59:01:"
[7] "%FWSM-6-305011:" "Built"
[9] "dynamic" "tcp"
[11] "translation" "from"
[13] "Inside:10.2.45.62/56455" "to"
[15] "outside:192.101.136.224/9874"
我知道它与回收分裂参数有关,但我似乎无法想象如何得到它我想要的:
[[1]]
[1] "2012-10-19T16:59:01-07:00" "192.101.136.140"
[3] "<190>Oct 19 2012 23:59:01 "%FWSM-6-305011
[5] Built dynamic tcp translation from Inside:10.2.45.62/56455 to outside:192.101.136.224/9874"
每一行都有一个不同的消息作为第五个元素,但在第四个元素后,我只想将其余的字符串保持在一起。
任何帮助都将不胜感激。
答案 0 :(得分:2)
您可以使用paste
和collapse
参数组合从第五个元素开始的每个元素。
A <- strsplit( x = "2012-10-19T16:59:01-07:00 192.101.136.140 <190>Oct 19 2012 23:59:01: %FWSM-6-305011: Built dynamic tcp translation from Inside:10.2.45.62/56455 to outside:192.101.136.224/9874", split = c(" ", " ", "%", " "))
c(A[[1]][1:4], paste(A[[1]][5:length(A[[1]])], collapse=" "))
正如@DWin指出的那样,split = c(" ", " ", "%", " ")
未按顺序使用 - 换句话说,它与split = c(" ", "%")
相同
答案 1 :(得分:0)
我想在这里你不需要使用strsplit
。我使用read.table
使用text
参数读取行。然后使用paste
汇总列。由于您有很多行,因此最好在data.table
中进行列聚合。
dt <- read.table(text=x)
library(data.table)
DT <- as.data.table(dt)
DT[ , c('V3','V8') := list(paste(V3,V4,V5),
V8=paste(V8,V9,V10,V11,V12,V13,V14,V15))]
DT[,paste0('V',c(1:3,6:7,8)),with=FALSE]
V1 V2 V3 V6 V7
1: 2012-10-19T16:59:01-07:00 192.101.136.140 <190>Oct 19 2012 23:59:01: %FWSM-6-305011:
V8
1: Built dynamic tcp translation from Inside:10.2.45.62/56455 to outside:192.101.136.224/9874
答案 2 :(得分:0)
我认为这是一个以您认为strsplit
运行的方式工作的函数:
split.seq<-function(x,delimiters) {
break.point<-regexpr(delimiters[1], x)
first<-mapply(substring,x,1,break.point-1,USE.NAMES=FALSE)
second<-mapply(substring,x,break.point+1,nchar(x),USE.NAMES=FALSE)
if (length(delimiters)==1) return(lapply(1:length(first),function(x) c(first[x],second[x])))
else mapply(function(x,y) c(x,y),first, split.seq(second, delimiters[-1]) ,USE.NAMES=FALSE, SIMPLIFY=FALSE)
}
split.seq(x,delimiters)
测试:
x<-rep(x,2)
delimiters=c(" ", " ", "%", " ")
split.seq(x,delimiters)
[[1]]
[1] "2012-10-19T16:59:01-07:00"
[2] "192.101.136.140"
[3] "<190>Oct 19 2012 23:59:01: "
[4] "FWSM-6-305011:"
[5] "Built dynamic tcp translation from Inside:10.2.45.62/56455 to outside:192.101.136.224/9874"
[[2]]
[1] "2012-10-19T16:59:01-07:00"
[2] "192.101.136.140"
[3] "<190>Oct 19 2012 23:59:01: "
[4] "FWSM-6-305011:"
[5] "Built dynamic tcp translation from Inside:10.2.45.62/56455 to outside:192.101.136.224/9874"