Merging some information in a vector

Time: 2014-12-17 10:50:33

Tags: r

This is probably something obvious, but I can't seem to see it:

I have a vector like this:

vec<-c("i: 1","n: alpha","a: term1","a: term2", "i: 2","n: beta","a: term3","i: 3","n: gamma","a: term4","a: term5","a: term6")

I need to get this:

out<-c("i: 1","n: alpha","a: term1;term2", "i: 2","n: beta","a: term3","i: 3","n: gamma","a: term4;term5;term6")

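That is, for each unique i:, if there are several a: entries, they should be merged into a single a: entry, separated by semicolons.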

I tried using diff and rle, but the resulting code (see below) is too long, and I suspect I am overcomplicating the problem...

My code:

out<-vec
a<-which(grepl("^a: ",vec))
diffa<-diff(a)
diffa1<-which(diffa==1)
rle_a<-rle(diffa)$lengths[rle(diffa)$values==1]
indwh<-1
for(ind in 1:length(rle_a)){
    allindwh<-indwh:(indwh+rle_a[ind]-1)
    out[a[c(diffa1[allindwh],diffa1[allindwh[length(allindwh)]]+1)]]<-paste(out[a[diffa1[allindwh[1]]]],paste(gsub("a: ","",out[a[c(diffa1[allindwh[-1]],diffa1[allindwh[length(allindwh)]]+1)]]),collapse=";"),sep=";")
    indwh<-indwh+rle_a[ind]
}
out<-unique(out)

So I do get what I want, but I would really appreciate any tips for simplifying it.

1 Answer:

Answer 0 (score: 4)

Here is a simpler approach using tapply:
# index of 'a's
idx <- grepl("^a", vec)
# find groups
grp <- c(0, cumsum(diff(idx) < 0))
# apply function to vector based on groups
unlist(tapply(vec, grp, FUN = function(x) 
        c(x[1:2], paste("a:", paste(sub("^a:\\s*", "", x[-(1:2)]), collapse = ";")))),
       use.names = FALSE)

# [1] "i: 1"                 "n: alpha"             "a: term1;term2"      
# [4] "i: 2"                 "n: beta"              "a: term3"            
# [7] "i: 3"                 "n: gamma"             "a: term4;term5;term6"
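
The tapply call above relies on every record having exactly two leading lines (i: and n:) before its a: entries. As a variation, and only a sketch, the records can instead be delimited by the i: lines themselves. This assumes every record starts with an i: line and that its a: lines come last. The helper name merge_a is hypothetical and not part of the answer above.

```r
vec <- c("i: 1", "n: alpha", "a: term1", "a: term2",
         "i: 2", "n: beta", "a: term3",
         "i: 3", "n: gamma", "a: term4", "a: term5", "a: term6")

# Hypothetical helper: collapse all "a: " entries of one record into one entry.
# Assumption: the "a: " lines come last in the record, so the merged entry
# is appended after the remaining lines.
merge_a <- function(x) {
  is_a <- grepl("^a: ", x)
  if (!any(is_a)) return(x)  # nothing to merge in this record
  merged <- paste0("a: ", paste(sub("^a:\\s*", "", x[is_a]), collapse = ";"))
  c(x[!is_a], merged)
}

# Assumption: a new record starts at every "i: " line
grp <- cumsum(grepl("^i: ", vec))

# Split into records, merge within each record, and flatten again
out <- unlist(lapply(split(vec, grp), merge_a), use.names = FALSE)
print(out)
```

Grouping on the i: lines keeps the code working if a record has more (or fewer) header lines than i: and n:, at the cost of the ordering assumption noted in the comments.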