I often find myself working with a vector like this:
[1] "C" "" "A" "C" "D" "A" "I" "B" "H" "I" "" "C" "E"
[14] "H" "J" "J" "E" "A" "" "I" "I" "I" "G" "" "F"
I would like to break the vector apart into a list of vectors, using some marker/sentinel as the delimiter, like this:
[[1]]
[1] "C"
[[2]]
[1] "A" "C" "D" "A" "I" "B" "H" "I"
[[3]]
[1] "C" "E" "H" "J" "J" "E" "A"
[[4]]
[1] "I" "I" "I" "G"
[[5]]
[1] "F"
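In this case the marker is the empty string "". I can do this, but I am wondering whether there is a faster, more efficient way to accomplish it. It seems like I should be able to use split, but I can't think of a simple way to do it. Here is my current approach: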
## MWE
set.seed(15)
x <- sample(c("", LETTERS[1:10]), 25, TRUE, prob=c(.2, rep(.08, 10)))
locs <- which(x == "")
start <- c(1, locs + 1)
end <- c(locs - 1, length(x))
lapply(Map(":", start, end), function(ind){
x[ind]
})
Answer 0 (score: 2)
You could do it like this. First, the test data:
a<-c("C","","A","C","D","A","I","B",
"H","I","","C","E","H","J","J",
"E","A","","I","I","I","G","","F")
Now we find all the sentinel values:
breaks <- a==""
Now we use split, starting a new list element each time we hit a break:
split(a[!breaks], cumsum(breaks)[!breaks])
which returns
$`0`
[1] "C"
$`1`
[1] "A" "C" "D" "A" "I" "B" "H" "I"
$`2`
[1] "C" "E" "H" "J" "J" "E" "A"
$`3`
[1] "I" "I" "I" "G"
$`4`
[1] "F"
as desired.
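To see why this works, it can help to look at the intermediate group ids. The sketch below uses the same test data; the names grp and res are introduced here purely for illustration.

```r
a <- c("C", "", "A", "C", "D", "A", "I", "B",
       "H", "I", "", "C", "E", "H", "J", "J",
       "E", "A", "", "I", "I", "I", "G", "", "F")

breaks <- a == ""        # TRUE at each sentinel position
grp <- cumsum(breaks)    # running count of sentinels seen so far

# Every element gets the id of the most recent sentinel at or before it
head(grp, 12)
# 0 1 1 1 1 1 1 1 1 1 2 2

# Drop the sentinels themselves, then group the rest by id
res <- split(a[!breaks], grp[!breaks])

# unname() gives the [[1]], [[2]], ... form shown in the question
unname(res)
```

Because the sentinels are removed before splitting, two sentinels in a row should simply skip a group id rather than produce an empty element.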
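Because the value of breaks is used more than once inside the split call, it is awkward to write this as a one-liner. That's why I like to use a helper function named withX(), which I would use like this: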
withX(a=="", split(a[!X], cumsum(X)[!X]))
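withX() is not defined in this answer. Here is a minimal sketch of what such a helper might look like, assuming it does nothing more than evaluate its second argument with X bound to the value of the first. The definition below is a guess for illustration, not the original helper.

```r
# Hypothetical helper: evaluate `expr` with the name X bound to the value of `x`.
# All other names in `expr` are looked up in the caller's environment.
withX <- function(x, expr) {
  eval(substitute(expr), list(X = x), parent.frame())
}

a <- c("C", "", "A", "", "B", "D")

# X stands for the logical vector a == "" inside the second argument
res <- withX(a == "", split(a[!X], cumsum(X)[!X]))
res
```

With a definition like this, the comparison a == "" is computed once and reused under the name X, which is what makes the one-liner possible.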
Answer 1 (score: 2)
Here is another approach:
tapply(x,cumsum(!nchar(x)), function(x) if(length(x)>1) tail(x,-1L) else x)
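One thing to watch for with this version: every group after the first starts with its sentinel, which tail(x, -1L) removes, but the first group has no leading sentinel. If the vector begins with two or more non-empty values, the first of them appears to be dropped. A small sketch of that case follows; y and split_on_sentinel are hypothetical names used only here.

```r
# Hypothetical wrapper around the split/cumsum idea, for comparison
split_on_sentinel <- function(x, sentinel = "") {
  is_break <- x == sentinel
  unname(split(x[!is_break], cumsum(is_break)[!is_break]))
}

# First group has two elements and no leading sentinel
y <- c("C", "D", "", "A", "B")

# tapply version: tail(x, -1L) strips "C" from the first group
t1 <- tapply(y, cumsum(!nchar(y)),
             function(x) if (length(x) > 1) tail(x, -1L) else x)
t1[[1]]

# split version keeps both "C" and "D"
split_on_sentinel(y)[[1]]
```

For the example vector in the question the first group is just "C", so both approaches give the same groups there.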