Make a list from a vector based on markers in the vector

Time: 2014-06-20 02:41:25

Tags: r

I often run into vectors like this:

 [1] "C" ""  "A" "C" "D" "A" "I" "B" "H" "I" ""  "C" "E"
[14] "H" "J" "J" "E" "A" ""  "I" "I" "I" "G" ""  "F"

I'd like to use some kind of marker/indicator to break the vector up into a list of vectors like this:

[[1]]
[1] "C"

[[2]]
[1] "A" "C" "D" "A" "I" "B" "H" "I"

[[3]]
[1] "C" "E" "H" "J" "J" "E" "A"

[[4]]
[1] "I" "I" "I" "G"

[[5]]
[1] "F"

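In this case the marker is the empty string "". I can already do this, but I'm wondering whether there is a faster, more efficient way to get there. It seems like I should be able to use split, but I can't think of a simple way to apply it. Here is my current approach: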

## MWE
set.seed(15)
x <- sample(c("", LETTERS[1:10]), 25, TRUE, prob=c(.2, rep(.08, 10)))

locs <- which(x == "")
start <- c(1, locs + 1)
end <- c(locs - 1, length(x))

lapply(Map(":", start, end), function(ind){
    x[ind]
})

2 answers:

Answer 0 (score: 2)

You could do it like this. First, the test data:

a<-c("C","","A","C","D","A","I","B",
    "H","I","","C","E","H","J","J",
    "E","A","","I","I","I","G","","F")

Now we find all the sentinel values:

breaks <- a==""

Now we use split, dropping the sentinels themselves and starting a new list element each time a break is encountered:

split(a[!breaks], cumsum(breaks)[!breaks])

which returns

$`0`
[1] "C"

$`1`
[1] "A" "C" "D" "A" "I" "B" "H" "I"

$`2`
[1] "C" "E" "H" "J" "J" "E" "A"

$`3`
[1] "I" "I" "I" "G"

$`4`
[1] "F"

as desired.
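To see why this works, it may help to look at the intermediate grouping vector. The sketch below reuses the test data from this answer; the wrapper name split_on_marker and its marker argument are hypothetical and not part of the original answer. It also calls unname() so the result prints with [[1]], [[2]], ... as in the question rather than with $`0`, $`1`, ...

```r
a <- c("C","","A","C","D","A","I","B",
       "H","I","","C","E","H","J","J",
       "E","A","","I","I","I","G","","F")

breaks <- a == ""

# cumsum() increases the group id by one at every marker, so all
# elements between two markers share the same id
group_id <- cumsum(breaks)
group_id[1:5]
# [1] 0 1 1 1 1

# Hypothetical wrapper: drop the markers, split by group id, and
# strip the names so the list is indexed by position only
split_on_marker <- function(v, marker = "") {
  is_marker <- v == marker
  unname(split(v[!is_marker], cumsum(is_marker)[!is_marker]))
}

res <- split_on_marker(a)
length(res)
# [1] 5
```

Because the markers are removed before split sees the data, runs of consecutive markers simply produce no list element instead of an empty one.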

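Because the value of breaks is used several times inside the split call, it is usually awkward to write this as a one-liner. That's why I like to use a helper function called withX(), which I would use like this:
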
withX(a=="", split(a[!X], cumsum(X)[!X]))
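The answer does not show how withX() is defined. As an assumption only, a helper with this behavior could look roughly like the sketch below: it evaluates its first argument once, binds the result to the name X, and then evaluates the second argument with that binding in place. The argument names x and expr are hypothetical.

```r
# Hypothetical sketch of a withX()-style helper (not the answerer's
# actual definition): evaluate `expr` with the value of `x` bound to X.
# Other names in `expr` are looked up in the caller's environment.
withX <- function(x, expr) {
  eval(substitute(expr), list(X = x), parent.frame())
}

a <- c("C", "", "A", "C", "", "F")
res <- withX(a == "", split(a[!X], cumsum(X)[!X]))
```

The point of such a helper is only to avoid naming the intermediate logical vector; an ordinary two-line version with a breaks variable does the same job.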

Answer 1 (score: 2)

Here is another approach:

 tapply(x,cumsum(!nchar(x)), function(x) if(length(x)>1) tail(x,-1L) else x)
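One thing to keep in mind with this version: tail(x, -1L) assumes that every group longer than one element begins with a marker. That holds for the sample data, where the first group is just "C", but not when several values precede the first marker, and a group made up of a lone marker is returned as "". The sketch below uses a small made-up vector to show both effects, next to a variant that filters the markers out before grouping. The variant is my own adaptation and is essentially the split idea from the other answer.

```r
# Made-up input: two values before the first marker, plus two
# consecutive markers
x <- c("A", "B", "", "C", "", "", "D")

# Group ids increase at every empty string, as in the tapply() call above
grp <- cumsum(!nchar(x))

# Original form: the first group c("A", "B") loses its leading "A",
# and the group holding only a marker comes back as ""
orig <- tapply(x, grp, function(x) if (length(x) > 1) tail(x, -1L) else x)

# Variant: drop the markers before grouping, so no group is trimmed
# by position and empty groups never appear
keep <- nzchar(x)
fixed <- unname(split(x[keep], grp[keep]))
```

Here fixed holds three elements, c("A", "B"), "C" and "D", while orig has four entries and has lost "A".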