Suppose we have a vector:
x <- c(1,1,1,2,2,2,2,2,4,4,2,2,2,2)
What function can take x and return l, where l equals:
[[1]]
[1] 1 1 1
[[2]]
[1] 2 2 2 2 2
[[3]]
[1] 4 4
[[4]]
[1] 2 2 2 2
Answer 0 (score: 6)
Use rle, rep, and split:
a <- rle(x)
split(x, rep(seq_along(a$lengths), a$lengths))
# $`1`
# [1] 1 1 1
#
# $`2`
# [1] 2 2 2 2 2
#
# $`3`
# [1] 4 4
#
# $`4`
# [1] 2 2 2 2
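Here, rle computes the "run lengths" of the input vector. The result is a list with two components, lengths and values. We only need lengths, from which we can build a "grouping" variable to split the original vector by.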
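To make the intermediate values visible, here is a step-by-step sketch on the example vector from the question (the names runs and groups are only illustrative):

```r
x <- c(1, 1, 1, 2, 2, 2, 2, 2, 4, 4, 2, 2, 2, 2)

# rle() returns an object of class "rle": a list with $lengths and $values
runs <- rle(x)
runs$lengths  # 3 5 2 4
runs$values   # 1 2 4 2

# Repeat each run index as many times as that run is long,
# giving one group label per element of x
groups <- rep(seq_along(runs$lengths), runs$lengths)
groups        # 1 1 1 2 2 2 2 2 3 3 4 4 4 4

# split() collects the elements of x that share a label, in label order
l <- split(x, groups)
```

Because the labels come from run positions rather than from the values themselves, the two separate runs of 2 end up in different groups.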
I did not include the while-loop answer in the benchmark, because it takes a very long time to finish on a vector this long.
library(microbenchmark)
set.seed(1)
x <- sample(1:5, 1e5, replace = TRUE)
fun1 <- function() {
a <- rle(x)
split(x, rep(seq_along(a$lengths), a$lengths))
}
fun2 <- function() {
splits = which(diff(x) != 0)
split.locs = rbind(c(1, splits+1), c(splits, length(x)))
apply(split.locs, 2, function(y) x[y[1]:y[2]])
}
fun3 <- function() split(x, c(0, cumsum(as.logical(diff(x)))))
microbenchmark(fun1(), fun2(), fun3(), times = 20)
# Unit: milliseconds
# expr min lq median uq max neval
# fun1() 142.0386 147.7061 154.2853 158.0239 196.4665 20
# fun2() 363.5707 386.0575 423.1791 444.4695 543.9427 20
# fun3() 305.5331 316.0356 320.5203 329.7177 376.3236 20
Answer 1 (score: 3)
Another possibility:
split(x, c(0, cumsum(as.logical(diff(x)))))
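This works because diff(x) is non-zero exactly where the value changes; as.logical() turns those positions into TRUE, and cumsum() counts how many changes have been passed so far. The leading 0 is needed because diff() returns one element fewer than x. A step-by-step sketch on the example vector:

```r
x <- c(1, 1, 1, 2, 2, 2, 2, 2, 4, 4, 2, 2, 2, 2)

d <- diff(x)                 # 0 0 1 0 0 0 0 2 0 -2 0 0 0  (13 values)
change <- as.logical(d)      # TRUE wherever neighbouring elements differ
ids <- c(0, cumsum(change))  # 0 0 0 1 1 1 1 1 2 2 3 3 3 3  (14 values)

# Group labels start at 0, so the list names are "0", "1", "2", "3"
l <- split(x, ids)
```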
Answer 2 (score: 1)
Here is a different approach that relies on diff and apply instead of a while loop:
x <- c(1,1,1,2,2,2,2,2,4,4,2,2,2,2)
# Indices of ends of continuous regions (diff helps us find where neighboring elements differ)
splits = which(diff(x) != 0)
# Columns are ranges of continuous regions
split.locs = rbind(c(1, splits+1), c(splits, length(x)))
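# Split based on ranges; simplify = FALSE (R >= 4.1.0) keeps the result a list
# even when every run has the same length (otherwise apply returns a matrix)
apply(split.locs, 2, function(y) x[y[1]:y[2]], simplify = FALSE)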
# [[1]]
# [1] 1 1 1
# [[2]]
# [1] 2 2 2 2 2
# [[3]]
# [1] 4 4
# [[4]]
# [1] 2 2 2 2
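For the example vector, the intermediate objects look like this (a sketch reusing the answer's own expressions). Each column of split.locs holds the start index (row 1) and end index (row 2) of one run:

```r
x <- c(1, 1, 1, 2, 2, 2, 2, 2, 4, 4, 2, 2, 2, 2)

# Last index of every run except the final one
splits <- which(diff(x) != 0)   # 3 8 10

# Row 1: run starts (1 plus the index after each change)
# Row 2: run ends (each change point plus the end of the vector)
split.locs <- rbind(c(1, splits + 1), c(splits, length(x)))
split.locs
#      [,1] [,2] [,3] [,4]
# [1,]    1    4    9   11
# [2,]    3    8   10   14
```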
Answer 3 (score: -1)
Here is a hack:
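pure <- x    # working copy; each matched run is removed from its front
out <- list()
while(length(pure) > 0) {
  # Positions of every element equal to the current first value
  matches <- which(pure == pure[1])
  matches2 <- list()
  matches2[[1]] <- matches[1]
  # Keep only the leading block of consecutive positions (the first run).
  # seq_along(matches)[-1] is empty for a single match, whereas
  # 2:length(matches) would be c(2, 1) and fail on matches[2] == NA
  for(i in seq_along(matches)[-1]) {
    if(matches[i] - matches[i-1] > 1) {
      break
    }
    matches2[[i]] <- matches[i]
  }
  matches2 <- unlist(matches2)
  out[[length(out) + 1]] <- pure[matches2]
  pure <- pure[-matches2]
}
out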
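For comparison, the same loop-based idea can be written as a single left-to-right pass that closes a run whenever the value changes, so it never rescans or copies the remaining vector on each iteration. This is an alternative sketch, not the answerer's code; split_runs_loop is a hypothetical name, and it assumes x contains no NA values (an NA would make the if() condition fail).

```r
# Hypothetical helper (not from any package): split x into runs of equal values
split_runs_loop <- function(x) {
  out <- list()
  n <- length(x)
  if (n == 0) return(out)
  start <- 1
  # seq_len(n)[-1] is empty when n == 1, so the loop is skipped safely
  for (i in seq_len(n)[-1]) {
    # A new run begins at i when the value changes; close the previous run
    if (x[i] != x[i - 1]) {
      out[[length(out) + 1]] <- x[start:(i - 1)]
      start <- i
    }
  }
  # Close the final run
  out[[length(out) + 1]] <- x[start:n]
  out
}

split_runs_loop(c(1, 1, 1, 2, 2, 2, 2, 2, 4, 4, 2, 2, 2, 2))
```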