I have a list of the following form:
[[1]]
[1] "10" "719" "99"
[[2]]
[1] "10" "624" "85" "888" "624"
[[3]]
[1] "1" "894" "110" "344" "634"
I want to merge the vectors that share the same first element, i.e. get:
[[1]]
[1] "10" "719" "99" "624" "85" "888" "624"
[[2]]
[1] "1" "894" "110" "344" "634"
Is there a way to do this using as little memory as possible?
Answer 0 (score: 2)
I would go about it as follows:
x <- list(c("10", "719", "99"),
c("10", "624", "85", "888", "624"),
c("1", "894", "110", "344", "634"))
first_elems <- sapply(x, "[", 1) # get 1st elem of each vector
(first_elems <- as.factor(first_elems)) # factorize (i.a. find all unique elems)
## [1] 10 10 1
## Levels: 1 10
(group <- split(x, first_elems)) # split by 1st elem (divide into groups)
## $`1`
## $`1`[[1]]
## [1] "1" "894" "110" "344" "634"
##
##
## $`10`
## $`10`[[1]]
## [1] "10" "719" "99"
##
## $`10`[[2]]
## [1] "10" "624" "85" "888" "624"
##
(result <- lapply(group, unlist)) # combine vectors in each group (list of vectors -> an atomic vector)
## $`1`
## [1] "1" "894" "110" "344" "634"
##
## $`10`
## [1] "10" "719" "99" "10" "624" "85" "888" "624"
EDIT: to keep each key only once in the merged vectors, use:
(result <- lapply(group, function(x) {
c(x[[1]][1], unlist(lapply(x, "[", -1)))
}))
## $`1`
## [1] "1" "894" "110" "344" "634"
##
## $`10`
## [1] "10" "719" "99" "624" "85" "888" "624"
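Note that the result above is named and ordered by factor level ("1" before "10"), whereas the question asks for an unnamed list in order of first appearance. A minimal sketch of one way to get that shape, assuming every vector is non-empty; the function name merge_by_first is hypothetical and not part of the answer:

```r
# Hypothetical helper: merge character vectors that share a first element,
# keeping the groups in order of first appearance.
merge_by_first <- function(x) {
  # first element (key) of every vector
  keys <- vapply(x, `[`, character(1), 1)
  # fixing the levels to unique(keys) preserves the order of first appearance
  groups <- split(x, factor(keys, levels = unique(keys)))
  # keep the key once, then append the remaining elements of each vector
  merged <- lapply(groups, function(g) c(g[[1]][1], unlist(lapply(g, `[`, -1))))
  unname(merged)
}

x <- list(c("10", "719", "99"),
          c("10", "624", "85", "888", "624"),
          c("1", "894", "110", "344", "634"))
res <- merge_by_first(x)
print(res)
```

Passing explicit levels to factor() is what changes the ordering; the rest is the same split/lapply idea as above.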
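Not much additional memory is needed. Apart from the resulting list, we need to store the result of as.factor (number of classes + number of elements in x). split needs only a little extra memory: the vectors in x are not deep-copied.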
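One way to check the no-deep-copy claim yourself is to compare object addresses before and after split. The sketch below uses base R's tracemem, which only works when R was built with memory profiling, so it is guarded by capabilities("profmem"). The variable names are hypothetical. If the vectors are indeed shared rather than copied, shared should come out TRUE, but treat this as a rough check rather than proof:

```r
x <- list(c("10", "719", "99"),
          c("10", "624", "85", "888", "624"),
          c("1", "894", "110", "344", "634"))
group <- split(x, as.factor(sapply(x, "[", 1)))

shared <- NA  # stays NA if this R build lacks memory profiling
if (capabilities("profmem")) {
  addr_in_x     <- tracemem(x[[1]])              # address of the vector stored in x
  addr_in_group <- tracemem(group[["10"]][[1]])  # address of the same vector after split
  shared <- identical(addr_in_x, addr_in_group)
  untracemem(x[[1]])                             # stop tracing again
}
print(shared)
```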
As for performance, I ran timings on my old Linux laptop for a fairly large list, generated as follows:
set.seed(1L)
n <- 100000
x <- vector('list', n)
for (i in 1:n)
x[[i]] <- as.character(sample(1:1000, ceiling(runif(1, 1, 1000)), replace=TRUE))
object.size(x) # 2GB
## 2175165880 bytes
I think the timings seem reasonable.
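To measure this on your own machine, one option is base R's system.time. The sketch below is only an assumption about how such a measurement could be set up, not the code behind the timings mentioned above. It uses a much smaller n (1000, a hypothetical value chosen so that it runs quickly) and the hypothetical name x_small:

```r
set.seed(1L)
n <- 1000  # hypothetical, much smaller than the 100000 used above
x_small <- vector("list", n)
for (i in seq_len(n))
  x_small[[i]] <- as.character(sample(1:1000, ceiling(runif(1, 1, 1000)), replace = TRUE))

# time the whole pipeline: extract keys, split into groups, merge each group
timing <- system.time({
  first_elems <- as.factor(sapply(x_small, "[", 1))
  group <- split(x_small, first_elems)
  result <- lapply(group, function(g) c(g[[1]][1], unlist(lapply(g, "[", -1))))
})
print(timing)
```

Increase n step by step to see how the elapsed time and memory use scale before trying the full-size list.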
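Answer 1 (score: 0)
I'm not sure about the speed, but here is a for loop approach (which I don't use often). It illustrates the kind of operation that needs to be performed on the list.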
x <- list(c("10", "719", "99"),
c("10", "624", "85" , "888", "624"),
c("1", "894", "110", "344", "634"))
y <- vector('list', length(x)) # allocate a list at least as long as x
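for (i in 2:length(x)) {
  # only neighbouring vectors are compared, which is enough for the example input
  if (x[[i-1]][1] == x[[i]][1]) {
    # same first element: append x[[i]] without its key
    y[[i-1]] <- c(x[[i-1]], x[[i]][-1])
  } else {
    y[[i-1]] <- x[[i]]
  }
}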
z <- y[!sapply(y, is.null)]
z
# [[1]]
# [1] "10" "719" "99" "624" "85" "888" "624"
#
# [[2]]
# [1] "1" "894" "110" "344" "634"
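The loop above only looks at neighbouring vectors, so it relies on the layout of the example input. A minimal sketch of a more general loop, assuming every vector is non-empty and its first element is not NA; the function name merge_by_key_loop is hypothetical. It accumulates the vectors in a list keyed by first element, so matching vectors do not have to be adjacent:

```r
# Hypothetical helper: loop-based merge keyed by the first element.
merge_by_key_loop <- function(x) {
  merged <- list()
  for (v in x) {
    key <- v[1]
    if (is.null(merged[[key]])) {
      # first time this key is seen: store the vector as-is
      merged[[key]] <- v
    } else {
      # key already present: append everything except the key
      merged[[key]] <- c(merged[[key]], v[-1])
    }
  }
  unname(merged)
}

x <- list(c("10", "719", "99"),
          c("1", "894", "110", "344", "634"),
          c("10", "624", "85", "888", "624"))
z <- merge_by_key_loop(x)
print(z)
```

Growing vectors with c() inside a loop copies them each time, so for large inputs the split-based approach is likely the better choice for both speed and memory.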