Combining rows of a list in R

Time: 2014-05-11 20:28:09

Tags: r list

I have a list of the following form:

[[1]]
 [1] "10"  "719" "99"  

[[2]]
 [1] "10"  "624" "85"  "888" "624" 

[[3]]
 [1] "1"   "894" "110" "344" "634"  

I would like to merge the vectors that share the same first element, i.e.:

[[1]]
 [1] "10"  "719" "99" "624" "85"  "888" "624" 

[[2]]
 [1] "1"   "894" "110" "344" "634"

Is there a way to do this using as little memory as possible?

2 answers:

Answer 0: (score: 2)

I would approach it as follows:

x <- list(c("10",  "719", "99"),
          c("10",  "624", "85",  "888", "624"),
          c("1",   "894", "110", "344", "634"))
first_elems <- sapply(x, "[", 1) # get 1st elem of each vector
(first_elems <- as.factor(first_elems)) # convert to a factor (this also finds all unique elems)
## [1] 10 10 1 
## Levels: 1 10
(group <- split(x, first_elems)) # split by 1st elem (divide into groups)
## $`1`
## $`1`[[1]]
## [1] "1"   "894" "110" "344" "634"
## 
## 
## $`10`
## $`10`[[1]]
## [1] "10"  "719" "99" 
## 
## $`10`[[2]]
## [1] "10"  "624" "85"  "888" "624"
## 
(result <- lapply(group, unlist)) # combine vectors in each group (list of vectors -> an atomic vector)
## $`1`
## [1] "1"   "894" "110" "344" "634"
## 
## $`10`
## [1] "10"  "719" "99"  "10"  "624" "85"  "888" "624"

Edit: to keep each key only once in the result, use:

(result <- lapply(group, function(x) {
      c(x[[1]][1], unlist(lapply(x, "[", -1)))
   }))
## $`1`
## [1] "1"   "894" "110" "344" "634"
## 
## $`10`
## [1] "10"  "719" "99"  "624" "85"  "888" "624"
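The steps above can be wrapped into one function. This is only a sketch: the name `merge_by_first` is hypothetical, and it assumes every vector in the list is a non-empty character vector. Unlike the code above, it builds the factor with `levels = unique(keys)`, so the groups come out in order of first appearance ("10" before "1") rather than in sorted order.

```r
# Hypothetical helper, not part of the original answer.
# Assumes x is a list of non-empty character vectors.
merge_by_first <- function(x) {
  # first element of each vector, used as the grouping key
  keys <- vapply(x, `[`, character(1), 1)
  # split into groups, keeping keys in order of first appearance
  groups <- split(x, factor(keys, levels = unique(keys)))
  # keep the key once, then append the remaining elements of every vector
  lapply(groups, function(g) c(g[[1]][1], unlist(lapply(g, `[`, -1))))
}

x <- list(c("10", "719", "99"),
          c("10", "624", "85", "888", "624"),
          c("1", "894", "110", "344", "634"))
res <- merge_by_first(x)
print(res)
```

Wrapping the result in `unname()` gives an unnamed list like the one shown in the question.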

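Little extra memory is needed. Apart from the resulting list, we have to store the result of as.factor (the number of classes plus the number of elements in x). The split step itself needs only a little extra memory, because the vectors passed to split are not deep-copied.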
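To make the size of the factor concrete, here is a small sketch on the example list. It only inspects the factor itself (one integer code per list element plus one label per distinct key); it does not attempt to demonstrate the copy behaviour of split.

```r
x <- list(c("10", "719", "99"),
          c("10", "624", "85", "888", "624"),
          c("1", "894", "110", "344", "634"))
keys <- as.factor(vapply(x, `[`, character(1), 1))

codes <- unclass(keys)   # underlying integer codes of the factor
length(codes)            # one code per element of x
nlevels(keys)            # one label per distinct first element
```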

As for performance, for a fairly large list:

set.seed(1L)
n <- 100000
x <- vector('list', n)
for (i in seq_len(n))
   x[[i]] <- as.character(sample(1:1000, ceiling(runif(1, 1, 1000)), replace=TRUE))
object.size(x) # about 2 GB
## 2175165880 bytes

the timings I got on my old Linux laptop seem reasonable to me.
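A minimal way to time the approach yourself is sketched below. It assumes a much smaller list (`n <- 2000`, a value chosen here only to keep the run short, not taken from the answer), and the names `x_small` and `timing` are illustrative. The numbers will depend on the machine.

```r
set.seed(1L)
n <- 2000  # assumption: far smaller than the 100000 used above
x_small <- lapply(seq_len(n), function(i) {
  as.character(sample(1:1000, ceiling(runif(1, 1, 1000)), replace = TRUE))
})

# time the three steps: extract keys, split into groups, combine each group
timing <- system.time({
  keys <- as.factor(sapply(x_small, "[", 1))
  group <- split(x_small, keys)
  result <- lapply(group, unlist)
})
print(timing)
```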

Answer 1: (score: 0)

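I'm not sure about speed, but here is a for-loop approach (something I don't use often) that illustrates the operations that need to be performed on the list.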

x <- list(c("10",  "719", "99"),
          c("10",  "624", "85" , "888", "624"),
          c("1",   "894", "110", "344", "634"))  

y <- list() # merged vectors, indexed by their first element

for (v in x) {
  key <- v[1] # the first element identifies the group
  if (is.null(y[[key]])) {
    y[[key]] <- v # first vector seen with this key
  } else {
    y[[key]] <- c(y[[key]], v[-1]) # append everything except the key
  }
}

z <- unname(y) # drop the names to get a plain list
z
# [[1]]
# [1] "10"  "719" "99"  "624" "85"  "888" "624"
# 
# [[2]]
# [1] "1"   "894" "110" "344" "634"