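How do I perform simple operations on list elements that are data frames?

Asked: 2014-05-27 12:02:47

Tags: r dataframe

I have a list in which every element is a data frame. The data frames differ in their number of rows, as shown here: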

for (i in seq_along(ismr2)) print(dim(ismr2[[i]]))
[1] 510   3
[1] 477   3
[1] 412   3
[1] 422   3
[1] 455   3
[1] 398   3
[1] 405   3
[1] 407   3
[1] 452   3
[1] 462   3
[1] 498   3
[1] 495   3
[1] 469   3
[1] 470   3

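There is one rule, though: the row names of every data frame are a subset of the row names of the data frame with the most rows (510 in this example).

My goal is to find the data frame with the most rows (let's call it the reference data frame) and to add to each of the other data frames the rows it is missing (rows that are in the reference data frame but not in that data frame).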

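Expected output:

1) A list in which every element is a data frame.

2) All the data frames in the list have the same dimensions, equal to those of the reference data frame (the element of the list with the most rows).

3) The rows newly added to a data frame have the same number of columns as the existing rows, take their row names from the reference data frame, and contain 0 in every entry.

Here is my attempt, but it does not work: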

isomir2  # original list

ismr3 <- vector("list", length(isomir2))

# find the reference data frame: this only gives me the largest number of rows; I don't know which data frame (which element of the list) it belongs to
length.max <- max(unlist(lapply(isomir2, function(x) nrow(x))))

for (i in 1:length(isomir2)){
  ismr3[[i]] <- rbind(isomir2[[i]],matrix(0,ncol=3,nrow=length.max - nrow(isomir2[[i]]))
                      temp <- rownames(isomir2[[i]])
                      rownames(P[[i]]) <- c(temp, # How should I find the missing row name here ? ))
}

Could someone help me implement this?

Simple input:

> P
[[1]]
  [,1]
A    1
B    2
C    3
D    4

[[2]]
  [,1]
A    1
B    2
D    3

[[3]]
  [,1]
B    1
C    2

Expected output:

> P
[[1]]
  [,1]
A    1
B    2
C    3
D    4

[[2]]
  [,1]
A    1
B    2
D    3
C    0

[[3]]
  [,1]
B    1
C    2
D    0
A    0

2 answers:

Answer 0 (score: 1)

First, I generate some nonsense data in the same form as yours (since you did not provide a sample data set to work with):

ismr2 <- lapply(2*1:5, function(i){
    d <- data.frame(rnorm(i), runif(i))
    row.names(d) <- sample(LETTERS[1:i])
    d
})

Then I pick the reference data frame out of ismr2 and complete the other data frames like this:

ref <- ismr2[[which.max(sapply(ismr2, nrow))]]

ismr3 <- lapply(ismr2, function(x){
    rbind(x, ref[!rownames(ref) %in% rownames(x),])
})

Now all the frames in ismr3 have the same number of rows, as shown here:

> sapply(ismr3, row.names)
      [,1] [,2] [,3] [,4] [,5]
 [1,] "B"  "D"  "B"  "F"  "F" 
 [2,] "A"  "B"  "D"  "G"  "C" 
 [3,] "F"  "A"  "F"  "D"  "H" 
 [4,] "C"  "C"  "A"  "E"  "D" 
 [5,] "H"  "F"  "E"  "A"  "E" 
 [6,] "D"  "H"  "C"  "B"  "B" 
 [7,] "E"  "E"  "H"  "C"  "A" 
 [8,] "I"  "I"  "I"  "H"  "I" 
 [9,] "J"  "J"  "J"  "I"  "J" 
[10,] "G"  "G"  "G"  "J"  "G" 
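A quick way to confirm the result is to compare every padded frame against the reference. The sketch below repeats the steps above on freshly generated data; the seed and the column names x and y are assumptions added only to make it reproducible, and same_dims and same_names are hypothetical variable names.

```r
set.seed(1)  # assumption: fixed only for reproducibility, any seed would do

# Nonsense data in the same form as above (column names x and y are made up)
ismr2 <- lapply(2 * 1:5, function(i) {
  d <- data.frame(x = rnorm(i), y = runif(i))
  row.names(d) <- sample(LETTERS[1:i])
  d
})

# Reference frame = the element with the most rows
ref <- ismr2[[which.max(sapply(ismr2, nrow))]]

# Append the rows of ref whose names are absent from each frame
ismr3 <- lapply(ismr2, function(x) {
  rbind(x, ref[!rownames(ref) %in% rownames(x), ])
})

# Every padded frame should match the reference in dimensions ...
same_dims <- all(sapply(ismr3, function(x) identical(dim(x), dim(ref))))
# ... and carry exactly the same set of row names (order may differ)
same_names <- all(sapply(ismr3, function(x) setequal(rownames(x), rownames(ref))))
```

Note that the appended rows carry the values copied from ref, not zeros.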

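If you do not want to rbind (i.e. fill in the missing rows with the values from the reference frame), but would rather have them be 0 (or perhaps more appropriately NA?), you can do: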

lapply(ismr2, function(x){
    rn <- union(rownames(x), rownames(ref))
    x <- x[rn,]
    x[is.na(x)] <- 0  # Remove this line to let missing rows be NA
    rownames(x) <- rn
    x
})
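The zero-filling idea can also be tried on the small input P from the question. This is only a sketch: P is rebuilt here as a list of one-column matrices (an assumption based on how it is printed in the question), and pad_with_zeros, ref_names and P_padded are hypothetical names that do not appear in the original posts.

```r
# Assumed reconstruction of the question's toy input: one-column matrices
P <- list(
  matrix(1:4, ncol = 1, dimnames = list(c("A", "B", "C", "D"), NULL)),
  matrix(1:3, ncol = 1, dimnames = list(c("A", "B", "D"), NULL)),
  matrix(1:2, ncol = 1, dimnames = list(c("B", "C"), NULL))
)

# Row names of the element with the most rows (the reference)
ref_names <- rownames(P[[which.max(sapply(P, nrow))]])

# Hypothetical helper: keep the existing rows, then add the missing ones as 0
pad_with_zeros <- function(m, ref_names) {
  missing <- setdiff(ref_names, rownames(m))
  out <- matrix(0,
                nrow = nrow(m) + length(missing),
                ncol = ncol(m),
                dimnames = list(c(rownames(m), missing), colnames(m)))
  out[rownames(m), ] <- m  # copy the original values in by row name
  out
}

P_padded <- lapply(P, pad_with_zeros, ref_names = ref_names)
```

The added rows follow the order of the reference row names, so the third element ends up as B, C, A, D rather than the B, C, D, A shown in the question; the contents are otherwise the same.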

Answer 1 (score: 0)

# if you don't care about row names
# this works for data frames with multiple columns
# this does not add empty columns
l=list()
# note that the double [[ ]] is important; otherwise the element is stored
# as a vector or list inside the list rather than as a data.frame
l[[1]]=data.frame(1:4)
l[[2]]=data.frame(1:3)
l[[3]]=data.frame(1:2)
l[[4]]=data.frame(1:4)

# find the largest number of rows
size=0
for(i in seq_along(l)){
  if (nrow(l[[i]])>size) {
    size=nrow(l[[i]])
  }
}

# add empty rows
emptyValue=0
for(i in seq_along(l)){
  n=nrow(l[[i]])
  if(n<size){
    # start at n+1 so that the last existing row is not overwritten
    l[[i]][(n+1):size,]=emptyValue
  }
}
l

Speed can be improved by replacing the "for" loops with "lapply".
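A minimal sketch of what such an lapply version might look like, using the same toy list. The helper name pad_rows and its empty_value argument are hypothetical, and whether this is noticeably faster than the loops is not measured here.

```r
# Same toy list as in the answer
l <- list(data.frame(1:4), data.frame(1:3), data.frame(1:2), data.frame(1:4))

# Largest number of rows across all elements
size <- max(vapply(l, nrow, integer(1)))

# Hypothetical helper: extend a data frame to `size` rows filled with empty_value
pad_rows <- function(d, size, empty_value = 0) {
  n <- nrow(d)
  if (n < size) {
    # rows n+1 .. size do not exist yet; assigning to them appends them
    d[(n + 1):size, ] <- empty_value
  }
  d
}

l_padded <- lapply(l, pad_rows, size = size)
```

As in the loop version, row names are ignored: the new rows are simply appended at the bottom.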