Question

I have a list in which every element is a data frame. The data frames differ in their number of rows, as shown below:

for(i in 1:length(nm)){print(dim(ismr2[[i]]))}
[1] 510   3
[1] 477   3
[1] 412   3
[1] 422   3
[1] 455   3
[1] 398   3
[1] 405   3
[1] 407   3
[1] 452   3
[1] 462   3
[1] 498   3
[1] 495   3
[1] 469   3
[1] 470   3

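There is one rule, however: the row names of every data frame are a subset of the row names of the data frame with the most rows (510 here).

My goal is to find the data frame with the most rows (call it the reference data frame) and to add to each of the other data frames the rows it is missing (rows that are in the reference data frame but not in that data frame).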

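Expected output:

1) A list in which every element is a data frame.

2) All data frames in the list have the same dimensions, equal to those of the reference data frame (the list element with the most rows).

3) The rows newly added to a data frame have the same number of columns as the existing ones; their names should be the ones found in the reference data frame, and their entries should be 0.

Here is my attempt, but it does not work: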

isomir2  # original list

ismr3 <- vector("list", length(isomir2))

# find the reference data frame: this only gives me the largest row count; I don't know which data frame (which element of the list) has it
length.max <- max(unlist(lapply(isomir2, function(x) nrow(x))))

for (i in 1:length(isomir2)){
  ismr3[[i]] <- rbind(isomir2[[i]],matrix(0,ncol=3,nrow=length.max - nrow(isomir2[[i]]))
                      temp <- rownames(isomir2[[i]])
                      rownames(P[[i]]) <- c(temp, # How should I find the missing row name here ? ))
}

Could someone help me implement this?

Simple input:

> P
[[1]]
  [,1]
A    1
B    2
C    3
D    4

[[2]]
  [,1]
A    1
B    2
D    3

[[3]]
  [,1]
B    1
C    2

Expected output:

> P
[[1]]
  [,1]
A    1
B    2
C    3
D    4

[[2]]
  [,1]
A    1
B    2
D    3
C    0

[[3]]
  [,1]
B    1
C    2
D    0
A    0

Answer 1

First, I generate some nonsense data in the same form as yours (since you did not provide a sample dataset to work with):

ismr2 <- lapply(2*1:5, function(i){
    d <- data.frame(rnorm(i), runif(i))
    row.names(d) <- sample(LETTERS[1:i])
    d
})

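Then I pick the largest data frame as the reference and pad each data frame in ismr2 with the rows it is missing, copied from the reference: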

ref <- ismr2[[which.max(sapply(ismr2, nrow))]]

ismr3 <- lapply(ismr2, function(x){
    rbind(x, ref[!rownames(ref) %in% rownames(x),])
})

Now all frames in ismr3 have the same number of rows, as shown here:

> sapply(ismr3, row.names)
      [,1] [,2] [,3] [,4] [,5]
 [1,] "B"  "D"  "B"  "F"  "F" 
 [2,] "A"  "B"  "D"  "G"  "C" 
 [3,] "F"  "A"  "F"  "D"  "H" 
 [4,] "C"  "C"  "A"  "E"  "D" 
 [5,] "H"  "F"  "E"  "A"  "E" 
 [6,] "D"  "H"  "C"  "B"  "B" 
 [7,] "E"  "E"  "H"  "C"  "A" 
 [8,] "I"  "I"  "I"  "H"  "I" 
 [9,] "J"  "J"  "J"  "I"  "J" 
[10,] "G"  "G"  "G"  "J"  "G"
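
As a quick sanity check, the padding step can be verified programmatically instead of by eye. This is only a sketch: the `set.seed(1)` call and the names `same_dims` and `same_rows` are assumptions added for illustration, not part of the original answer.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is repeatable

# Same toy data as above: frames with 2, 4, 6, 8 and 10 rows
ismr2 <- lapply(2 * 1:5, function(i) {
  d <- data.frame(rnorm(i), runif(i))
  row.names(d) <- sample(LETTERS[1:i])
  d
})

# Reference frame = the element with the most rows
ref <- ismr2[[which.max(sapply(ismr2, nrow))]]

# Append the rows of ref whose names are absent from each frame
ismr3 <- lapply(ismr2, function(x) {
  rbind(x, ref[!rownames(ref) %in% rownames(x), ])
})

# Every frame should now match ref in dimensions and in its set of row names
same_dims <- all(sapply(ismr3, function(x) identical(dim(x), dim(ref))))
same_rows <- all(sapply(ismr3, function(x) setequal(rownames(x), rownames(ref))))
stopifnot(same_dims, same_rows)
```

Note that the row order still differs between frames; only the set of row names is the same.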

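If you do not want to impute (i.e. copy the missing rows from the reference frame) but would rather set them to 0 (or, perhaps more appropriately, NA?), you can do: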

lapply(ismr2, function(x){
    rn <- union(rownames(x), rownames(ref))
    x <- x[rn,]
    x[is.na(x)] <- 0  # Remove this line to let missing rows be NA
    rownames(x) <- rn
    x
})
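
The toy input `P` in the question prints like a list of one-column matrices rather than data frames, and indexing a matrix by a row name it does not have raises an error. For that case, a minimal sketch under the assumption that the elements are matrices with row names could look as follows; `pad_with_zeros` and `P2` are hypothetical names introduced here.

```r
# Toy input mirroring P from the question (assumed to be one-column matrices)
P <- list(
  matrix(1:4, ncol = 1, dimnames = list(c("A", "B", "C", "D"), NULL)),
  matrix(1:3, ncol = 1, dimnames = list(c("A", "B", "D"), NULL)),
  matrix(1:2, ncol = 1, dimnames = list(c("B", "C"), NULL))
)

# Reference = the element with the most rows
ref <- P[[which.max(sapply(P, nrow))]]

# Hypothetical helper: append an all-zero row for every row name of ref
# that x does not have
pad_with_zeros <- function(x, ref) {
  missing <- setdiff(rownames(ref), rownames(x))
  if (length(missing) == 0) return(x)  # nothing to add
  filler <- matrix(0, nrow = length(missing), ncol = ncol(x),
                   dimnames = list(missing, colnames(x)))
  rbind(x, filler)
}

P2 <- lapply(P, pad_with_zeros, ref = ref)
```

The added rows are appended in the order in which they appear in the reference. If every element should follow the reference's row order instead, the result can be reindexed with `[rownames(ref), , drop = FALSE]`.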

Answer 2

# If you don't care about row names.
# This works for data frames with multiple columns.
# It does not add empty columns.
l=list()
l[[1]]=data.frame(1:4) # note: the double [[ ]] matters; otherwise the element is stored as a vector or list in the list instead of a data.frame
l[[2]]=data.frame(1:3)
l[[3]]=data.frame(1:2)
l[[4]]=data.frame(1:4)

# find the largest row count
size=0
for(i in seq_along(l)){
  if (nrow(l[[i]])>size) {
    size=nrow(l[[i]])
  }
}

# add empty rows
emptyValue=0
for(i in seq_along(l)){
  n=nrow(l[[i]])
  if(n<size){
    # start at n+1 so the last existing row is not overwritten
    l[[i]][(n+1):size,]=emptyValue
  }
}
l

You can improve speed by replacing the "for" loops with "lapply".
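
One way the `lapply` variant might look is sketched below. It ignores row names, like the loop version, and it starts padding at row `n + 1` so existing rows are left untouched. The name `padded` is an assumption introduced here, and whether this is actually faster than the loops depends on the data.

```r
l <- list(data.frame(1:4), data.frame(1:3), data.frame(1:2), data.frame(1:4))

# Largest row count across the list
size <- max(sapply(l, nrow))

# Pad each data frame with zero-filled rows up to `size`
emptyValue <- 0
padded <- lapply(l, function(d) {
  n <- nrow(d)
  if (n < size) {
    d[(n + 1):size, ] <- emptyValue  # new rows are created by index assignment
  }
  d
})
```

Unlike the loops, this leaves `l` unchanged and returns a new list.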

How do I perform a simple operation on list elements that are data frames?
