I have a list in which each element is a data frame. The data frames have different dimensions (numbers of rows), as shown below:
for(i in 1:length(nm)){print(dim(ismr2[[i]]))}
[1] 510 3
[1] 477 3
[1] 412 3
[1] 422 3
[1] 455 3
[1] 398 3
[1] 405 3
[1] 407 3
[1] 452 3
[1] 462 3
[1] 498 3
[1] 495 3
[1] 469 3
[1] 470 3
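However, there is one rule: the row names of every data frame are a subset of the row names of the data frame with the most rows (e.g., the one with 510 rows).
My goal is to find the data frame with the most rows (let's call it the reference data frame) and add to each of the other data frames the rows it is missing (rows that are in the reference data frame but not in that data frame).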
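Before padding anything, it may be worth confirming that the stated rule actually holds. A minimal sketch on a small made-up list (not the real data); `check_subset_rule` is a hypothetical helper name:

```r
# Hypothetical helper: TRUE if every data frame's row names are a subset
# of the row names of the data frame with the most rows.
check_subset_rule <- function(lst) {
  ref <- lst[[which.max(sapply(lst, nrow))]]
  all(vapply(lst, function(x) all(rownames(x) %in% rownames(ref)), logical(1)))
}

# Small illustrative list, standing in for the real one
ex <- list(
  data.frame(v = 1:4, row.names = c("A", "B", "C", "D")),
  data.frame(v = 1:3, row.names = c("A", "B", "D")),
  data.frame(v = 1:2, row.names = c("B", "C"))
)
check_subset_rule(ex)
```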
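Expected output:
1) A list in which each element is a data frame.
2) All data frames in the list have the same dimensions, equal to those of the reference data frame (the element of the list with the most rows).
3) The newly added rows have the same number of columns as the existing ones; their row names come from the reference data frame and all their entries are 0.
Here is my attempt, but it doesn't work: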
isomir2 # original list
ismr3 <- vector("list", length(isomir2))
# find the reference data frame: this only gives me the largest number of rows; I don't know which data frame (which list element) it is
length.max <- max(unlist(lapply(isomir2, function(x) nrow(x))))
for (i in 1:length(isomir2)){
ismr3[[i]] <- rbind(isomir2[[i]],matrix(0,ncol=3,nrow=length.max - nrow(isomir2[[i]])))
temp <- rownames(isomir2[[i]])
rownames(P[[i]]) <- c(temp, # How should I find the missing row name here ? ))
}
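One way to complete the attempt above is to use `which.max()`, which returns the *position* of the reference data frame (the missing piece in the comment), and `setdiff()`, which gives the row names that are in the reference but not in the current frame. A sketch, assuming all data frames share the same column names and that all columns are numeric (so a zero-filled block fits); the small `isomir2` below is made up and stands in for the real list:

```r
# Small stand-in for isomir2 (assumed shape: same column names, numeric columns)
isomir2 <- list(
  data.frame(a = 1:4, b = 5:8, row.names = c("A", "B", "C", "D")),
  data.frame(a = 1:3, b = 5:7, row.names = c("A", "B", "D")),
  data.frame(a = 1:2, b = 5:6, row.names = c("B", "C"))
)

ref_idx <- which.max(sapply(isomir2, nrow))   # which element is the reference
ref_names <- rownames(isomir2[[ref_idx]])

ismr3 <- vector("list", length(isomir2))
for (i in seq_along(isomir2)) {
  x <- isomir2[[i]]
  miss_names <- setdiff(ref_names, rownames(x))  # in the reference but not in x
  if (length(miss_names) > 0) {
    # zero-filled rows, named after the missing row names, with x's columns
    filler <- as.data.frame(matrix(0, nrow = length(miss_names), ncol = ncol(x),
                                   dimnames = list(miss_names, colnames(x))))
    x <- rbind(x, filler)                        # rbind matches columns by name
  }
  ismr3[[i]] <- x
}
sapply(ismr3, nrow)  # 4 4 4
```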
Could someone help me implement this?
A simple input:
> P
[[1]]
[,1]
A 1
B 2
C 3
D 4
[[2]]
[,1]
A 1
B 2
D 3
[[3]]
[,1]
B 1
C 2
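To make the simple input reproducible, `P` can be rebuilt as a list of one-column matrices with row names (the `[,1]` headers suggest matrices without column names; that is an assumption). The sketch then pads each matrix with zero rows for the names it lacks; note that `setdiff()` returns the missing names in the reference's order, so the appended rows may come out in a different order than in the expected output:

```r
# Assumed reconstruction of the example input P
P <- list(
  matrix(1:4, ncol = 1, dimnames = list(c("A", "B", "C", "D"), NULL)),
  matrix(1:3, ncol = 1, dimnames = list(c("A", "B", "D"), NULL)),
  matrix(1:2, ncol = 1, dimnames = list(c("B", "C"), NULL))
)

ref_idx <- which.max(sapply(P, nrow))   # position of the reference element
P_padded <- lapply(P, function(m) {
  miss <- setdiff(rownames(P[[ref_idx]]), rownames(m))  # names m lacks
  if (length(miss) == 0) return(m)
  rbind(m, matrix(0, nrow = length(miss), ncol = ncol(m),
                  dimnames = list(miss, NULL)))
})
P_padded
```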
Expected output:
> P
[[1]]
[,1]
A 1
B 2
C 3
D 4
[[2]]
[,1]
A 1
B 2
D 3
C 0
[[3]]
[,1]
B 1
C 2
D 0
A 0
Answer 0 (score: 1)
First, I generate some nonsense data in the same form as yours (since you didn't provide an example dataset to work with):
ismr2 <- lapply(2*1:5, function(i){
d <- data.frame(rnorm(i), runif(i))
row.names(d) <- sample(LETTERS[1:i])
d
})
Then I pad the data frames in ismr2 like this:
ref <- ismr2[[which.max(sapply(ismr2, nrow))]]
ismr3 <- lapply(ismr2, function(x){
rbind(x, ref[!rownames(ref) %in% rownames(x), , drop = FALSE]) # drop = FALSE keeps one-column frames as data frames
})
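Note that this version fills the missing rows with the reference frame's values, not with zeros. A small check on the question's `P` example, rebuilt here as one-column data frames (an assumption), makes that visible; `drop = FALSE` stops a single-column selection from collapsing into a plain vector:

```r
# The question's small example, assumed to be one-column data frames
P_df <- list(
  data.frame(V1 = 1:4, row.names = c("A", "B", "C", "D")),
  data.frame(V1 = 1:3, row.names = c("A", "B", "D")),
  data.frame(V1 = 1:2, row.names = c("B", "C"))
)
ref <- P_df[[which.max(sapply(P_df, nrow))]]
padded <- lapply(P_df, function(x) {
  # append the reference rows whose names are absent from x
  rbind(x, ref[!rownames(ref) %in% rownames(x), , drop = FALSE])
})
padded[[2]]["C", "V1"]  # 3: copied from the reference, not 0
```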
Now all the frames in ismr3 have the same number of rows, as shown here:
> sapply(ismr3, row.names)
[,1] [,2] [,3] [,4] [,5]
[1,] "B" "D" "B" "F" "F"
[2,] "A" "B" "D" "G" "C"
[3,] "F" "A" "F" "D" "H"
[4,] "C" "C" "A" "E" "D"
[5,] "H" "F" "E" "A" "E"
[6,] "D" "H" "C" "B" "B"
[7,] "E" "E" "H" "C" "A"
[8,] "I" "I" "I" "H" "I"
[9,] "J" "J" "J" "I" "J"
[10,] "G" "G" "G" "J" "G"
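If you don't want to fill from the reference (i.e., copy the missing rows from the reference frame) but would rather have them be 0 (or, perhaps more appropriately, NA), you can do: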
lapply(ismr2, function(x){
rn <- union(rownames(x), rownames(ref))
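x <- x[rn, , drop = FALSE] # unknown row names become all-NA rows; drop = FALSE keeps one-column frames as data frames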
x[is.na(x)] <- 0 # Remove this line to let missing rows be NA
rownames(x) <- rn
x
})
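One side effect of `x[is.na(x)] <- 0` is that it also turns any NA that was already in the data into 0. A variant that zeroes only the newly added rows is sketched below; it also looks rows up with `match()`, so that row selection is an exact name match. The helper name `pad_rows` and the example data (the question's `P`, rebuilt as one-column data frames with one NA added for illustration) are hypothetical:

```r
# Hypothetical helper: pad every data frame to the reference's rows,
# filling only the newly added rows with `fill`
pad_rows <- function(lst, fill = 0) {
  ref <- lst[[which.max(sapply(lst, nrow))]]
  lapply(lst, function(x) {
    rn <- union(rownames(x), rownames(ref))
    added <- !rn %in% rownames(x)                  # rows that did not exist in x
    x <- x[match(rn, rownames(x)), , drop = FALSE] # unmatched names give all-NA rows
    x[added, ] <- fill                             # only the new rows get the fill value
    rownames(x) <- rn
    x
  })
}

P_df <- list(
  data.frame(V1 = c(1, NA, 3, 4), row.names = c("A", "B", "C", "D")),
  data.frame(V1 = 1:3, row.names = c("A", "B", "D")),
  data.frame(V1 = 1:2, row.names = c("B", "C"))
)
out <- pad_rows(P_df)
out[[1]]["B", "V1"]  # NA: an NA already present in the data is left alone
out[[3]]
```

Passing `fill = NA` gives NA-filled rows instead of zeros.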
Answer 1 (score: 0)
# if you don't care about row names
# this works for data.frames with multiple columns
# this does not add empty columns
l=list()
l[[1]]=data.frame(1:4) # note that double [[]] are important else it will be stored as a vector or list in list instead of data.frame in list
l[[2]]=data.frame(1:3)
l[[3]]=data.frame(1:2)
l[[4]]=data.frame(1:4)
# check biggest
size=0
for(i in 1:length(l)){
if (dim(l[[i]])[1]>size) {
size=dim(l[[i]])[1]
}
}
# add empty rows
emptyValue=0
for(i in 1:length(l)){
if(dim(l[[i]])[1]<size){
l[[i]][(dim(l[[i]])[1]+1):size,]=emptyValue # start one past the last existing row so it is not overwritten
}
}
l
Speed could be improved by replacing the "for" loops with "lapply".
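Following that suggestion, the positional padding above (which ignores row names) might be written with `lapply` as below. Whether it is measurably faster than the loops depends on the data; the clearer gain is that it is shorter and does not modify the list in place. `pad_positional` is a hypothetical name:

```r
# Hypothetical helper: pad each data frame at the bottom, by position,
# up to the largest number of rows in the list
pad_positional <- function(lst, empty_value = 0) {
  size <- max(sapply(lst, nrow))                    # largest number of rows
  lapply(lst, function(d) {
    n <- nrow(d)
    if (n < size) d[(n + 1):size, ] <- empty_value  # new rows after the last existing one
    d
  })
}

l <- list(data.frame(x = 1:4), data.frame(x = 1:3),
          data.frame(x = 1:2), data.frame(x = 1:4))
res <- pad_positional(l)
sapply(res, nrow)  # 4 4 4 4
res[[3]]$x         # 1 2 0 0
```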