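I am trying to subset a dataset using a nested loop. Unfortunately, it does not seem to work properly: I get several warnings, and the loop does not do what I want.
Here is a short code example. The data provided is only an example; the actual dataset is much larger, so any solution that involves selecting values manually is not feasible.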
# #Generate example data
unique_test <- list()
unique_test[[1]] <- c(178.5, 179.5, 180.5, 181.5)
unique_test[[2]] <- c(269.5, 270.5, 271.5)
tmp_dataframe1 <- data.frame(myID = c(268, 305, 268, 305, 268, 305, 306),
                             myvalue = c(1.150343, 2.830392, 1.150343, 2.830392, 1.150343, 2.830392, 1.150343),
                             myInter = c(178.5, 178.5, 179.5, 179.5, 180.5, 180.5, 181.5))
tmp_dataframe2 <- data.frame(myID = c(144, 188, 196, 300, 301, 302, 303, 97),
                             myvalue = c(1.293493, 3.286649, 1.408049, 0.469219, 11.143147, 0.687355, 0.508603, 0.654335),
                             myInter = c(269.5, 269.5, 269.5, 270.5, 270.5, 271.5, 185.5, 186.5))
mydata <- list()
mydata[[1]] <- tmp_dataframe1
mydata[[2]] <- tmp_dataframe2
########################
# #Generate nested loop
mysubset <- list() #Define list
for(i in 1:length(unique_test)){
  #Prepare list of lists
  mysubset[[i]] <- NaN
  for(j in 1:length(unique_test[[i]])){
    #Select myvalues whose myInter data equals the one found in unique_test and assign them to a new subset
    mysubset[[i]][j] <- mydata[[i]][which(mydata[[i]]$myInter == unique_test[[i]][j]),][["myvalue"]]
  }
}
# #There are warnings and the nested loop is not really doing, what it is supposed to do!
R issues the following warnings:
Warning messages:
1: In mysubset[[i]][j] <- mydata[[i]][which(mydata[[i]]$myInter == :
number of items to replace is not a multiple of replacement length
2: In mysubset[[i]][j] <- mydata[[i]][which(mydata[[i]]$myInter == :
number of items to replace is not a multiple of replacement length
3: In mysubset[[i]][j] <- mydata[[i]][which(mydata[[i]]$myInter == :
number of items to replace is not a multiple of replacement length
4: In mysubset[[i]][j] <- mydata[[i]][which(mydata[[i]]$myInter == :
number of items to replace is not a multiple of replacement length
5: In mysubset[[i]][j] <- mydata[[i]][which(mydata[[i]]$myInter == :
number of items to replace is not a multiple of replacement length
If I restrict myself to the first element of my dataset, a "normal" (i.e. non-nested) loop works:
# #If I don't use a nested loop (by just using the first element in both "mydata" and "unique_test"), things seem to work out
# #But obviously, this is not really what I want to achieve (I can't just manually select every element in mydata and unique_test)
mysubset <- list()
for(i in 1:length(unique_test[[1]])){
  #Select myvalues whose myInter data equals the one found in unique_test and assign them to a new subset
  mysubset[[i]] <- mydata[[1]][which(mydata[[1]]$myInter == unique_test[[1]][i]),][["myvalue"]]
}
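Do I have to initialise my list with the appropriate dimensions first? But how would I do that if the elements of my dataset do not all have the same dimensions (which is why I have to use the length() function in the first place)? As you can see, mydata[[1]] does not have the same dimensions as mydata[[2]]. The solutions offered in the following links therefore do not apply to this dataset: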
Error in R :Number of items to replace is not a multiple of replacement length
Error in `*tmp*`[[k]] : subscript out of bounds in R
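I am pretty sure it is something obvious, but I cannot find it. Any help is much appreciated!
If there is a better way to achieve the same thing without loops (I am sure there is, e.g. apply() or something along the lines of subset()), I would appreciate comments on that as well. Unfortunately, I am not familiar enough with the alternatives to implement them quickly.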
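Answer 0 (score: 1)
Because you are trying to assign a numeric vector to an element of a nested list inside the nested for loop, and not to a vector itself, simply wrap the assigned value in list().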
mysubset[[i]][j] <- list(mydata[[i]][which(mydata[[i]]$myInter == unique_test[[i]][j]),][["myvalue"]])
Or, without which() and without the outer square brackets:
mysubset[[i]][j] <- list(mydata[[i]][mydata[[i]]$myInter == unique_test[[i]][j], c("myvalue")])
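To make the cause concrete: mysubset[[i]] <- NaN creates a numeric vector of length one, and mysubset[[i]][j] addresses a single slot of it, so a match with two or more values does not fit. A minimal sketch of that behaviour follows; the names slot, msg and wrapped are made up for this illustration and do not appear in the question.

```r
# `slot`, `msg` and `wrapped` are hypothetical names used only for this illustration.
slot <- NaN                      # same initial value as mysubset[[i]] <- NaN

# Assigning two values to one position triggers the warning from the question;
# tryCatch() captures the warning message instead of printing it.
msg <- tryCatch({
  slot[1] <- c(1.150343, 2.830392)
  "no warning"
}, warning = function(w) conditionMessage(w))
print(msg)

# Wrapping the values in list() makes them a single list element, and the
# assignment converts the numeric vector into a list on the fly.
wrapped <- NaN
wrapped[1] <- list(c(1.150343, 2.830392))
wrapped[2] <- list(1.150343)
str(wrapped)
```

When the warning fires, R keeps only the first of the supplied values, which is why the original loop runs to completion but returns truncated results.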
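Alternatively, consider an apply-family solution, since you then do not need to allocate an empty list up front and grow it iteratively to bind values to it. Nested lapply, sapply, mapply, or even rapply calls can create the required list with the right dimensions in a single call. The mapply version assumes that unique_test and mydata are always objects of equal length.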
# NESTED LAPPLY
mysubset2 <- lapply(seq(length(unique_test)), function(i) {
  lapply(seq(length(unique_test[[i]])), function(j){
    mydata[[i]][mydata[[i]]$myInter == unique_test[[i]][j], c("myvalue")]
  })
})
# NESTED SAPPLY
mysubset3 <- sapply(seq(length(unique_test)), function(i) {
  sapply(seq(length(unique_test[[i]])), function(j){
    mydata[[i]][mydata[[i]]$myInter == unique_test[[i]][j], c("myvalue")]
  }, simplify = FALSE)  # keep a list even if every match has the same length
}, simplify = FALSE)
# NESTED M/LAPPLY
mysubset4 <- mapply(function(u, m){
  lapply(u, function(i) m[m$myInter == i, c("myvalue")])
}, unique_test, mydata, SIMPLIFY = FALSE)
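# NESTED R/LAPPLY
# Pools all data frames, so this assumes myInter values do not overlap across the elements of mydata
mysubset5 <- rapply(unique_test, function(i){
  df <- do.call(rbind, mydata)
  lapply(i, function(u) df[df$myInter == u, c("myvalue")])
}, how="list")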
# ALL SUBSETS EQUAL EXACTLY
all.equal(mysubset, mysubset2)
# [1] TRUE
all.equal(mysubset, mysubset3)
# [1] TRUE
all.equal(mysubset, mysubset4)
# [1] TRUE
all.equal(mysubset, mysubset5)
# [1] TRUE
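On the question of whether the list has to be initialised with the right dimensions first: it does not have to be, but it can be, even when the elements differ in length. The sketch below is one way to write the corrected loop end to end, assuming the example data from the question; vector("list", n) and seq_along() are swapped in here for the NaN placeholder and 1:length(), and the helper name hit is made up for this illustration.

```r
# Example data, copied from the question
unique_test <- list(c(178.5, 179.5, 180.5, 181.5),
                    c(269.5, 270.5, 271.5))
mydata <- list(
  data.frame(myID = c(268, 305, 268, 305, 268, 305, 306),
             myvalue = c(1.150343, 2.830392, 1.150343, 2.830392,
                         1.150343, 2.830392, 1.150343),
             myInter = c(178.5, 178.5, 179.5, 179.5, 180.5, 180.5, 181.5)),
  data.frame(myID = c(144, 188, 196, 300, 301, 302, 303, 97),
             myvalue = c(1.293493, 3.286649, 1.408049, 0.469219,
                         11.143147, 0.687355, 0.508603, 0.654335),
             myInter = c(269.5, 269.5, 269.5, 270.5, 270.5, 271.5, 185.5, 186.5))
)

# Outer list: one slot per element of unique_test
mysubset <- vector("list", length(unique_test))
for (i in seq_along(unique_test)) {
  # Inner list: sized per element, so differing lengths are not a problem
  mysubset[[i]] <- vector("list", length(unique_test[[i]]))
  for (j in seq_along(unique_test[[i]])) {
    # `hit` (hypothetical name) flags the rows whose myInter matches the current value
    hit <- mydata[[i]]$myInter == unique_test[[i]][j]
    # [[j]] stores the whole vector of matches as one list element
    mysubset[[i]][[j]] <- mydata[[i]]$myvalue[hit]
  }
}
str(mysubset)
```

Using [[j]] on a list slot stores a vector of any length, so no wrapping in list() is needed in this form.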
Answer 1 (score: 0)
Could you post what you expect mysubset to look like? As I understand it, this should subset myvalue using the values in unique_test:
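all_data <- do.call(rbind, mydata)  # subset() needs a data frame, not a list of data frames
mysubset <- unique(unlist(lapply(unlist(unique_test), function(x) subset(all_data, myInter == x, select = "myvalue"))))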
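For comparison, here is a rough sketch of what a flat result of this kind looks like on the example data. It assumes the data frames in mydata are first stacked with do.call(rbind, ...), and it swaps in %in% for the lapply()/subset() combination; the names all_data, wanted and flat_values are made up for this illustration.

```r
# Example data, copied from the question
unique_test <- list(c(178.5, 179.5, 180.5, 181.5),
                    c(269.5, 270.5, 271.5))
mydata <- list(
  data.frame(myID = c(268, 305, 268, 305, 268, 305, 306),
             myvalue = c(1.150343, 2.830392, 1.150343, 2.830392,
                         1.150343, 2.830392, 1.150343),
             myInter = c(178.5, 178.5, 179.5, 179.5, 180.5, 180.5, 181.5)),
  data.frame(myID = c(144, 188, 196, 300, 301, 302, 303, 97),
             myvalue = c(1.293493, 3.286649, 1.408049, 0.469219,
                         11.143147, 0.687355, 0.508603, 0.654335),
             myInter = c(269.5, 269.5, 269.5, 270.5, 270.5, 271.5, 185.5, 186.5))
)

# Stack the list of data frames into one data frame (hypothetical name)
all_data <- do.call(rbind, mydata)

# All interval values of interest, flattened into one vector
wanted <- unlist(unique_test)

# Keep the distinct myvalue entries whose myInter is among the wanted values
flat_values <- unique(all_data$myvalue[all_data$myInter %in% wanted])
print(flat_values)
```

The result is a single flat vector of distinct values, so the grouping per interval and per data frame is lost; whether that is acceptable depends on the structure expected for mysubset.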