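Following up on another question I opened, Vectorize/Speed up Code with Nested For Loops, this is a more focused question. Basically, I want to speed up the execution of this code, and I'm thinking of using one of the apply family of functions. The apply function must use/do the following:
Input: loop over regions 1 to 10; vectors sed and borewidth with pre-allocated dimensions, filled with NAs
Process: fill sed and borewidth with data, in the way the inner for loop does
Output: the sed and borewidth vectors
Assumption (h/t Simon Urbanek): within each region, the begin and finish points of the rows are contiguous and sequential, and start at 0.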
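The contiguity assumption can be checked before relying on it. Below is a minimal sketch, assuming the rows of sample are already ordered by begin within each region. `intervals_are_contiguous()` is a hypothetical helper, not part of the question or the answer, and the data frame holds only the region, begin and finish columns of the question's example data.

```r
# Hypothetical helper: TRUE when, in every region, the first interval begins at 0
# and each row's begin equals the previous row's finish.
# Assumes rows are already sorted by begin within each region.
intervals_are_contiguous <- function(sample) {
  per_region <- tapply(seq_len(nrow(sample)), sample$region, function(idx) {
    b <- sample$begin[idx]
    f <- sample$finish[idx]
    b[1] == 0 && all(b[-1] == f[-length(f)])
  })
  all(per_region)
}

# region/begin/finish columns of the example data posted in the question
sample <- data.frame(
  region = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
  begin  = c(0L, 2253252L, 7091077L, 9120205L, 0L, 135094L, 941813L, 5901391L, 6061324L),
  finish = c(2253252L, 7091077L, 9120205L, 17463033L, 135094L, 941813L, 5901391L,
             6061324L, 7092402L)
)
print(intervals_are_contiguous(sample))  # TRUE
```

If the check returns FALSE for real data, an approach that relies on this assumption (such as using only the finish values as breaks) would silently assign some locations to the wrong row.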
The code is as follows:
end_tracker <- 1  # next free position in sed/borewidth (assumed starting value; not shown in the original)
for (region in 1:10) {
  # subset standRef and sample by region code
  standRef.region <- standRef[which(standRef$region == region), ]
  sample.region <- sample[which(sample$region == region), ]
  for (i in seq_len(nrow(sample.region))) {
    # locations: the rows of standRef.region whose location value is both
    # 1) greater than the begin value of the ith row of sample.region, and
    # 2) less than the finish value of the ith row of sample.region
    locations <- standRef.region[which((standRef.region$location > sample.region$begin[i]) &
                                       (standRef.region$location < sample.region$finish[i])), ]
    # write to positions end_tracker, ..., end_tracker + nrow(locations) - 1
    idx <- end_tracker + seq_len(nrow(locations)) - 1
    sed[idx] <- sample.region$sed[i]
    borewidth[idx] <- sample.region$borewidth[i]
    # advance end_tracker past the rows written in this iteration
    end_tracker <- end_tracker + nrow(locations)
  }
  cat("Finished region", region, "\n")
}
Example data for borewidth and sed. Edit: dput output:
structure(list(region = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
begin = c(0L, 2253252L, 7091077L, 9120205L, 0L, 135094L,
941813L, 5901391L, 6061324L), finish = c(2253252L, 7091077L,
9120205L, 17463033L, 135094L, 941813L, 5901391L, 6061324L,
7092402L), sed = c(3.31830840984048, 1.38014704208403, 6.13049140975458,
2.10349875097134, 0.48170587509345, 0.13058713509175, 9.13509713513509,
6.13047153058701, 3.81734081501503), borewidth = c(3L, 5L,
2L, 1L, 1L, 1L, 2L, 4L, 4L)), .Names = c("region", "begin",
"finish", "sed", "borewidth"), class = "data.frame", row.names = c(NA,
-9L))
TIA.
Answer 0 (score: 4)
Making some additional assumptions based on the data you posted (including in your other question), here is one way you could do it:
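index <- unlist(lapply(unique(standRef$region), function(reg) {
  # rows of standRef and of sample that belong to this region
  reg.filter <- which(standRef$region == reg)
  samp.filter <- which(sample$region == reg)
  # cut() maps each location to the number of the (begin, finish] interval containing it;
  # indexing samp.filter with that number gives the matching row of sample
  samp.filter[cut(standRef$location[reg.filter], c(0L, sample$finish[samp.filter]), labels = FALSE)]
}))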
sed <- sample$sed[index]
borewidth <- sample$borewidth[index]
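The extra assumption is that your samples are contiguous and sequential (all of your examples are) and start at 0. This allows us to use cut() on $finish instead of handling each interval separately. One difference is that your code leaves gaps at the breaks (a location exactly equal to a begin/finish value matches no row, because both comparisons are strict), but I don't think that is intentional.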
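A minimal sketch of what cut() does here, using the region 1 rows from the dput output above and a few made-up location values (the `location` vector is hypothetical, not data from the question). The breaks are 0 followed by every finish. With cut()'s default right = TRUE the intervals are closed on the right, so a location sitting exactly on a break goes to the interval it closes, whereas the question's strict test matches no row for it.

```r
# Region 1 intervals and sed values, copied from the dput output in the question
begin  <- c(0L, 2253252L, 7091077L, 9120205L)
finish <- c(2253252L, 7091077L, 9120205L, 17463033L)
sed    <- c(3.31830840984048, 1.38014704208403, 6.13049140975458, 2.10349875097134)

# Hypothetical standRef locations for region 1 (made up for illustration)
location <- c(100L, 2253252L, 2253253L, 8000000L, 17000000L)

# Breaks are 0 followed by every finish, giving the intervals
# (0, 2253252], (2253252, 7091077], (7091077, 9120205], (9120205, 17463033]
idx <- cut(location, c(0L, finish), labels = FALSE)
print(idx)       # 1 1 2 3 4
print(sed[idx])  # sed value of the interval containing each location

# The question's strict test (begin < location < finish) matches no interval
# for a location that sits exactly on a break
on_break <- 2253252L
print(sum(begin < on_break & on_break < finish))  # 0
```

Note that cut() returns NA for a location of exactly 0 or beyond the last finish, so such rows would need separate handling.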