I have two data frames. One of them holds intervals with "From" and "To" columns, as follows:
Intervals <- data.frame("From" = c(0.0000,0.0069,0.0139,0.0208,0.0278,0.0347,0.0417,0.0486,0.0556,0.0625,0.0694,0.0764,0.0833),
"To" = c(0.0410,0.0479,0.0549,0.0618,0.0688,0.0757,0.0826,0.0896,0.0965,0.1035,0.1104,0.1174,0.1243))
The second data frame is:
x <- data.frame("Dummy" = c(0,1,0,0,0,0,0,0,1,0,0,0,0),
"Dummy Time" = c(0,0,0.006944444,0.006944444,0.010416667,0.010416667,0.013888889,0.013888889,0.020833333,0.024305556,0.027777778,0.03125,0.03125))
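So, if Dummy Time falls between From and To (inclusive) in the Intervals data frame, I basically want to sum the Dummy variable in R. This is easy in Excel, but I am new to R.
cbind won't work because Intervals and x have different rows. Basically, the interval times are just a standard day, and I want to create a new column that shows, for each interval, the sum of the dummies produced in that time interval.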
Answer 0 (score: 0)
The most transparent way I can think of is:
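n_interval <- nrow(Intervals)
Intervals$DummySum <- numeric(n_interval)
for (i in seq_len(n_interval)) {
  # data.frame() renames "Dummy Time" to Dummy.Time (check.names = TRUE by default)
  # Half-open interval [From, To); change < to <= to include the upper bound
  ind_i <- x$Dummy.Time >= Intervals$From[i] & x$Dummy.Time < Intervals$To[i]
  Intervals$DummySum[i] <- sum(x$Dummy[ind_i])
}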
This simply loops over all intervals, identifies the observations that fall within each interval, and sums their Dummy values.
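To make the loop's logic concrete, here is a small self-contained sketch that runs the same membership test for all intervals at once with outer(). The names `inside` and `dummy_sum` are illustrative and not from the original answer. The time column is accessed as Dummy.Time because data.frame(), with its default check.names = TRUE, converts "Dummy Time" into a syntactically valid name.

```r
Intervals <- data.frame(
  "From" = c(0.0000, 0.0069, 0.0139, 0.0208, 0.0278, 0.0347, 0.0417,
             0.0486, 0.0556, 0.0625, 0.0694, 0.0764, 0.0833),
  "To"   = c(0.0410, 0.0479, 0.0549, 0.0618, 0.0688, 0.0757, 0.0826,
             0.0896, 0.0965, 0.1035, 0.1104, 0.1174, 0.1243)
)
x <- data.frame(
  "Dummy" = c(0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0),
  "Dummy Time" = c(0, 0, 0.006944444, 0.006944444, 0.010416667, 0.010416667,
                   0.013888889, 0.013888889, 0.020833333, 0.024305556,
                   0.027777778, 0.03125, 0.03125)
)
names(x)  # "Dummy" "Dummy.Time"

# inside[i, j] is TRUE when observation j lies in interval i (From <= t < To)
inside <- outer(Intervals$From, x$Dummy.Time, "<=") &
  outer(Intervals$To, x$Dummy.Time, ">")

# Row i of the matrix product adds up the Dummy values inside interval i
dummy_sum <- as.vector(inside %*% x$Dummy)
dummy_sum  # 2 1 1 1 0 0 0 0 0 0 0 0 0
```

Because the intervals overlap, a single observation can be counted in several rows, which is why the Dummy recorded at 0.020833333 contributes to the first four intervals.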
If you don't like for loops, you can use sapply:
# The time column is Dummy.Time: data.frame() sanitizes "Dummy Time" by default
Intervals$DummySum <-
  sapply(seq_len(nrow(Intervals)), function(i) sum(
    x$Dummy[
      x$Dummy.Time >= Intervals$From[i] & x$Dummy.Time < Intervals$To[i]
    ]
  ))
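One caveat with sapply is that its return type depends on the input: called on zero elements, it returns an empty list rather than a numeric vector. A minimal sketch of the same computation with vapply, which declares the result type up front, is shown below. It uses small made-up data, and the names `iv`, `obs`, and `empty_result` are hypothetical.

```r
# Illustrative data, not the data from the question
iv  <- data.frame(From = c(0, 1), To = c(1, 2))
obs <- data.frame(Dummy = c(1, 0, 1, 1), Time = c(0.2, 0.7, 1.1, 1.9))

# numeric(1) states that every call must return exactly one number
iv$DummySum <- vapply(
  seq_len(nrow(iv)),
  function(i) sum(obs$Dummy[obs$Time >= iv$From[i] & obs$Time < iv$To[i]]),
  numeric(1)
)
iv$DummySum  # 1 2

# With no intervals at all, vapply still yields a numeric vector of length 0
empty_result <- vapply(integer(0), function(i) 0, numeric(1))
```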
Finally, you can turn this into a more general function:
sum_in_intervals <- function(interval_start, interval_end, times, values, na.rm = FALSE) {
  # Every interval needs a start and an end; every time needs a value
  stopifnot(length(interval_start) == length(interval_end))
  stopifnot(length(times) == length(values))
  return(
    # seq_along() also behaves correctly when no intervals are supplied
    sapply(seq_along(interval_start), function(i) sum(
      values[
        times >= interval_start[i] & times < interval_end[i]
      ],
      na.rm = na.rm
    ))
  )
}
# The time column is Dummy.Time: data.frame() sanitizes "Dummy Time" by default
Intervals$DummySum <- sum_in_intervals(Intervals$From, Intervals$To, x$Dummy.Time, x$Dummy)
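As a quick check of the na.rm argument, the sketch below repeats the function definition so that it runs on its own and applies it to a small made-up example. The vectors `starts`, `ends`, `t_obs`, and `v_obs` are hypothetical and not part of the question's data.

```r
sum_in_intervals <- function(interval_start, interval_end, times, values, na.rm = FALSE) {
  stopifnot(length(interval_start) == length(interval_end))
  stopifnot(length(times) == length(values))
  sapply(seq_along(interval_start), function(i) {
    sum(values[times >= interval_start[i] & times < interval_end[i]], na.rm = na.rm)
  })
}

# Illustrative overlapping intervals and observations, one value missing
starts <- c(0, 1, 2)
ends   <- c(2, 3, 4)
t_obs  <- c(0.5, 1.5, 2.5, 3.5)
v_obs  <- c(1, NA, 3, 4)

# Default: any interval containing the missing value sums to NA
res_keep_na <- sum_in_intervals(starts, ends, t_obs, v_obs)
res_keep_na  # NA NA 7

# na.rm = TRUE drops the missing value before summing
res_drop_na <- sum_in_intervals(starts, ends, t_obs, v_obs, na.rm = TRUE)
res_drop_na  # 1 3 7
```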