Sumif between two numbers across different data frames

Time: 2018-02-22 15:16:43

Tags: r sum plyr

I have two data frames. One of them holds "From" and "To" intervals, as follows:

Intervals <- data.frame("From" = c(0.0000,0.0069,0.0139,0.0208,0.0278,0.0347,0.0417,0.0486,0.0556,0.0625,0.0694,0.0764,0.0833),
                        "To" = c(0.0410,0.0479,0.0549,0.0618,0.0688,0.0757,0.0826,0.0896,0.0965,0.1035,0.1104,0.1174,0.1243))

The second data frame is:

x <- data.frame("Dummy" = c(0,1,0,0,0,0,0,0,1,0,0,0,0), 
                "Dummy Time" = c(0,0,0.006944444,0.006944444,0.010416667,0.010416667,0.013888889,0.013888889,0.020833333,0.024305556,0.027777778,0.03125,0.03125))
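One detail worth checking first: by default `data.frame()` passes column names through `make.names()` (`check.names = TRUE`), so the column written as `"Dummy Time"` is actually stored as `Dummy.Time`. Referring to a column that does not exist, such as `x$DummyTime`, returns `NULL` rather than raising an error, so a typo can silently turn every sum into 0. A quick check:

```r
x <- data.frame("Dummy" = c(0,1,0,0,0,0,0,0,1,0,0,0,0),
                "Dummy Time" = c(0,0,0.006944444,0.006944444,0.010416667,0.010416667,
                                 0.013888889,0.013888889,0.020833333,0.024305556,
                                 0.027777778,0.03125,0.03125))

# check.names = TRUE (the default) replaced the space with a dot
names(x)

# A column name that does not exist gives NULL, not an error
is.null(x$DummyTime)
```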

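So, basically, I want to sum the Dummy variable in R whenever the Dummy Time falls between (or is equal to) the From and To values of a row in the Intervals data frame. This is easy in Excel, but I'm a newbie to R.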
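For one interval, the Excel approach maps onto logical indexing followed by `sum()`. Below is a minimal sketch for the first row of `Intervals` only; the Excel formula in the comment is an approximate analogue, and `Dummy.Time` is the name `data.frame()` produces from `"Dummy Time"`:

```r
Intervals <- data.frame("From" = c(0.0000,0.0069,0.0139,0.0208,0.0278,0.0347,0.0417,
                                   0.0486,0.0556,0.0625,0.0694,0.0764,0.0833),
                        "To"   = c(0.0410,0.0479,0.0549,0.0618,0.0688,0.0757,0.0826,
                                   0.0896,0.0965,0.1035,0.1104,0.1174,0.1243))
x <- data.frame("Dummy" = c(0,1,0,0,0,0,0,0,1,0,0,0,0),
                "Dummy Time" = c(0,0,0.006944444,0.006944444,0.010416667,0.010416667,
                                 0.013888889,0.013888889,0.020833333,0.024305556,
                                 0.027777778,0.03125,0.03125))

# Roughly SUMIFS(Dummy, DummyTime, ">="&From, DummyTime, "<="&To) for row 1:
# a logical vector marks the rows whose time lies in [From[1], To[1]]
in_first <- x$Dummy.Time >= Intervals$From[1] & x$Dummy.Time <= Intervals$To[1]
sum(x$Dummy[in_first])  # 2
```

Repeating this for every row of `Intervals` is exactly what the answer below automates.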

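cbind won't work because the rows of Intervals and x don't match up. Basically, the interval times are just times of a standard day, and I want to add a new column to Intervals that shows, for each interval, the sum of the dummies that occurred in that time period.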

1 Answer

Answer (score: 0)

The most transparent way I can think of is:

n_interval = nrow(Intervals)
Intervals$DummySum = numeric(n_interval)
for(i in seq_len(n_interval)) {
  # data.frame() stored "Dummy Time" as Dummy.Time; both bounds are inclusive
  ind_i = x$Dummy.Time >= Intervals$From[i] & x$Dummy.Time <= Intervals$To[i]
  Intervals$DummySum[i] = sum(x$Dummy[ind_i])
}

This simply loops over all the intervals, identifies the dummies that fall within each one, and sums their values.

If you don't like for loops, you can use sapply instead:

# One sum per interval; times equal to From or To are included
Intervals$DummySum = 
  sapply(seq_len(nrow(Intervals)), function(i) sum(
    x$Dummy[
      x$Dummy.Time >= Intervals$From[i] & x$Dummy.Time <= Intervals$To[i]
      ]
  ))
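A different technique, not part of the original answer: build the whole interval-by-observation membership matrix at once with `outer()` and let matrix multiplication do the summing. This sketch uses inclusive bounds on both ends, as the question asks. It stores `nrow(Intervals) * nrow(x)` logicals, so for very large inputs the loop or `sapply()` version uses far less memory.

```r
Intervals <- data.frame("From" = c(0.0000,0.0069,0.0139,0.0208,0.0278,0.0347,0.0417,
                                   0.0486,0.0556,0.0625,0.0694,0.0764,0.0833),
                        "To"   = c(0.0410,0.0479,0.0549,0.0618,0.0688,0.0757,0.0826,
                                   0.0896,0.0965,0.1035,0.1104,0.1174,0.1243))
x <- data.frame("Dummy" = c(0,1,0,0,0,0,0,0,1,0,0,0,0),
                "Dummy Time" = c(0,0,0.006944444,0.006944444,0.010416667,0.010416667,
                                 0.013888889,0.013888889,0.020833333,0.024305556,
                                 0.027777778,0.03125,0.03125))

# member[i, j] is TRUE when observation j lies in interval i (From <= time <= To)
member <- outer(Intervals$From, x$Dummy.Time, "<=") &
          outer(Intervals$To,   x$Dummy.Time, ">=")

# %*% treats TRUE/FALSE as 1/0, so each row times Dummy is that interval's sum
Intervals$DummySum <- as.vector(member %*% x$Dummy)
Intervals$DummySum  # expected: 2 1 1 1 0 0 0 0 0 0 0 0 0
```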

Finally, you can wrap this in a more general function:

sum_in_intervals = function(interval_start, interval_end, times, values, na.rm = FALSE) {
  stopifnot(length(interval_start) == length(interval_end))
  stopifnot(length(times) == length(values))

  # For each interval, sum the values whose time lies in [start, end];
  # vapply() guarantees exactly one number per interval
  vapply(seq_along(interval_start), function(i) sum(
    values[
      times >= interval_start[i] & times <= interval_end[i]
    ],
    na.rm = na.rm
  ), numeric(1))
}

Intervals$DummySum = sum_in_intervals(Intervals$From, Intervals$To, x$Dummy.Time, x$Dummy)
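To show what the `na.rm` argument does, here is a self-contained usage sketch. It redefines a function like `sum_in_intervals` with inclusive bounds on both ends and `vapply()`. The toy `times` and `values` are made up purely for illustration, with one missing value:

```r
sum_in_intervals <- function(interval_start, interval_end, times, values, na.rm = FALSE) {
  stopifnot(length(interval_start) == length(interval_end))
  stopifnot(length(times) == length(values))
  vapply(seq_along(interval_start), function(i) {
    # Rows whose time lies in [start, end] for interval i
    inside <- times >= interval_start[i] & times <= interval_end[i]
    sum(values[inside], na.rm = na.rm)
  }, numeric(1))
}

times  <- c(0.0, 0.5, 1.0, 1.5)
values <- c(1, NA, 3, 4)

# Interval [0, 1] contains the NA, so its sum is NA unless na.rm = TRUE
sum_in_intervals(c(0, 1), c(1, 2), times, values)                # returns c(NA, 7)
sum_in_intervals(c(0, 1), c(1, 2), times, values, na.rm = TRUE)  # returns c(4, 7)
```

Note that the time 1.0 is counted in both intervals, because each bound is inclusive.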