How to add a new column to df1 by interpolating between two CONSECUTIVE columns of df2 (df2$T5, df2$T15, df2$T25, df2$T35)

Time: 2019-04-01 08:41:27

Tags: r

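I want to add a new column T to df1 that depends on the relationship between df1$x and df2. To make this easier to follow: df1$x is the depth of a fish, and the T columns of df2 are the water temperatures at different depths (5, 15, 25 and 35 meters). In df1$T I want to estimate the water temperature at the fish's depth from the temperatures of the water column. For example: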

df1<- data.frame(DateTime=c("2016-08-01 08:01:17","2016-08-01 09:17:14","2016-08-01 10:29:31","2016-08-01 11:35:02","2016-08-01 12:22:45","2016-08-01 13:19:27","2016-08-01 14:58:17","2016-08-01 15:30:10"), x = c(NA,27,44,33,15,17,22,35))
df1$DateTime<- as.POSIXct(df1$DateTime, format = "%Y-%m-%d %H:%M:%S", tz= "UTC") 
df1$DateTime1<- strptime(df1$DateTime, "%Y-%m-%d %H",tz= "UTC") # Create a DateTime variable truncated to the hour, in the same format as in `df2`.
df1$DateTime1<- as.POSIXct(df1$DateTime1, format = "%Y-%m-%d %H", tz= "UTC") # Convert it to POSIXct.
df2<- data.frame(DateTime=c("2016-08-01 08:00:00","2016-08-01 09:00:00","2016-08-01 10:00:00","2016-08-01 11:00:00","2016-08-01 12:00:00","2016-08-01 13:00:00","2016-08-01 14:00:00","2016-08-01 15:00:00"),T5=c(27.0,27.5,27.1,27.0,26.8,26.3,26.0,26.3),T15=c(23.0,23.4,23.1,22.7,22.5,21.5,22.0,22.3),T25=c(19.0,20.0,19.5,19.6,16.0,16.3,16.2,16.7),T35=c(16.0,16.0,16.5,16.7,16.3,16.7,16.9,16.7))
df2$DateTime<- as.POSIXct(df2$DateTime, format = "%Y-%m-%d %H:%M:%S", tz= "UTC")

df1
             DateTime         x           DateTime1
1 2016-08-01 08:01:17        NA 2016-08-01 08:00:00
2 2016-08-01 09:17:14        27 2016-08-01 09:00:00
3 2016-08-01 10:29:31        44 2016-08-01 10:00:00
4 2016-08-01 11:35:02        33 2016-08-01 11:00:00
5 2016-08-01 12:22:45        15 2016-08-01 12:00:00
6 2016-08-01 13:19:27        17 2016-08-01 13:00:00
7 2016-08-01 14:58:17        22 2016-08-01 14:00:00
8 2016-08-01 15:30:10        35 2016-08-01 15:00:00

df2
             DateTime   T5  T15  T25  T35
1 2016-08-01 08:00:00 27.0 23.0 19.0 16.0 # No difference greater than 5 in any interval (T5-T15, T15-T25, T25-T35).
2 2016-08-01 09:00:00 27.5 23.4 20.0 16.0 # No difference greater than 5 in any interval (T5-T15, T15-T25, T25-T35).
3 2016-08-01 10:00:00 27.1 23.1 19.5 16.5 # No difference greater than 5 in any interval (T5-T15, T15-T25, T25-T35).
4 2016-08-01 11:00:00 27.0 22.7 19.6 16.7 # No difference greater than 5 in any interval (T5-T15, T15-T25, T25-T35).
5 2016-08-01 12:00:00 26.8 22.5 16.0 16.3 # A difference greater than 5 between `df2$T15` and `df2$T25`.
6 2016-08-01 13:00:00 26.3 21.5 16.3 16.7 # A difference greater than 5 between `df2$T15` and `df2$T25`.
7 2016-08-01 14:00:00 26.0 22.0 16.2 16.9 # A difference greater than 5 between `df2$T15` and `df2$T25`.
8 2016-08-01 15:00:00 26.3 22.3 16.7 16.7 # A difference greater than 5 between `df2$T15` and `df2$T25`.
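
As a side note on the setup code above, the hourly key DateTime1 could probably also be built by truncating the timestamps directly instead of going through strptime. The following is only a sketch under that assumption; the variable names stamps and hourly are made up for illustration and do not appear in the question.

```r
# Sketch: truncate timestamps to the hour (assumes POSIXct input in UTC).
stamps <- as.POSIXct(c("2016-08-01 09:17:14", "2016-08-01 15:30:10"),
                     format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# trunc() returns POSIXlt, so convert back to POSIXct before comparing or matching
hourly <- as.POSIXct(trunc(stamps, units = "hours"))

format(hourly, "%Y-%m-%d %H:%M:%S")
```

The result has the same class and time zone as df2$DateTime, so it can be compared with == or passed to match().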

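I want the following:

When df1$x (the depth of my fish) is shallower than 5 m (the depth of df2$T5), I want df1$T to equal df2$T5. When df1$x is deeper than 35 m (the depth of df2$T35), I want df1$T to equal df2$T35. If the depth of my fish df1$x is between 5 and 35, find which interval it falls in (T5 to T15, T15 to T25, or T25 to T35), and then: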

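  • If the difference between the two ends of the interval is less than 5, df1$T is interpolated between the values at the two ends of the interval.

  • If the difference between the two ends of the interval is greater than 5, split the interval into two halves. In the upper half (for example, between 5 and 10 m), interpolate between df2$T5 at 5 m and the value of df2$T15 placed at 10 m. In the lower half (between 10 and 15 m), df1$T equals df2$T15.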
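
The two rules above can be sketched for a single 10 m interval as follows. This is an illustration only: the helper name interval_temp and its arguments are hypothetical and not part of the question, and it assumes that a difference of exactly 5 is handled like the "greater than 5" case, which the rules leave unspecified.

```r
# Hypothetical helper (not from the original post): temperature at `depth`
# inside one 10 m interval whose upper sensor sits at `d_top` metres.
# t_top / t_bot are the readings of the upper and lower sensor.
# Assumption: a difference of exactly 5 is treated like "greater than 5".
interval_temp <- function(depth, d_top, t_top, t_bot) {
  frac <- (depth - d_top) / 10   # 0 at the upper sensor, 1 at the lower sensor
  if (abs(t_bot - t_top) < 5) {
    # Small difference: linear interpolation over the full 10 m
    t_top + frac * (t_bot - t_top)
  } else if (frac < 0.5) {
    # Large difference, upper half: reach t_bot already at the midpoint
    t_top + 2 * frac * (t_bot - t_top)
  } else {
    # Large difference, lower half: constant at the lower sensor's value
    t_bot
  }
}

interval_temp(27, 25, 20.0, 16.0)  # 09:00 row: difference 4, interpolated
interval_temp(17, 15, 21.5, 16.3)  # 13:00 row: difference 5.2, upper half
interval_temp(22, 15, 22.0, 16.2)  # 14:00 row: difference 5.8, lower half
```

The three calls use the values from the 09:00, 13:00 and 14:00 rows of df2 and give 19.2, 19.42 and 16.2.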

The result I expect is:

result
             DateTime         x           DateTime1      T
1 2016-08-01 08:01:17        NA 2016-08-01 08:00:00     NA
2 2016-08-01 09:17:14        27 2016-08-01 09:00:00  19.20
3 2016-08-01 10:29:31        44 2016-08-01 10:00:00  16.50
4 2016-08-01 11:35:02        33 2016-08-01 11:00:00  17.28
5 2016-08-01 12:22:45        15 2016-08-01 12:00:00  22.50
6 2016-08-01 13:19:27        17 2016-08-01 13:00:00  19.42
7 2016-08-01 14:58:17        22 2016-08-01 14:00:00  16.20
8 2016-08-01 15:30:10        35 2016-08-01 15:00:00  16.70

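I have come up with the following as a solution, but I would like to know whether there is simpler code, because I think this will take quite a long time to run.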

y <- seq(from=5, to=15, by=1) # A vector with 11 levels, one per metre. The first level corresponds to the upper water temperature sensor `df2$T5` and the last level to the lower sensor `df2$T15`.
y[2:10] <- NA # We don't know the water temperature at the levels in between. We either interpolate them or assume that they equal the water temperature at the lower sensor.
y
x <- seq(from=15, to=25, by=1) # The same criteria. In this case, the vector is used when `df1$x` is between 15 and 25.
x[2:10] <- NA
x
k <- seq(from=25, to=35, by=1) # The same criteria. In this case, the vector is used when `df1$x` is between 25 and 35.
k[2:10] <- NA
k

library(zoo) # provides na.approx()

df1$T <- NA_real_ # start with a numeric column so the results are not coerced to character

for (i in 1:nrow(df1)) {
  d <- df1$x[i]                                # depth of the fish
  j <- which(df1$DateTime1[i] == df2$DateTime) # row of df2 for the same hour
  if (is.na(d)) {
    df1$T[i] <- NA
  } else if (d > 0 && d <= 5) {
    df1$T[i] <- df2$T5[j]
  } else if (d > 5 && d <= 15 && abs(df2$T15[j] - df2$T5[j]) < 5) {
    # Small difference: interpolate over the whole interval
    y[1] <- df2$T5[j]
    y[11] <- df2$T15[j]
    y <- na.approx(y)
    df1$T[i] <- y[round(d) - 4]
    y <- seq(from=5, to=15, by=1) # reset the helper vector
    y[2:10] <- NA
  } else if (d > 15 && d <= 25 && abs(df2$T25[j] - df2$T15[j]) < 5) {
    x[1] <- df2$T15[j]
    x[11] <- df2$T25[j]
    x <- na.approx(x)
    df1$T[i] <- x[round(d) - 14]
    x <- seq(from=15, to=25, by=1)
    x[2:10] <- NA
  } else if (d > 25 && d <= 35 && abs(df2$T35[j] - df2$T25[j]) < 5) {
    k[1] <- df2$T25[j]
    k[11] <- df2$T35[j]
    k <- na.approx(k)
    df1$T[i] <- k[round(d) - 24]
    k <- seq(from=25, to=35, by=1)
    k[2:10] <- NA
  } else if (d > 5 && d <= 15 && abs(df2$T15[j] - df2$T5[j]) > 5) {
    # Large difference: the lower sensor's value already applies from the midpoint down
    y[1] <- df2$T5[j]
    y[6] <- df2$T15[j]
    y[11] <- df2$T15[j]
    y <- na.approx(y)
    df1$T[i] <- y[round(d) - 4]
    y <- seq(from=5, to=15, by=1)
    y[2:10] <- NA
  } else if (d > 15 && d <= 25 && abs(df2$T25[j] - df2$T15[j]) > 5) {
    x[1] <- df2$T15[j]
    x[6] <- df2$T25[j]
    x[11] <- df2$T25[j]
    x <- na.approx(x)
    df1$T[i] <- x[round(d) - 14]
    x <- seq(from=15, to=25, by=1)
    x[2:10] <- NA
  } else if (d > 25 && d <= 35 && abs(df2$T35[j] - df2$T25[j]) > 5) {
    k[1] <- df2$T25[j]
    k[6] <- df2$T35[j]
    k[11] <- df2$T35[j]
    k <- na.approx(k)
    df1$T[i] <- k[round(d) - 24]
    k <- seq(from=25, to=35, by=1)
    k[2:10] <- NA
  } else if (d > 35) {
    df1$T[i] <- df2$T35[j]
  }
}

1 answer:

Answer 0 (score: 1)

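# Assuming that the rows of df1 and df2 correspond one to one (if not, you might need to use merge)
# Simple interpolation (ignores the rule for temperature differences >= 5)
# rule=2 returns the nearest sensor's value for depths outside 5-35 m
df1$T <- sapply(1:NROW(df1), function(x) approxfun(c(5,15,25,35), as.numeric(df2[x,c("T5","T15","T25","T35")]), rule=2)(df1$x[x]))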

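# Using your rules (reproduces your expected result)
# and matching df1$DateTime1 to df2$DateTime
df1$T <- sapply(1:NROW(df1), function(x) {
  depth <- df1$x[x]
  if(!is.finite(depth)) {return(NA)}  # no depth recorded
  dc <- c(5,15,25,35)                 # sensor depths in metres
  temp <- as.numeric(df2[match(df1$DateTime1[x], df2$DateTime)[1],c("T5","T15","T25","T35")])
  idx0 <- findInterval(depth, c(15,25,35))+1  # sensor at the shallower end of the interval
  idx1 <- findInterval(depth, c(5,15,25))+1   # sensor at the deeper end (equals idx0 above 5 m and from 35 m down)
  tDif <- abs(temp[idx1] - temp[idx0])
  # Difference below 5: linear interpolation across the whole 10 m interval
  if(tDif<5) {return(temp[idx0] + (depth - dc[idx0]) * (temp[idx1] - temp[idx0]) / 10)}
  # Otherwise, upper half of the interval: interpolate twice as steeply, reaching temp[idx1] at the midpoint
  if(depth%%10 >=5) {return(temp[idx0] + 2*(depth - dc[idx0]) * (temp[idx1] - temp[idx0]) / 10)}
  # Lower half of the interval: use the deeper sensor's value
  temp[idx1]
}
)
#NA 19.20 16.50 17.28 22.50 19.42 16.20 16.70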
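
To see how the two findInterval calls above select the sensors, the indices can be tabulated for a few depths. This is a small illustrative sketch; the names depths, sensor, upper and lower are introduced here and are not part of the answer, and the 3 m depth is an extra hypothetical value that does not occur in df1.

```r
# Illustrative only: which sensors the answer's two findInterval() calls pick.
depths <- c(3, 15, 17, 22, 27, 35, 44)  # 3 m is an extra, hypothetical depth
sensor <- c("T5", "T15", "T25", "T35")

upper <- findInterval(depths, c(15, 25, 35)) + 1  # idx0: shallower end of the interval
lower <- findInterval(depths, c(5, 15, 25)) + 1   # idx1: deeper end of the interval

data.frame(depth = depths, upper = sensor[upper], lower = sensor[lower])
```

For depths shallower than 5 m, or at 35 m and deeper, both indices point at the same sensor, so the temperature difference is 0 and the interpolation formula returns that sensor's value unchanged.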