R Stats: Comparing timestamps in two data frames

Time: 2013-08-26 09:09:20

Tags: r comparison timestamp

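I'm new to R, so please forgive me if the answer is obvious. I have tried searching for an answer, but I don't think I was using the right terms.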

I have two data frames, each consisting of a datetime and a value, e.g. data frame 1:

2003-01-01 10:00:00 | 10
2003-01-02 10:00:00 | 5
2003-01-03 10:00:00 | 7
 ...<snip>...
2003-06-15 10:00:00 | 4.5
2003-06-16 10:00:00 | 4.5
2003-06-17 10:00:00 | 3.5
 ...<snip>...
2003-11-21 10:00:00 | 3.5
2003-11-22 10:00:00 | 4
2003-11-23 10:00:00 | 4.5

and data frame 2:

2003-01-01 09:00:00 | 2
2003-03-19 12:00:00 | 5
2003-05-14 14:00:00 | 3.5
2003-06-10 14:00:00 | 4
 ...<snip>...
2003-10-20 14:00:00 | 2
2003-11-22 14:00:00 | 3

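What I want to do is add the values together whenever a timestamp in the first data frame falls between two consecutive timestamps in the second data frame, using the value attached to the earlier of the two.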

e.g.

2003-01-01 10:00:00 falls between 2003-01-01 09:00:00 and 2003-03-19 12:00:00, so the calculation to perform is 10 + 2.
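The rule in this example can be sketched for a single measurement as follows. This is only an illustration under assumptions: the variable names (`corr_time`, `corr_value`, `m_time`, `m_value`, `idx`) are hypothetical, and the timestamps are parsed as UTC, which the question does not specify.

```r
# Correction table: the first two rows of data frame 2 from the question
# (tz = "UTC" is an assumption; the question does not state a time zone)
corr_time  <- as.POSIXct(c("2003-01-01 09:00:00", "2003-03-19 12:00:00"), tz = "UTC")
corr_value <- c(2, 5)

# One measurement: the first row of data frame 1
m_time  <- as.POSIXct("2003-01-01 10:00:00", tz = "UTC")
m_value <- 10

# Index of the latest correction whose timestamp is not after the measurement
idx <- max(which(corr_time <= m_time))

# Add the matching correction to the measurement: 10 + 2
corrected <- m_value + corr_value[idx]
print(corrected)
```

Note that `which()` returns an empty vector for a measurement that is earlier than every correction, so this scalar version would need an explicit check for that case.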

[Inconsistent statement removed]

I assume there is a simple way to do this in R. As a programmer, my first instinct is to use a for loop.

Edit: What I want is something like the following

    timestamp          | measurement | correction | corrected
   2003-01-01 10:00:00 | 10          | 2          | 12   
   2003-01-02 10:00:00 | 5           | 2          | 7
   2003-01-03 10:00:00 | 7           | 2          | 9
         ...<snip>...
   2003-06-15 10:00:00 | 4.5         | 4          | 8.5
   2003-06-16 10:00:00 | 4.5         | 4          | 8.5
   2003-06-17 10:00:00 | 3.5         | 4          | 7.5
         ...<snip>...
   2003-11-21 10:00:00 | 3.5         | 2          | 5.5
   2003-11-22 10:00:00 | 4           | 2          | 6
   2003-11-23 10:00:00 | 4.5         | 3          | 7.5

What really matters is getting the corrected value. I have it (sort of) working with multiple for loops, but I would like to do it the "R" way.

 Time from DF1            Time from DF2      Calculation 
2003-11-21 10:00:00 >= 2003-10-20 14:00:00 = 3.5 + 2
2003-11-22 10:00:00 >= 2003-10-20 14:00:00 = 4   + 2
2003-11-23 10:00:00 >= 2003-11-22 14:00:00 = 4.5 + 3

Edit 2:

I got it working using a loop. Is there a better way to do this?

library(lubridate)

df_measurements <- read.table(text = "
2003-01-01 10:00:00 | 10
2003-01-02 10:00:00 | 5
2003-01-03 10:00:00 | 7
2003-06-15 10:00:00 | 4.5
2003-06-16 10:00:00 | 4.5
2003-06-17 10:00:00 | 3.5
2003-11-21 10:00:00 | 3.5
2003-11-22 10:00:00 | 4
2003-11-23 10:00:00 | 4.5", sep = "|")

df_corrections <- read.table(text = "
2003-01-01 09:00:00 | 5.5
2003-05-01 09:00:00 | 6
2003-08-01 09:00:00 | 8", sep = "|")

#Create named columns and remove unneeded
df_measurements$time <- ymd_hms(df_measurements$V1)
df_measurements$obs <- df_measurements$V2
df_measurements$V1 <- NULL
df_measurements$V2 <- NULL

df_corrections$time <- ymd_hms(df_corrections$V1)
df_corrections$offset <- df_corrections$V2
df_corrections$V1 <- NULL
df_corrections$V2 <- NULL

#Get number of corrections
c_length <- nrow(df_corrections)

#Create blank data frame to merge results into
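#The time column is POSIXct (UTC) so that it matches the output of ymd_hms()
result <- data.frame(time = as.POSIXct(character(), tz = "UTC"), obs = numeric(), correction = numeric(), corrected = numeric(), stringsAsFactors = FALSE)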

for (i in seq_len(c_length)) {

  if(i < c_length) {

    subset_m <- df_measurements[df_measurements$time >= df_corrections$time[[i]] & df_measurements$time < df_corrections$time[[i+1]], ]
  } else {

    #Last correction in correction data frame
    subset_m <- df_measurements[df_measurements$time >= df_corrections$time[[i]], ]
  }

  #Make "correction" column and fill with correction to be used
  subset_m[, "correction"] <- rep(df_corrections$offset[[i]], nrow(subset_m)) 

  #Make "corrected" column and fill with corrected value
  subset_m$corrected <- subset_m$correction + subset_m$obs  

  #Combine subset with result
  result <- rbind(result, subset_m)

}

print(result)
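One possible loop-free alternative is sketched below, using base R's `findInterval()` to look up, for every measurement at once, the last correction at or before it. It is a sketch under assumptions rather than a definitive answer: the data frames are rebuilt directly with `data.frame()` instead of `read.table()`, timestamps are parsed as UTC, and the `idx` helper is hypothetical.

```r
# Same sample data as the loop version, built directly as data frames
# (tz = "UTC" is an assumption, matching what ymd_hms() returns by default)
df_measurements <- data.frame(
  time = as.POSIXct(c("2003-01-01 10:00:00", "2003-01-02 10:00:00",
                      "2003-01-03 10:00:00", "2003-06-15 10:00:00",
                      "2003-06-16 10:00:00", "2003-06-17 10:00:00",
                      "2003-11-21 10:00:00", "2003-11-22 10:00:00",
                      "2003-11-23 10:00:00"), tz = "UTC"),
  obs = c(10, 5, 7, 4.5, 4.5, 3.5, 3.5, 4, 4.5)
)

df_corrections <- data.frame(
  time = as.POSIXct(c("2003-01-01 09:00:00", "2003-05-01 09:00:00",
                      "2003-08-01 09:00:00"), tz = "UTC"),
  offset = c(5.5, 6, 8)
)

# findInterval() requires the break points in increasing order
df_corrections <- df_corrections[order(df_corrections$time), ]

# For each measurement, the index of the last correction at or before it;
# 0 means the measurement is earlier than every correction
idx <- findInterval(as.numeric(df_measurements$time),
                    as.numeric(df_corrections$time))
idx[idx == 0] <- NA

result <- df_measurements
result$correction <- df_corrections$offset[idx]
result$corrected  <- result$obs + result$correction

print(result)
```

Unlike the loop, which silently drops measurements taken before the first correction, this version keeps those rows and leaves `correction` and `corrected` as `NA`.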

1 Answer:

Answer 0: (score: 0)

Note: This answer refers to the original question, which was edited after I posted a working answer.

Is this what you want?

df <- read.table(text = "2003-01-01 10:00:00 | 10
2003-01-02 10:00:00 | 5
2003-01-03 10:00:00 | 7
2003-06-15 10:00:00 | 4.5
2003-06-16 10:00:00 | 4.5
2003-06-17 10:00:00 | 3.5", sep = "|")
df$time <- as.POSIXct(df$V1)

df2 <- read.table(text = "2003-01-01 09:00:00 | 2
2003-05-01 09:00:00 | 5", sep = "|")
df2$time <- as.POSIXct(df2$V1)

# Add the first correction inside the first interval, the second one otherwise
df$val <- ifelse(df$time >= df2$time[1] & df$time <= df2$time[2], df$V2 + df2$V2[1], df$V2 + df2$V2[2])
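The `ifelse()` call above handles exactly two corrections. As a hedged sketch of how the same idea might extend to any number of corrections, the snippet below treats the correction table as a step function via `approx()` with `method = "constant"`. This is a swapped-in technique, not the answerer's method; the `step` variable is hypothetical, the data frames are rebuilt with `data.frame()`, and UTC is assumed. At a timestamp exactly equal to the second correction time it applies the second correction, whereas the `ifelse()` above (which uses `<=`) applies the first.

```r
# Same sample data as the answer, built directly as data frames
# (tz = "UTC" is an assumption)
df <- data.frame(
  time = as.POSIXct(c("2003-01-01 10:00:00", "2003-01-02 10:00:00",
                      "2003-01-03 10:00:00", "2003-06-15 10:00:00",
                      "2003-06-16 10:00:00", "2003-06-17 10:00:00"), tz = "UTC"),
  V2 = c(10, 5, 7, 4.5, 4.5, 3.5)
)

df2 <- data.frame(
  time = as.POSIXct(c("2003-01-01 09:00:00", "2003-05-01 09:00:00"), tz = "UTC"),
  V2 = c(2, 5)
)

# Piecewise-constant lookup: each correction applies from its own timestamp
# until the next one. rule = c(1, 2) gives NA before the first correction
# and carries the last correction forward after the final timestamp.
step <- approx(x = as.numeric(df2$time), y = df2$V2,
               xout = as.numeric(df$time),
               method = "constant", f = 0, rule = c(1, 2))

df$val <- df$V2 + step$y
print(df)
```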