Date manipulation and arithmetic in R

Time: 2014-08-25 11:06:55

Tags: r datetime

I'm new to R and am having some trouble doing arithmetic with, and comparisons between, dates in R. Basically, I have two data frames:

df1 (2921 rows):

    DateTime
1   2013-06-01 00:00:00
2   2013-06-01 03:00:00
3   2013-06-01 06:00:00
4   2013-06-01 09:00:00
5   2013-06-01 12:00:00
6   2013-06-01 15:00:00
7   2013-06-01 18:00:00
8   2013-06-01 21:00:00
9   2013-06-02 00:00:00
10  2013-06-02 03:00:00 

and df2 (70,816 rows):

    Create.Date.Time    Service         Closing.Date.Time
1   2013-06-01 12:59:00 AV              2013-06-01 13:59:00
2   2013-06-02 07:56:00 SERVICE684793   2013-06-02 08:59:00
3   2013-06-02 09:39:00 SERVICE684793   2013-06-03 12:01:00
4   2013-06-02 14:14:00 SERVICE684796   2013-06-02 14:55:00
5   2013-06-02 17:20:00 SERVICE684797   2013-06-03 12:06:00
6   2013-06-03 07:20:00 SERVICE684793   2013-06-03 07:39:00
7   2013-06-03 08:02:00 SERVICE684839   2013-06-03 12:09:00
8   2013-06-03 08:04:00 SERVICE684841   2013-06-04 08:05:00
9   2013-06-03 08:04:00 SERVICE684841   2013-06-05 08:06:00
10  2013-06-03 08:08:00 SERVICE684841   2013-06-03 08:08:00

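My task is to get, for each i in df1$DateTime, the cumulative count of df2$Create.Date.Time. In other words, I want to count the number of instances where df2$Create.Date.Time is less than or equal to each df1$DateTime.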

For example, for df1$DateTime = 2013-06-02 18:00:00, the cumulative count of df2$Create.Date.Time is 5 (five values of df2$Create.Date.Time are earlier than 2013-06-02 18:00:00).
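For a single cut-off, the count in this example is just a vectorised comparison followed by `sum()`. A minimal sketch using the ten sample rows of `df2`, assuming the timestamps are in UTC:

```r
# Creation times from the sample df2 (time zone assumed to be UTC)
create <- as.POSIXct(c("2013-06-01 12:59:00", "2013-06-02 07:56:00",
                       "2013-06-02 09:39:00", "2013-06-02 14:14:00",
                       "2013-06-02 17:20:00", "2013-06-03 07:20:00",
                       "2013-06-03 08:02:00", "2013-06-03 08:04:00",
                       "2013-06-03 08:04:00", "2013-06-03 08:08:00"),
                     tz = "UTC")
cutoff <- as.POSIXct("2013-06-02 18:00:00", tz = "UTC")

# `create <= cutoff` is a logical vector; sum() counts the TRUE values
n_created <- sum(create <= cutoff)
print(n_created)  # 5
```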

I also need to do the same for each service.
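One way to get per-service counts is to split the creation times by `Service` and count each group separately. The sketch below uses base R's `findInterval()`, which for each value of `x` returns how many elements of the sorted vector `vec` are less than or equal to it. The two cut-off times in `grid` are hypothetical stand-ins for `df1$DateTime`, and UTC is assumed:

```r
create <- as.POSIXct(c("2013-06-01 12:59:00", "2013-06-02 07:56:00",
                       "2013-06-02 09:39:00", "2013-06-02 14:14:00",
                       "2013-06-02 17:20:00", "2013-06-03 07:20:00",
                       "2013-06-03 08:02:00", "2013-06-03 08:04:00",
                       "2013-06-03 08:04:00", "2013-06-03 08:08:00"),
                     tz = "UTC")
service <- c("AV", "SERVICE684793", "SERVICE684793", "SERVICE684796",
             "SERVICE684797", "SERVICE684793", "SERVICE684839",
             "SERVICE684841", "SERVICE684841", "SERVICE684841")
# Hypothetical cut-off times standing in for df1$DateTime
grid <- as.POSIXct(c("2013-06-02 12:00:00", "2013-06-03 12:00:00"), tz = "UTC")

# One column per service: count that service's creation times <= each cut-off
by_service <- sapply(split(as.numeric(create), service), function(t) {
  findInterval(as.numeric(grid), sort(t))  # findInterval needs sorted times
})
rownames(by_service) <- format(grid, "%Y-%m-%d %H:%M")
print(by_service)
```

Each row of the result corresponds to one cut-off and each column to one service.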

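I have tried converting the dates (all of which are of class "POSIXct" "POSIXt") to seconds and then comparing them, but I keep running into strange errors. I would greatly appreciate any help.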
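Converting to seconds by hand should not be necessary: a POSIXct value is stored as the number of seconds since 1970-01-01 UTC, so `<=`, `as.numeric()` and `difftime()` work on it directly. A minimal sketch, assuming both timestamps are in UTC (the errors themselves aren't shown, so their cause is not known here):

```r
a <- as.POSIXct("2013-06-02 17:20:00", tz = "UTC")
b <- as.POSIXct("2013-06-02 18:00:00", tz = "UTC")

a <= b                                   # TRUE: POSIXct values compare directly
secs <- as.numeric(b) - as.numeric(a)    # 2400: the underlying seconds
gap <- difftime(b, a, units = "secs")    # the same gap as a difftime object
```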

1 answer:

Answer 0 (score: 0)

Try:

    library(lubridate)

    # df1: time of day in seconds since midnight, plus the calendar date
    df1New <- within(df1, {
        Createtime <- period_to_seconds(hms(strftime(DateTime, "%H:%M:%S")))
        Date <- as.Date(DateTime)
    })

    # df2: the same two columns, derived from Create.Date.Time
    df2New <- within(df2, {
        Createtime1 <- period_to_seconds(hms(strftime(Create.Date.Time, "%H:%M:%S")))
        Date <- as.Date(Create.Date.Time)
    })


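    # For each df1 row, count the df2 rows created on the same calendar day
    # at or before that row's time of day
    df1New$Num.Closed <- unsplit(lapply(split(df1New, df1New$Date), function(x) {
        x2 <- df2New[df2New$Date %in% x$Date, ]
        unlist(lapply(seq_len(nrow(x)), function(i) {
            x1 <- x[i, ]
            sum(x2$Createtime1 <= x1$Createtime)
        }))
    }), df1New$Date)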

    df1New[, -(2:3)]
    #              DateTime Num.Closed
    #1  2013-06-01 00:00:00          0
    #2  2013-06-01 03:00:00          0
    #3  2013-06-01 06:00:00          0
    #4  2013-06-01 09:00:00          0
    #5  2013-06-01 12:00:00          0
    #6  2013-06-01 15:00:00          1
    #7  2013-06-01 18:00:00          1
    #8  2013-06-01 21:00:00          1
    #9  2013-06-02 00:00:00          0
    #10 2013-06-02 03:00:00          0
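The code above only compares rows that fall on the same calendar date, so the count restarts at 0 each midnight (rows 9 and 10, even though one ticket was created on 2013-06-01). If a running count across days is wanted, as in the question's example of 5, one alternative is base R's `findInterval()` on the sorted creation times. This is a swapped-in technique, not the answerer's method; the column name `Num.Created` is hypothetical and UTC is assumed:

```r
# df1$DateTime from the sample: 10 steps of 3 hours (UTC assumed)
grid <- seq(as.POSIXct("2013-06-01 00:00:00", tz = "UTC"),
            by = 3 * 3600, length.out = 10)
create <- as.POSIXct(c("2013-06-01 12:59:00", "2013-06-02 07:56:00",
                       "2013-06-02 09:39:00", "2013-06-02 14:14:00",
                       "2013-06-02 17:20:00", "2013-06-03 07:20:00",
                       "2013-06-03 08:02:00", "2013-06-03 08:04:00",
                       "2013-06-03 08:04:00", "2013-06-03 08:08:00"),
                     tz = "UTC")

# For each element of x, findInterval(x, vec) returns how many elements of
# the sorted vector vec are <= it, i.e. a running count across all days
num_created <- findInterval(as.numeric(grid), sort(as.numeric(create)))
result <- data.frame(DateTime = grid, Num.Created = num_created)
print(result)
```

Because it sorts once and then does a binary search per cut-off, this avoids comparing every `df1` row against every `df2` row, which matters at 2,921 by 70,816 rows.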

Data

  df1 <- structure(list(DateTime = c("2013-06-01 00:00:00", "2013-06-01 03:00:00", 
  "2013-06-01 06:00:00", "2013-06-01 09:00:00", "2013-06-01 12:00:00", 
  "2013-06-01 15:00:00", "2013-06-01 18:00:00", "2013-06-01 21:00:00", 
  "2013-06-02 00:00:00", "2013-06-02 03:00:00")), .Names = "DateTime", class = "data.frame", row.names = c("1", 
  "2", "3", "4", "5", "6", "7", "8", "9", "10"))

  df2 <- structure(list(Create.Date.Time = c("2013-06-01 12:59:00", "2013-06-02 07:56:00", 
  "2013-06-02 09:39:00", "2013-06-02 14:14:00", "2013-06-02 17:20:00", 
  "2013-06-03 07:20:00", "2013-06-03 08:02:00", "2013-06-03 08:04:00", 
  "2013-06-03 08:04:00", "2013-06-03 08:08:00"), Service = c("AV", 
  "SERVICE684793", "SERVICE684793", "SERVICE684796", "SERVICE684797", 
  "SERVICE684793", "SERVICE684839", "SERVICE684841", "SERVICE684841", 
  "SERVICE684841"), Closing.Date.Time = c("2013-06-01 13:59:00", 
  "2013-06-02 08:59:00", "2013-06-03 12:01:00", "2013-06-02 14:55:00", 
  "2013-06-03 12:06:00", "2013-06-03 07:39:00", "2013-06-03 12:09:00", 
  "2013-06-04 08:05:00", "2013-06-05 08:06:00", "2013-06-03 08:08:00"
  )), .Names = c("Create.Date.Time", "Service", "Closing.Date.Time"
  ), class = "data.frame", row.names = c("1", "2", "3", "4", "5", 
  "6", "7", "8", "9", "10"))
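The `structure()` calls above store the timestamps as character strings, while the question says the real columns are POSIXct. A minimal sketch of converting the character columns, shown on a two-row stand-in for `df2` and assuming the timestamps are UTC (substitute the data's real time zone):

```r
# Two-row stand-in for the question's df2, with character timestamps
df2 <- data.frame(
  Create.Date.Time  = c("2013-06-01 12:59:00", "2013-06-02 07:56:00"),
  Service           = c("AV", "SERVICE684793"),
  Closing.Date.Time = c("2013-06-01 13:59:00", "2013-06-02 08:59:00"),
  stringsAsFactors  = FALSE
)

# Parse both timestamp columns with an explicit format (UTC is an assumption)
time_cols <- c("Create.Date.Time", "Closing.Date.Time")
df2[time_cols] <- lapply(df2[time_cols], as.POSIXct,
                         format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
str(df2)
```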