How to calculate the number of months from dates in R

Time: 2018-03-21 03:46:22

Tags: r dataframe dplyr tidyr tidyverse

I have the data frames shown below:

DF_1

ID      Date
123     18/03/2018 16:45
456     10/03/2018 20:15

DF_2

ID      Date1                  Date2
123     2018-03-18 06:37:22    1519109133704
123     2018-03-18 06:37:21    1520324827462
123     2018-03-16 04:03:01    1520690354458
456     2018-03-10 14:46:03    1517319313151
456     2018-03-10 14:46:04    1515143046429
456     2018-03-10 14:46:03    1515838021062
456     2018-03-10 14:46:15    1488092209241
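  • Month: the number of distinct months (based on Date2) in which events were recorded in DF_2, for each ID that matches an ID in DF_1.
  • Avg: the average number of rows created per month (e.g. if there are 3 months, based on Date2, containing 90 rows in total, the average is 30).
  • Day: the average number of rows per day (e.g. if 3 months contain 90 rows, Day is 1).
  • Last5: the number of rows created in the last 5 days (based on Date1), relative to Sys.Date().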
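The Avg and Day definitions above can be checked with a quick calculation. This is only a sketch of the arithmetic: the variable names are made up for illustration, and the fixed 30-day month is an assumption implied by the "3 months, 90 rows, Day is 1" example.

```r
# Example figures taken from the bullet points above
n_rows   <- 90          # total rows for one ID
n_months <- 3           # distinct months found in Date2

# Assumption: every month is treated as having exactly 30 days
days_per_month <- 30

avg_per_month <- n_rows / n_months                     # Avg
avg_per_day   <- n_rows / (n_months * days_per_month)  # Day

print(c(Avg = avg_per_month, Day = avg_per_day))
```

With these figures Avg comes out as 30 and Day as 1, matching the examples in the list.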

Here is the code I have so far:

library(tidyverse)
 library(lubridate)
 DF_2 <- tibble(ID = c(123L, 123L, 123L, 456L, 456L, 456L, 456L), 
                Date1 = c("2018-03-18 06:37:22", "2018-03-18 06:37:21", "2018-03-16 04:03:01", 
                 "2018-03-10 14:46:03", "2018-03-10 14:46:04", "2018-03-10 14:46:03", 
                 "2018-03-10 14:46:15"), 
                Date2 = c(1519109133704, 1520324827462, 1520690354458, 1517319313151, 1515143046429, 1515838021062, 1488092209241)
               )

 DF_2 <- DF_2 %>% mutate(Date1 = ymd_hms(Date1), 
                         Date2 = as.POSIXct(Date2/1000,origin = "1970-01-01")) 

 DF_2_tab <- DF_2 %>% group_by(ID) %>% summarise(date1 = sum(date(Date1)==date(DF_1$Date1[DF_1$ID==ID])),
                            Total = n(), 
                            Month = month(count(Date2)),
                            Avg = mean # Don't know how to calculate
                            Day = day(Date2),
                            Last5 = sum( (Sys.Date()-date(Date1)) < 5 )
                            )

1 Answer:

Answer 0 (score: 1)

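Your point 1 is not very clear, and neither is the purpose of DF_1. In any case, see the code below, which summarises DF_2 the way you want. It gives the number of distinct months and the total number of records per ID, from which points 2 and 3 follow directly (assuming every month has 30 days, as you describe). Point 4 is computed in the code: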

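library(data.table)  # provides uniqueN(), .N and the DT[i, j, by] syntax

DF_2 = data.table(DF_2)
# Per ID: distinct year-months in Date2, total rows, and rows dated within the last 5 days
DF = DF_2[, list(num_mth = uniqueN(format(Date2, "%Y%m")), num_rec=.N, 
          numrec_5d=sum(as.numeric(difftime(today(), Date2), units = "days")<=5)), 
          by=ID]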
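The month count relies on formatting each timestamp as a year-month key and counting the distinct keys. A minimal base R sketch of that step, using the three Date2 values for ID 123 from the question; the explicit UTC time zone is an assumption added here so the result does not depend on the local time zone, and the names ms, d2, ym and n_months are illustrative only.

```r
# Epoch milliseconds for ID 123, copied from the question
ms <- c(1519109133704, 1520324827462, 1520690354458)

# Convert milliseconds to seconds, then to POSIXct (UTC is an assumption)
d2 <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")

# Year-month key for each timestamp
ym <- format(d2, "%Y%m")
print(ym)  # "201802" "201803" "201803"

# Number of distinct months
n_months <- length(unique(ym))
print(n_months)  # 2
```

Keeping the year in the key means that the same calendar month in two different years counts as two months, which a bare month number would not do.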

Since you have explained how DF_1 is used, I have edited my code. It now first joins the two data sets on ID and Date1, and then summarises:

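library(tidyverse)   # tibble(), mutate(), %>%
library(lubridate)   # ymd_hms(), dmy_hm(), date(), today()
library(data.table)  # uniqueN(), .N, :=

DF_2 <- tibble(ID = c(123L, 123L, 123L, 456L, 456L, 456L, 456L), 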
               Date1 = c("2018-03-18 06:37:22", "2018-03-18 06:37:21", "2018-03-16 04:03:01", 
                         "2018-03-10 14:46:03", "2018-03-10 14:46:04", "2018-03-10 14:46:03", 
                         "2018-03-10 14:46:15"), 
               Date2 = c(1519109133704, 1520324827462, 1520690354458, 1517319313151, 1515143046429, 1515838021062, 1488092209241)
)

DF_2 <- DF_2 %>% mutate(Date1 = ymd_hms(Date1), 
                        Date2 = as.POSIXct(Date2/1000,origin = "1970-01-01")) 


DF_1 <- tibble(ID = c(123L, 456L), 
               Date1 = c("18/03/2018 16:45", "10/03/2018 20:15"))

DF_1 <- DF_1 %>% mutate(Date1 = dmy_hm(Date1))


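# Convert both tibbles to data.tables
DF_2 = data.table(DF_2)
DF_1 = data.table(DF_1)

# Truncate the timestamps to calendar dates so the join compares days, not seconds
DF_2 = DF_2[, Date1:= date(Date1)]
DF_2 = DF_2[, Date2:= date(Date2)]
DF_1 = DF_1[, Date1:= date(Date1)]
# Inner join on ID and Date1; the result is printed here, not stored
DF_1[DF_2, on = c("ID","Date1") , nomatch=0L]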


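# Per ID: distinct months and days in Date2, total rows, and rows from the last 5 days
DF = DF_2[, list(num_mth = uniqueN(format(Date2, "%Y%m")), num_rec=.N,
          num_day = uniqueN(format(Date2, "%Y%m%d")),
          numrec_5d=sum(as.numeric(difftime(today(), Date2), units = "days")<=5)), 
          by=ID]
# recpermonth: rows per month; recperday: rows per distinct day;
# recperday2: rows per day assuming 30-day months
DF[, recpermonth := num_rec/num_mth][, recperday := num_rec/num_day][, recperday2 := num_rec/(num_mth*30)]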
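Because the question is tagged dplyr, the same per-ID summary can also be sketched without data.table. This is an alternative, not the code from the answer. The helper name summarise_ids and its ref_date argument are hypothetical, the 30-day month is the assumption stated in the question, and Last5 uses Date1 with a "<= 5 days" cutoff. as_datetime() and ymd_hms() both default to UTC.

```r
library(dplyr)
library(lubridate)

# Sample data copied from the question
DF_2 <- tibble(
  ID = c(123L, 123L, 123L, 456L, 456L, 456L, 456L),
  Date1 = ymd_hms(c("2018-03-18 06:37:22", "2018-03-18 06:37:21",
                    "2018-03-16 04:03:01", "2018-03-10 14:46:03",
                    "2018-03-10 14:46:04", "2018-03-10 14:46:03",
                    "2018-03-10 14:46:15")),
  # Date2 is epoch time in milliseconds; divide by 1000 to get seconds
  Date2 = as_datetime(c(1519109133704, 1520324827462, 1520690354458,
                        1517319313151, 1515143046429, 1515838021062,
                        1488092209241) / 1000)
)

# Hypothetical helper: one summary row per ID.
# ref_date defaults to today but can be fixed for reproducible results.
summarise_ids <- function(df, ref_date = Sys.Date()) {
  df %>%
    group_by(ID) %>%
    summarise(
      Total = n(),                                # rows per ID
      Month = n_distinct(format(Date2, "%Y%m")),  # distinct year-months in Date2
      Avg   = Total / Month,                      # rows per month
      Day   = Total / (Month * 30),               # rows per day, assuming 30-day months
      Last5 = sum(as.numeric(ref_date - as_date(Date1), units = "days") <= 5)
    )
}

res <- summarise_ids(DF_2, ref_date = as.Date("2018-03-21"))
print(res)
```

Passing a fixed ref_date (here the date the question was posted) keeps the Last5 column stable. Calling summarise_ids(DF_2) with no second argument measures the 5-day window from the current date instead.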