Calculating the difference between dates by group in R

Time: 2018-06-28 15:34:52

Tags: r dplyr date-arithmetic

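I am trying to calculate the difference between the minimum and maximum dates by group in R. Code for doing this is given here. However, reproducing that example does not give the expected result. Here is a sample of the dataset used: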

HS_Hatch <- structure(list(
  ClutchID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
               4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L),
  DateVisit = c("3/15/2012", "3/18/2012", "3/20/2012", "4/1/2012", "4/3/2012",
                "3/18/2012", "3/20/2012", "3/22/2012", "4/3/2012", "4/4/2012",
                "3/22/2012", "4/3/2012", "4/4/2012",
                "3/18/2012", "3/20/2012", "3/22/2012", "4/2/2012", "4/3/2012", "4/4/2012",
                "3/20/2012", "3/22/2012", "3/25/2012", "3/27/2012", "4/4/2012", "4/5/2012"),
  Year = c(2012L, 2012L, 2012L, 2012L, 2012L,
           2012L, 2012L, 2012L, 2012L, 2012L,
           2012L, 2012L, 2012L, 2012L, 2012L,
           2012L, 2012L, 2012L, 2012L, 2012L,
           2012L, 2012L, 2012L, 2012L, 2012L),
  Survive = c(1L, 1L, 1L, 1L, 1L,
              1L, 1L, 1L, 1L, 1L,
              1L, 1L, 1L, 1L, 1L,
              1L, 1L, 1L, 1L, 1L,
              1L, 1L, 1L, 1L, 1L)),
  class = c("tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -25L),
  .Names = c("ClutchID", "DateVisit", "Year", "Survive"),
  spec = structure(list(
    cols = structure(list(
      ClutchID = structure(list(), class = c("collector_integer", "collector")),
      DateVisit = structure(list(), class = c("collector_character", "collector")),
      Year = structure(list(), class = c("collector_integer", "collector")),
      Survive = structure(list(), class = c("collector_integer", "collector"))),
      .Names = c("ClutchID", "DateVisit", "Year", "Survive")),
    default = structure(list(), class = c("collector_guess", "collector"))),
    .Names = c("cols", "default"), class = "col_spec"))

Here is the proposed solution using dplyr:

library(dplyr)
HS_Hatch <- HS_Hatch %>%
 mutate(date_visit = as.Date(DateVisit, "%m/%d/%Y"))
exposure <- HS_Hatch %>% 
    group_by(ClutchID) %>%
    summarize(first_visit = min(date_visit), 
              last_visit = max(date_visit), 
              exposure = last_visit - first_visit)

This is the expected result:

  ClutchID first_visit last_visit exposure
     <int>      <date>     <date>    <dbl>
1        1  2012-03-15 2012-04-03       19
2        2  2012-03-18 2012-04-04       17
3        3  2012-03-22 2012-04-04       13
4        4  2012-03-18 2012-04-04       17
5        5  2012-03-20 2012-04-05       16

This is the actual result:

  first_visit last_visit exposure
1  2012-03-15 2012-04-05  21 days

It seems that the grouping factor is being ignored. How can I calculate the date difference for each ClutchID?

2 answers:

Answer 0: (score: 2)

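Your code works when only dplyr is loaded. A single ungrouped row like yours most likely means plyr was loaded after dplyr, so that plyr's summarize masks dplyr's and ignores the grouping.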

Change summarize to dplyr::summarize to make the call explicit. I would recommend not using plyr at all, since you can do everything with dplyr and the tidyverse.
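To make the suggestion concrete, here is a minimal sketch. It assumes the cause really is a masked summarize, and it uses a small made-up data frame (`visits`, a hypothetical stand-in for HS_Hatch with only two clutches). The explicit `dplyr::` prefix means the grouped version is used regardless of which packages are attached.

```r
library(dplyr)

# Hypothetical toy data standing in for HS_Hatch (two clutches only)
visits <- data.frame(
  ClutchID  = c(1L, 1L, 2L, 2L),
  DateVisit = c("3/15/2012", "4/3/2012", "3/18/2012", "4/4/2012"),
  stringsAsFactors = FALSE
)

exposure <- visits %>%
  mutate(date_visit = as.Date(DateVisit, format = "%m/%d/%Y")) %>%
  group_by(ClutchID) %>%
  # The dplyr:: prefix bypasses any other summarize() that masks dplyr's
  dplyr::summarize(first_visit = min(date_visit),
                   last_visit  = max(date_visit),
                   exposure    = last_visit - first_visit)

# Name of the package that the bare name summarize currently resolves to
environmentName(environment(summarize))
```

If the last line reports something other than dplyr, the bare summarize in the original pipeline is not the grouped one, which would explain the single-row result.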

Answer 1: (score: 1)

After importing the data frame, try this:

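# Parse the character column into Date, then convert it to POSIXct
HS_Hatch$DateVisit = as.Date(HS_Hatch$DateVisit, "%m/%d/%Y")
# No format string here: the input is already a Date, not a character string
HS_Hatch$DateVisit = as.POSIXct(HS_Hatch$DateVisit)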

Then change your dplyr pipeline to:

# Use the converted DateVisit column and keep the original data frame intact
exposure <- HS_Hatch %>%
  group_by(ClutchID) %>%
  summarize(first_visit = min(DateVisit),
            last_visit = max(DateVisit),
            exposure = last_visit - first_visit)

Because POSIXct stores time as the number of seconds since the "origin", the difference can be computed and this should give the expected result.
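For comparison, subtraction is also defined for Date objects in base R, and both classes return a difftime. The following is a minimal sketch using two dates from the sample data. The variable names are made up for illustration. It also shows one way to turn the difftime into a plain number of days, as in the expected output.

```r
first_visit <- as.Date("3/15/2012", format = "%m/%d/%Y")
last_visit  <- as.Date("4/3/2012",  format = "%m/%d/%Y")

# Date - Date returns a difftime measured in days
gap_date <- last_visit - first_visit

# POSIXct differences are also difftime objects; fixing the units keeps
# the result in days instead of letting R choose the unit automatically
gap_posix <- difftime(as.POSIXct(last_visit), as.POSIXct(first_visit),
                      units = "days")

# Convert to plain numerics
days_date  <- as.numeric(gap_date,  units = "days")
days_posix <- as.numeric(gap_posix, units = "days")
```

Both `days_date` and `days_posix` hold the same number of days, so the choice between Date and POSIXct does not change the computed exposure in this case.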