How to summarise "number of days since first detection" and "number of days seen" by ID for a large data frame

Time: 2019-07-08 09:19:28

Tags: r tidyverse

The data frame df1 summarises detections of individuals (ID) over time (Date). A short example:

library(lubridate)  # provides ymd()

df1 <- data.frame(ID = c(1,2,1,2,1,2,1,2,1,2),
                  Date = ymd(c("2016-08-21","2016-08-24","2016-08-23","2016-08-29","2016-08-27","2016-09-02","2016-09-01","2016-09-09","2016-09-01","2016-09-10")))

df1

   ID       Date
1   1 2016-08-21
2   2 2016-08-24
3   1 2016-08-23
4   2 2016-08-29
5   1 2016-08-27
6   2 2016-09-02
7   1 2016-09-01
8   2 2016-09-09
9   1 2016-09-01
10  2 2016-09-10

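For each individual, I want to summarise the number of days since its first detection (Ndays) and the number of distinct days on which it has been detected since that first detection (Ndifdays).

In addition, I would like the summary table to include a variable called Prop, which is simply Ndifdays divided by Ndays.

The summary table I am hoping for looks like this: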
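To make the three definitions concrete, here is a minimal base R sketch that works them out by hand for individual 1 only. The variable names (dates_id1, ndays, ndifdays, prop) are illustrative and not part of the original data; the dates are copied from the example above.

```r
# Detection dates of individual 1, copied from the example data
dates_id1 <- as.Date(c("2016-08-21", "2016-08-23", "2016-08-27",
                       "2016-09-01", "2016-09-01"))

# Days elapsed between the first and the last detection
ndays <- as.integer(max(dates_id1) - min(dates_id1))

# Number of distinct calendar days with at least one detection
# (2016-09-01 appears twice but is counted once)
ndifdays <- length(unique(dates_id1))

# Proportion of the elapsed days on which the individual was detected
prop <- ndifdays / ndays

# Ndays = 11, Ndifdays = 4, Prop = 0.364 (rounded)
c(Ndays = ndays, Ndifdays = ndifdays, Prop = round(prop, 3))
```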

> Result
  ID Ndays Ndifdays  Prop
1  1    11        4 0.364 # Between 21st Aug and 1st Sept there are 11 days.
2  2    17        5 0.294 # Between 24th Aug and 10th Sept there are 17 days.

Does anyone know how to do this?

2 answers:

Answer 0: (score: 1)

You can do this with the summary functions in dplyr:

library(dplyr)

df1 %>%
   group_by(ID) %>%
   summarise(Ndays =  as.integer(max(Date) - min(Date)), 
             Ndifdays = n_distinct(Date), 
             Prop = Ndifdays/Ndays)

#     ID Ndays Ndifdays  Prop
#   <dbl> <int>    <int> <dbl>
#1     1    11        4 0.364
#2     2    17        5 0.294

A data.table version:

library(data.table)
df12 <- setDT(df1)[, .(Ndays = as.integer(max(Date) - min(Date)), 
                       Ndifdays = uniqueN(Date)), by = ID]
df12$Prop <- df12$Ndifdays/df12$Ndays
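As a possible variation on the data.table version, Prop can be added by reference with `:=` in a chained call instead of a separate `$<-` assignment. This is only a sketch: the names dt and res are illustrative, and the data are rebuilt here with base as.Date() so the block runs on its own.

```r
library(data.table)

# Same example data as in the question, built with base as.Date()
dt <- data.table(
  ID   = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
  Date = as.Date(c("2016-08-21", "2016-08-24", "2016-08-23", "2016-08-29",
                   "2016-08-27", "2016-09-02", "2016-09-01", "2016-09-09",
                   "2016-09-01", "2016-09-10"))
)

# Summarise per ID, then add Prop by reference in a chained call
res <- dt[, .(Ndays    = as.integer(max(Date) - min(Date)),
              Ndifdays = uniqueN(Date)),
          by = ID][, Prop := Ndifdays / Ndays]

# Print the summary table (one row per ID)
res[]
```

Unlike setDT(df1), which converts df1 to a data.table in place, this sketch leaves any existing df1 untouched.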

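And in base R with aggregate:

# FUN returns two values per ID, which aggregate() stores in a matrix column named Date
df12 <- aggregate(Date ~ ID, df1, function(x) c(Ndays = as.integer(max(x) - min(x)), Ndifdays = length(unique(x))))
df12 <- cbind(df12["ID"], df12$Date)  # flatten the matrix column into Ndays and Ndifdays
df12$Prop <- df12$Ndifdays / df12$Ndays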

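Answer 1: (score: 0)

After grouping by 'ID', take the diff of the range of 'Date' to create 'Ndays', then use n_distinct to count the unique 'Date' values ('Ndifdays'), and divide that count by Ndays to get 'Prop':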

library(dplyr)    
df1 %>%
   group_by(ID) %>%
   summarise(Ndays =  as.integer(diff(range(Date))), 
         Ndifdays = n_distinct(Date), 
         Prop = Ndifdays/Ndays)
# A tibble: 2 x 4
#     ID Ndays Ndifdays  Prop
#  <dbl> <int>    <int> <dbl>
#1     1    11        4 0.364
#2     2    17        5 0.294
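The diff(range(Date)) step used in this answer gives the same span as max(Date) - min(Date) from the first answer. A small base R sketch on the dates of individual 2 illustrates this; the names dates, span_a and span_b are illustrative only.

```r
# Detection dates of individual 2, copied from the example data
dates <- as.Date(c("2016-08-24", "2016-08-29", "2016-09-02",
                   "2016-09-09", "2016-09-10"))

# range() returns the earliest and the latest date as a length-2 Date vector
range(dates)

# diff() of that pair is a difftime: latest minus earliest
span_a <- diff(range(dates))

# The same quantity written explicitly
span_b <- max(dates) - min(dates)

# Both convert to the same whole number of days (17 here)
as.integer(span_a) == as.integer(span_b)
```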