Question

The data frame df1 summarizes detections of individuals (ID) over time (Date). A short example:

library(lubridate)  # provides ymd()

df1 <- data.frame(ID = c(1,2,1,2,1,2,1,2,1,2),
                  Date = ymd(c("2016-08-21","2016-08-24","2016-08-23","2016-08-29","2016-08-27","2016-09-02","2016-09-01","2016-09-09","2016-09-01","2016-09-10")))

df1

   ID       Date
1   1 2016-08-21
2   2 2016-08-24
3   1 2016-08-23
4   2 2016-08-29
5   1 2016-08-27
6   2 2016-09-02
7   1 2016-09-01
8   2 2016-09-09
9   1 2016-09-01
10  2 2016-09-10

I want to summarise the number of days since the first detection of the individual (Ndays) and the number of distinct days on which the individual has been detected since it was first detected (Ndifdays).

In addition, I would like this summary table to include a variable called Prop, which is Ndifdays divided by Ndays.

The summary table I would like to get is:

> Result
  ID Ndays Ndifdays  Prop
1  1    11        4 0.364 # Between 21st Aug and 1st Sept there are 11 days.
2  2    17        5 0.294 # Between 24th Aug and 10th Sept there are 17 days.

Does anyone know how to do this?

Answer 1

You can do this with the various summarising functions in dplyr:

library(dplyr)

df1 %>%
   group_by(ID) %>%
   summarise(Ndays =  as.integer(max(Date) - min(Date)), 
             Ndifdays = n_distinct(Date), 
             Prop = Ndifdays/Ndays)

#     ID Ndays Ndifdays  Prop
#   <dbl> <int>    <int> <dbl>
#1     1    11        4 0.364
#2     2    17        5 0.294

The data.table version of this is

library(data.table)
df12 <- setDT(df1)[, .(Ndays = as.integer(max(Date) - min(Date)), 
                       Ndifdays = uniqueN(Date)), by = ID]
df12$Prop <- df12$Ndifdays/df12$Ndays

And in base R with aggregate

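agg <- aggregate(Date ~ ID, df1, function(x) c(Ndays = as.integer(max(x) - min(x)), Ndifdays = length(unique(x))))
# aggregate() returns both values in one matrix column, so unpack it into regular columns
df12 <- data.frame(ID = agg$ID, agg$Date)
df12$Prop <- df12$Ndifdays/df12$Ndays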
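
One thing to watch with any of these variants: Prop divides by Ndays, which is 0 for an individual detected on a single date only, so Prop comes out as Inf. Below is a minimal base R sketch of one way to guard against that. The data frame `df`, the individual with ID 3, and the choice to return NA are assumptions for illustration and are not part of the question's data.

```r
# Hypothetical data: ID 3 is detected on one date only
df <- data.frame(ID   = c(1, 1, 3),
                 Date = as.Date(c("2016-08-21", "2016-09-01", "2016-08-25")))

# Split the dates by individual, then summarise each group
by_id    <- split(df$Date, df$ID)
ndays    <- sapply(by_id, function(x) as.integer(max(x) - min(x)))
ndifdays <- sapply(by_id, function(x) length(unique(x)))

# Assumption: report NA rather than Inf when the span is zero days
prop <- ifelse(ndays > 0, ndifdays / ndays, NA_real_)

res <- data.frame(ID = as.numeric(names(by_id)),
                  Ndays = ndays, Ndifdays = ndifdays, Prop = prop,
                  row.names = NULL)
res
```

Whether NA, 1, or dropping the row is the right answer depends on how Prop is used afterwards.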

Answer 2

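After grouping by 'ID', take the diff of the range of 'Date' to create 'Ndays', then use n_distinct to count the unique 'Date' values, and divide that distinct count by Ndays to get 'Prop'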

library(dplyr)
df1 %>%
   group_by(ID) %>%
   summarise(Ndays = as.integer(diff(range(Date))),
             Ndifdays = n_distinct(Date),
             Prop = Ndifdays/Ndays)
# A tibble: 2 x 4
#     ID Ndays Ndifdays  Prop
#  <dbl> <int>    <int> <dbl>
#1     1    11        4 0.364
#2     2    17        5 0.294
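
To see what each step does, here is a small base R sketch that repeats the calculation by hand for ID 1 only. The dates are copied from the question; the variable names are just for illustration.

```r
# Dates for ID 1, copied from the question's example
dates <- as.Date(c("2016-08-21", "2016-08-23", "2016-08-27",
                   "2016-09-01", "2016-09-01"))

ndays    <- as.integer(diff(range(dates)))  # last date minus first date
ndifdays <- length(unique(dates))           # the repeated 2016-09-01 counts once
prop     <- ndifdays / ndays

c(Ndays = ndays, Ndifdays = ndifdays, Prop = round(prop, 3))
# Ndays = 11, Ndifdays = 4, Prop = 0.364
```

diff(range(Date)) and max(Date) - min(Date) give the same span, so the two answers differ only in style.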

How to summarise "number of days since the first date" and "number of days seen" by ID for a large data frame
