How do I calculate the number of days since the start date, by group?

Asked: 2013-01-21 10:17:13

Tags: r aggregate

I need to get from this
 id  |    date
-----------------
  A  | 2000-01-13
  A  | 2000-01-18
  A  | 2000-01-25
  B  | 2012-10-10
  B  | 2012-10-11
  C  | 2005-07-25
  C  | 2005-07-31

to this

 id  |    date     | days from start
---------------------------
  A  | 2000-01-13  |  0
  A  | 2000-01-18  |  5
  A  | 2000-01-25  |  12
  A  | 2000-02-08  |  26
  B  | 2012-10-10  |  0
  B  | 2012-10-11  |  1
  C  | 2005-07-25  |  0
  C  | 2005-07-31  |  6

That is, create a variable that holds the number of days elapsed since the first date, grouped by id.

Any ideas?

3 answers:

Answer 0 (score: 10)

Using data.table (I am assuming the date column is character here; if it is already of class Date, you can drop the as.Date(.) call):

df <- structure(list(id = c("A", "A", "A", "B", "B", "C", "C"), 
             date = c("2000-01-13", "2000-01-18", "2000-01-25", "2012-10-10", 
                    "2012-10-11", "2005-07-25", "2005-07-31")), 
             .Names = c("id", "date"), row.names = c(NA, -7L), 
             class = "data.frame")
library(data.table)
dt <- data.table(df, key = "id")
# Per id: gaps in days between consecutive rows, accumulated starting from 0
dt[, days_from_start := cumsum(c(0, diff(as.Date(date)))), by = id]

#    id       date days_from_start
# 1:  A 2000-01-13               0
# 2:  A 2000-01-18               5
# 3:  A 2000-01-25              12
# 4:  B 2012-10-10               0
# 5:  B 2012-10-11               1
# 6:  C 2005-07-25               0
# 7:  C 2005-07-31               6
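
To see why the running sum works, here is a minimal base R sketch on the group A dates from the question (the variable names d, via_cumsum and via_first are only for illustration). Summing the gaps between consecutive rows gives the same numbers as subtracting the first date directly, assuming the rows within each id are already in date order:

```r
# Dates for a single id, taken from group A in the question
d <- as.Date(c("2000-01-13", "2000-01-18", "2000-01-25"))

# Running sum of the gaps between consecutive rows, as in the answer above
via_cumsum <- cumsum(c(0, diff(d)))

# Direct subtraction of the first date in the group
via_first <- as.numeric(d - d[1])

print(via_cumsum)
print(via_first)
```

If the rows within a group might not be sorted by date, subtracting min(d) instead of d[1] is probably the safer choice.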

Answer 1 (score: 5)

You can also combine the functions difftime and split:

dat
  id       date
1  A 2000-01-13
2  A 2000-01-18
3  A 2000-01-25
4  B 2012-10-10
5  B 2012-10-11
6  C 2005-07-25
7  C 2005-07-31

dat$date <- as.POSIXct(dat$date)
dat$"Days spent" <- unlist(lapply(split(dat,f=dat$id),
                         function(x){as.numeric(difftime(x$date,x$date[1], units="days"))}))
dat
  id       date Days spent
1  A 2000-01-13          0
2  A 2000-01-18          5
3  A 2000-01-25         12
4  B 2012-10-10          0
5  B 2012-10-11          1
6  C 2005-07-25          0
7  C 2005-07-31          6

Following the suggestions from @agstudy and @Arun, this can be simplified to:

dat$"Days spent" <- unlist(by(dat, dat$id, 
                           function(x)difftime(x$date,x$date[1], units= "days")))
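
One caveat with unlist() over split() or by(): the pieces come back in group order, so the result lines up with the original rows only when the data is already sorted by id, as it is in the question. Here is a hedged sketch of an order-safe variant using base R's unsplit(). The shuffled data frame and the days_spent column name are made up for illustration:

```r
# Hypothetical, deliberately unsorted variant of the question's data
dat <- data.frame(
  id   = c("B", "A", "A", "B", "A"),
  date = as.Date(c("2012-10-10", "2000-01-13", "2000-01-18",
                   "2012-10-11", "2000-01-25")),
  stringsAsFactors = FALSE
)

# Split the dates by id and measure each group from its first row
parts <- split(dat$date, dat$id)
days  <- lapply(parts, function(x) as.numeric(difftime(x, x[1], units = "days")))

# unsplit() puts every value back at the row it came from
dat$days_spent <- unsplit(days, dat$id)
print(dat)
```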

Answer 2 (score: 0)

Two more approaches: ave, and the plyr library:

df <-
structure(list(id = c("A", "A", "A", "B", "B", "C", "C"), date = structure(c(10969, 
10974, 10981, 15623, 15624, 12989, 12995), class = "Date")), .Names = c("id", 
"date"), row.names = c(NA, -7L), class = "data.frame")

With ave, the dates have to be converted to numeric:

df$days_from_start <- ave(as.numeric(df$date), df$id, FUN = function(x) x-min(x))

which gives

> df
  id       date days_from_start
1  A 2000-01-13               0
2  A 2000-01-18               5
3  A 2000-01-25              12
4  B 2012-10-10               0
5  B 2012-10-11               1
6  C 2005-07-25               0
7  C 2005-07-31               6
> str(df)
'data.frame':   7 obs. of  3 variables:
 $ id             : chr  "A" "A" "A" "B" ...
 $ date           : Date, format: "2000-01-13" ...
 $ days_from_start: num  0 5 12 0 1 0 6

With the plyr library:

library("plyr")
df <- ddply(df, .(id), mutate, days_from_start = date - min(date))

which gives

> df
  id       date days_from_start
1  A 2000-01-13          0 days
2  A 2000-01-18          5 days
3  A 2000-01-25         12 days
4  B 2012-10-10          0 days
5  B 2012-10-11          1 days
6  C 2005-07-25          0 days
7  C 2005-07-31          6 days
> str(df)
'data.frame':   7 obs. of  3 variables:
 $ id             : chr  "A" "A" "A" "B" ...
 $ date           : Date, format: "2000-01-13" ...
 $ days_from_start:Class 'difftime'  atomic [1:7] 0 5 12 0 1 0 6
  .. ..- attr(*, "units")= chr "days"
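
The plyr version leaves the column as a difftime, whereas ave returns plain numbers. If you want plain numbers here too, as.numeric() with an explicit units argument should do it. A small base R sketch on the group C dates (the names d, elapsed and days_num are illustrative):

```r
# Dates for a single id, taken from group C in the question
d <- as.Date(c("2005-07-25", "2005-07-31"))

# Subtracting Dates yields a difftime measured in days
elapsed <- d - min(d)
print(class(elapsed))

# Drop the difftime class, stating the unit explicitly
days_num <- as.numeric(elapsed, units = "days")
print(days_num)
```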