I need to go from this:
id | date
-----------------
A | 2000-01-13
A | 2000-01-18
A | 2000-01-25
B | 2012-10-10
B | 2012-10-11
C | 2005-07-25
C | 2005-07-31
to this:
id | date | days from start
---------------------------
A | 2000-01-13 | 0
A | 2000-01-18 | 5
A | 2000-01-25 | 12
A | 2000-02-08 | 26
B | 2012-10-10 | 0
B | 2012-10-11 | 1
C | 2005-07-25 | 0
C | 2005-07-31 | 6
That is, create a variable holding the number of days elapsed since the first date, computed separately for each id.
Any ideas?
Answer 0 (score: 10)
Using data.table (I am assuming the date column is character here. If it is already of class Date, you can drop the as.Date(.) call):
df <- structure(list(id = c("A", "A", "A", "B", "B", "C", "C"),
date = c("2000-01-13", "2000-01-18", "2000-01-25", "2012-10-10",
"2012-10-11", "2005-07-25", "2005-07-31")),
.Names = c("id", "date"), row.names = c(NA, -7L),
class = "data.frame")
require(data.table)
dt <- data.table(df, key="id")
dt[, days_from_start := cumsum(c(0, diff(as.Date(date)))),by=id]
# id date days_from_start
# 1: A 2000-01-13 0
# 2: A 2000-01-18 5
# 3: A 2000-01-25 12
# 4: B 2012-10-10 0
# 5: B 2012-10-11 1
# 6: C 2005-07-25 0
# 7: C 2005-07-31 6
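The cumsum(diff(...)) approach above relies on the dates already being sorted within each id. If that cannot be guaranteed, subtracting each group's earliest date may be a safer variant. The following is only a sketch: the shuffled sample rows are an assumption made for illustration, not part of the original question.

```r
library(data.table)

# Same sample values as above, but with the rows deliberately shuffled
# (the shuffling is an assumption for illustration)
dt <- data.table(
  id   = c("A", "B", "A", "C", "A", "C", "B"),
  date = c("2000-01-25", "2012-10-11", "2000-01-13", "2005-07-31",
           "2000-01-18", "2005-07-25", "2012-10-10")
)

# Subtract the earliest date within each id; Date - Date is always in days,
# and as.integer() drops the difftime class
dt[, days_from_start := as.integer(as.Date(date) - min(as.Date(date))), by = id]
```

Because each row is compared against the group minimum rather than the previous row, the result does not depend on row order.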
Answer 1 (score: 5)
You can also combine the functions difftime and split:
dat
id date
1 A 2000-01-13
2 A 2000-01-18
3 A 2000-01-25
4 B 2012-10-10
5 B 2012-10-11
6 C 2005-07-25
7 C 2005-07-31
dat$date <- as.POSIXct(dat$date)
dat$"Days spent" <- unlist(lapply(split(dat,f=dat$id),
function(x){as.numeric(difftime(x$date,x$date[1], units="days"))}))
dat
id date Days spent
1 A 2000-01-13 0
2 A 2000-01-18 5
3 A 2000-01-25 12
4 B 2012-10-10 0
5 B 2012-10-11 1
6 C 2005-07-25 0
7 C 2005-07-31 6
Following the suggestions from @agstudy and @Arun, this can be simplified to:
dat$"Days spent" <- unlist(by(dat, dat$id,
function(x)difftime(x$date,x$date[1], units= "days")))
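Both versions above return their results in group order, so they line up with the original rows only when the data is already grouped by id. A minimal sketch of one way around that, assuming the ids may be interleaved, is to put the per-group results back with unsplit. The data frame d and the column name days_spent are hypothetical. The sketch also uses Date rather than POSIXct, which should avoid fractional days when a daylight saving change falls between two timestamps.

```r
# Hypothetical sample with ids interleaved rather than grouped
d <- data.frame(
  id   = c("B", "A", "B", "A"),
  date = as.Date(c("2012-10-10", "2000-01-13", "2012-10-11", "2000-01-18"))
)

# Days since the earliest date, computed per id
per_group <- lapply(split(d$date, d$id), function(x) as.numeric(x - min(x)))

# unsplit() puts each value back at the row it came from
d$days_spent <- unsplit(per_group, d$id)
```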
Answer 2 (score: 0)
Two more approaches: ave, and the plyr library:
df <-
structure(list(id = c("A", "A", "A", "B", "B", "C", "C"), date = structure(c(10969,
10974, 10981, 15623, 15624, 12989, 12995), class = "Date")), .Names = c("id",
"date"), row.names = c(NA, -7L), class = "data.frame")
With ave, the dates have to be converted to numeric:
df$days_from_start <- ave(as.numeric(df$date), df$id, FUN = function(x) x-min(x))
which gives
> df
id date days_from_start
1 A 2000-01-13 0
2 A 2000-01-18 5
3 A 2000-01-25 12
4 B 2012-10-10 0
5 B 2012-10-11 1
6 C 2005-07-25 0
7 C 2005-07-31 6
> str(df)
'data.frame': 7 obs. of 3 variables:
$ id : chr "A" "A" "A" "B" ...
$ date : Date, format: "2000-01-13" ...
$ days_from_start: num 0 5 12 0 1 0 6
With the plyr library:
library("plyr")
df <- ddply(df, .(id), mutate, days_from_start = date - min(date))
which gives
> df
id date days_from_start
1 A 2000-01-13 0 days
2 A 2000-01-18 5 days
3 A 2000-01-25 12 days
4 B 2012-10-10 0 days
5 B 2012-10-11 1 days
6 C 2005-07-25 0 days
7 C 2005-07-31 6 days
> str(df)
'data.frame': 7 obs. of 3 variables:
$ id : chr "A" "A" "A" "B" ...
$ date : Date, format: "2000-01-13" ...
$ days_from_start:Class 'difftime' atomic [1:7] 0 5 12 0 1 0 6
.. ..- attr(*, "units")= chr "days"
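As the str output shows, the plyr version leaves the column as a difftime, while the ave version gives plain numbers. If a plain numeric column is wanted in both cases, the difftime can be converted explicitly. A small sketch, where the variable names start, elapsed and days are chosen for illustration only:

```r
start   <- as.Date("2000-01-13")
elapsed <- as.Date(c("2000-01-13", "2000-01-18", "2000-01-25")) - start

# elapsed is a difftime in days; naming the units makes the conversion explicit
days <- as.numeric(elapsed, units = "days")
```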