I have the following data frame:
id, date, state
1 2012-01-01 a
1 2012-01-02 a
1 2012-01-03 a
1 2012-01-04 b
1 2012-01-05 b
2 2013-01-01 a
2 2013-01-02 a
2 2013-01-03 b
2 2013-01-04 b
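For each id, I want to find the date on which the state changes from a to b, and then insert that date as a column for that id. So the example above would produce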
id, date, state, changedate
1 2012-01-01 a 2012-01-03
1 2012-01-02 a 2012-01-03
1 2012-01-03 a 2012-01-03
1 2012-01-04 b 2012-01-03
1 2012-01-05 b 2012-01-03
2 2013-01-01 a 2013-01-02
2 2013-01-02 a 2013-01-02
2 2013-01-03 b 2013-01-02
2 2013-01-04 b 2013-01-02
Is there an elegant way to do this with plyr functions or even base R? Thanks in advance.
Answer 0 (score: 2)
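Edit: As Sebastian mentioned, I am assuming the data.frame is sorted by the date column.
This is one of many possible solutions. The potentially tricky part is finding the transitions, which can be done with the help of rle.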
# run-length encode the state column (as.character guards against a factor column)
rle.df <- rle(as.character(df$state))
# get indices of a-to-b transition -> 3,7
idx <- cumsum(rle.df$lengths)[c(TRUE, FALSE)]
# get indices of b-to-a transition -> 5,9
idx2 <- cumsum(rle.df$lengths)[c(FALSE, TRUE)]
# construct appropriate lengths -> 5,4
idx2 <- c(idx2[1], diff(idx2))
# repeat each transition date (df$date[idx]) idx2 times
df$changedate <- unlist(lapply(seq_along(idx2), function(vv) {
  rep(df$date[idx[vv]], idx2[vv])
}))
> df
id. date. state changedate
1 1 2012-01-01 a 2012-01-03
2 1 2012-01-02 a 2012-01-03
3 1 2012-01-03 a 2012-01-03
4 1 2012-01-04 b 2012-01-03
5 1 2012-01-05 b 2012-01-03
6 2 2013-01-01 a 2013-01-02
7 2 2013-01-02 a 2013-01-02
8 2 2013-01-03 b 2013-01-02
9 2 2013-01-04 b 2013-01-02
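To make the index arithmetic above easier to follow, here is a small sketch of what rle returns for the example data. The data frame is rebuilt by hand, and the plain column names (id, date, state) and the variable names runs, ends and a_ends are assumptions for illustration only, not part of the original answer.

```r
# Rebuild the example data from the question (dates kept as character strings)
df <- data.frame(
  id    = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  date  = c("2012-01-01", "2012-01-02", "2012-01-03", "2012-01-04", "2012-01-05",
            "2013-01-01", "2013-01-02", "2013-01-03", "2013-01-04"),
  state = c("a", "a", "a", "b", "b", "a", "a", "b", "b"),
  stringsAsFactors = FALSE
)

# rle collapses consecutive equal values into runs
runs <- rle(df$state)
runs$values    # "a" "b" "a" "b"
runs$lengths   # 3 2 2 2

# cumulative run lengths give the last row of each run: 3 5 7 9
ends <- cumsum(runs$lengths)

# every other run is an "a" run, so its last row is the a-to-b transition: 3 7
a_ends <- ends[c(TRUE, FALSE)]
df$date[a_ends]
```

Note that picking every other run only works when the states strictly alternate a, b, a, b across the whole table, which holds for this example.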
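An alternative solution using data.table (I just noticed that you also have an id. column, so we can split by it and apply the date using the transition indices from rle).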
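require(data.table)
# run-length encode the state column (as.character guards against a factor column)
rle.df <- rle(as.character(df$state))
# last row of each "a" run and of each "b" run, counted over the whole table -> 3,7 and 5,9
idx <- cumsum(rle.df$lengths)[c(TRUE, FALSE)]
idx2 <- cumsum(rle.df$lengths)[c(FALSE, TRUE)]
# convert idx to positions within each id group -> 3,2
idx <- c(idx[1], tail(idx, -1) - head(idx2, -1))
dt <- data.table(df, key="id.")
# idx[id.] looks up the position by id value, so this assumes ids are 1, 2, ... in order
out <- dt[, `:=`(changedate=date.[idx[id.]]), by=id.]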
> out
id. date. state changedate
1: 1 2012-01-01 a 2012-01-03
2: 1 2012-01-02 a 2012-01-03
3: 1 2012-01-03 a 2012-01-03
4: 1 2012-01-04 b 2012-01-03
5: 1 2012-01-05 b 2012-01-03
6: 2 2013-01-01 a 2013-01-02
7: 2 2013-01-02 a 2013-01-02
8: 2 2013-01-03 b 2013-01-02
9: 2 2013-01-04 b 2013-01-02
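Both versions above rely on the runs alternating a, b across the whole table (and the data.table one on ids being 1, 2, ...). As a rough base R sketch that avoids those assumptions, the transition can instead be located inside each id group. The helper name change_date and the plain column names are hypothetical, and the rows are still assumed to be sorted by date within each id.

```r
# Rebuild the example data from the question (dates kept as character strings)
df <- data.frame(
  id    = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  date  = c("2012-01-01", "2012-01-02", "2012-01-03", "2012-01-04", "2012-01-05",
            "2013-01-01", "2013-01-02", "2013-01-03", "2013-01-04"),
  state = c("a", "a", "a", "b", "b", "a", "a", "b", "b"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: date of the first "a" row that is directly followed
# by a "b" row, repeated for every row of the group (NA if there is none)
change_date <- function(dates, states) {
  n <- length(states)
  hit <- which(states[-n] == "a" & states[-1] == "b")
  if (length(hit) == 0) return(rep(NA_character_, n))
  rep(dates[hit[1]], n)
}

# Split by id, compute the change date per group, and put the pieces back
# in the original row order
pieces <- split(df, df$id)
df$changedate <- unsplit(
  lapply(pieces, function(g) change_date(g$date, g$state)),
  df$id
)
df
```

For the example data this should give 2012-01-03 for id 1 and 2013-01-02 for id 2, matching the output requested in the question.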