Suppose I have a data frame:
User  Date
aaaa  2015-11-26
aaaa  2015-12-26
aaaa  2016-01-26
bbbb  2014-10-15
bbbb  2014-11-15
bbbb  2015-05-16
I want to generate new column variables.
Desired output:
User  Date        Count  Gap
aaaa  2015-11-26  1      0
aaaa  2015-12-26  2      0
aaaa  2016-01-26  3      0
bbbb  2014-10-15  1      0
bbbb  2014-11-15  2      0
bbbb  2015-05-16  3      6
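The sample data can be recreated with a short sketch like the one below. Storing both columns as factors is an assumption, made only because the printed output in the answer labels `User` and `Date` as `(fctr)`; the name `df` matches the one used in the answer's code.

```r
# Recreate the sample data from the question.
# Assumption: both columns are factors, matching the (fctr) labels
# shown in the answer's printed output.
df <- data.frame(
  User = c("aaaa", "aaaa", "aaaa", "bbbb", "bbbb", "bbbb"),
  Date = c("2015-11-26", "2015-12-26", "2016-01-26",
           "2014-10-15", "2014-11-15", "2015-05-16"),
  stringsAsFactors = TRUE
)

# Inspect the column types
str(df)
```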
Answer 0: (score: 1)
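One caveat: when using zoo::as.yearmon() I had to round() the result; otherwise the gap from 2015-11-26 to 2015-12-26 is treated as longer than one month. Maybe someone can comment/edit/explain how to make that particular calculation more "intuitive".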
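library(dplyr)
library(zoo)

df %>%
  group_by(User) %>%
  mutate(
    # running row number within each user
    Count = row_number(),
    # yearmon differences are in years, so multiply by 12; round so a one-month gap compares as 1
    Gap_In_Months = round(12 * as.numeric(as.yearmon(Date) - as.yearmon(lag(Date))), 1),
    # gaps of one month or less, and each user's first row, count as 0
    Gap = ifelse(Gap_In_Months <= 1 | is.na(Gap_In_Months), 0, Gap_In_Months)
  )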
#     User       Date Count Gap_In_Months   Gap
#   (fctr)     (fctr) (int)         (dbl) (dbl)
# 1   aaaa 2015-11-26     1            NA     0
# 2   aaaa 2015-12-26     2             1     0
# 3   aaaa 2016-01-26     3             1     0
# 4   bbbb 2014-10-15     1            NA     0
# 5   bbbb 2014-11-15     2             1     0
# 6   bbbb 2015-05-16     3             6     6
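The need for round() is most likely a floating-point effect rather than anything specific to these dates. zoo documents that a yearmon value is stored as a number: the year plus 0 for January, 1/12 for February, and so on. The base R sketch below mimics that representation by hand (the helper `as_year_fraction` is hypothetical and not part of zoo) to show that twelve times the difference of two such values is only approximately 1, so an exact `<= 1` test is fragile.

```r
# Hypothetical helper that mimics how zoo stores a yearmon value:
# year + (month - 1) / 12. It is not part of zoo.
as_year_fraction <- function(year, month) year + (month - 1) / 12

nov_2015 <- as_year_fraction(2015, 11)
dec_2015 <- as_year_fraction(2015, 12)

# Gap in months between the two values
gap <- 12 * (dec_2015 - nov_2015)

# Print with full precision to reveal any deviation from exactly 1
print(gap, digits = 17)

# An exact comparison depends on rounding error and should not be relied on
print(gap == 1)

# Comparing with a tolerance, or after rounding, is reliable
print(isTRUE(all.equal(gap, 1)))
print(round(gap, 1) == 1)
```

Rounding to one decimal place, as the answer does, is one way to absorb this noise; comparing with a small tolerance would be another.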
Perhaps you want to be more specific about what counts as "a month": 30 days? 31 days? 28 days?
If so, we can use lubridate:
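library(lubridate)

df %>%
  group_by(User) %>%
  mutate(
    Count = row_number(),
    # elapsed time since the user's previous date (NA on the first row)
    Diff_Time = ymd(Date) - ymd(lag(Date)),
    # gaps of up to 31 days count as 0; longer gaps are reported in days
    Gap = ifelse(Diff_Time <= ddays(31) | is.na(Diff_Time), 0, as.numeric(Diff_Time, units = "days"))
  )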
#     User       Date Count Diff_Time   Gap
#   (fctr)     (fctr) (int)    (dfft) (dbl)
# 1   aaaa 2015-11-26     1   NA days     0
# 2   aaaa 2015-12-26     2   30 days     0
# 3   aaaa 2016-01-26     3   31 days     0
# 4   bbbb 2014-10-15     1   NA days     0
# 5   bbbb 2014-11-15     2   31 days     0
# 6   bbbb 2015-05-16     3  182 days   182
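If "a month" should instead mean "the calendar month advanced by one", another option is to count months as integers, which needs neither rounding nor a day threshold. The sketch below is an alternative to the two snippets above, not the answer author's method. `month_index` is a hypothetical helper, and the data frame is rebuilt inline with `Date` as a real Date column (an assumption) so the block runs on its own.

```r
library(dplyr)

# Sample data from the question; Date is stored as a Date here (assumption)
df <- data.frame(
  User = c("aaaa", "aaaa", "aaaa", "bbbb", "bbbb", "bbbb"),
  Date = as.Date(c("2015-11-26", "2015-12-26", "2016-01-26",
                   "2014-10-15", "2014-11-15", "2015-05-16"))
)

# Hypothetical helper: integer count of months since year 0.
# POSIXlt stores years since 1900 and months as 0-11.
month_index <- function(d) {
  lt <- as.POSIXlt(d)
  as.integer(12 * (lt$year + 1900) + lt$mon)
}

result <- df %>%
  group_by(User) %>%
  mutate(
    # running row number within each user
    Count  = row_number(),
    # whole calendar months since the user's previous date (NA on the first row)
    Months = month_index(Date) - lag(month_index(Date)),
    # gaps of one month or less, and first rows, count as 0
    Gap    = ifelse(is.na(Months) | Months <= 1, 0, Months)
  ) %>%
  ungroup()

print(result)
```

This definition ignores the day of the month, so January 31 to February 1 counts as one month. Whether that is acceptable depends on how strictly the gap needs to be measured.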