I am trying to create a variable, diff, that holds the number of days between two consecutive events.
library(outbreaks)
df <- measles_hagelloch_1861[order(measles_hagelloch_1861$date_of_prodrome), ]
library(lubridate)
library(plyr)
# First date
firs_date <- min(df$date_of_prodrome)
# Cumulative number of days
df$cum_number_day <- difftime(df$date_of_prodrome, firs_date, units = 'days')
head(df$cum_number_day)
# The number of days between two consecutive events
df$diff <- difftime(lag(df$date_of_prodrome, 1), df$date_of_prodrome, units = "days" )
head(df$diff)
And the results:
Time differences in days
[1] 0 2 8 8 9 12
Time differences in days
[1] 0 0 0 0 0 0
Can you explain why the first command gives the expected result, while the second one gives 0 0 0?
Answer 0 (score: 1)
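The problem is that your code uses stats::lag rather than dplyr::lag. With only lubridate and plyr loaded, a bare lag() resolves to stats::lag, which shifts the time base of a time series and leaves the values themselves unchanged, so each date ends up being subtracted from itself. See the difference: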
df$diff <- difftime(stats::lag(df$date_of_prodrome, 1), df$date_of_prodrome, units = "days")
head(df$diff)
#Time differences in days
#[1] 0 0 0 0 0 0
df$diff <- difftime(dplyr::lag(df$date_of_prodrome, 1), df$date_of_prodrome, units = "days" )
head(df$diff)
#Time differences in days
#[1] NA -2 -6 0 -1 -3
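To see the mechanism in isolation, here is a minimal base R sketch. It assumes a small, made-up vector of dates (not taken from the measles data) and that dplyr is not attached. It checks which lag a bare call resolves to, shows that stats::lag leaves the values in place, and computes positive day gaps with diff() as one possible alternative.

```r
# Illustrative dates only; these are NOT taken from measles_hagelloch_1861.
dates <- as.Date(c("1861-11-01", "1861-11-03", "1861-11-09", "1861-11-09"))

# Which package does a bare `lag` come from in this session?
# Without dplyr attached, this is expected to be the stats package.
environmentName(environment(lag))

# stats::lag() only adjusts the time-series attribute ("tsp");
# the values stay in the same positions.
shifted <- stats::lag(dates, 1)
same_values <- all(as.numeric(shifted) == as.numeric(dates))
same_values
attr(shifted, "tsp")

# Because the values are unchanged, the difference is zero everywhere.
zero_gap <- as.numeric(difftime(shifted, dates, units = "days"))
zero_gap

# Base R alternative: diff() returns current minus previous, so the gaps
# are positive. Pad with NA so the result lines up with the original vector.
gap_days <- c(NA, as.numeric(diff(dates), units = "days"))
gap_days
# NA 2 6 0
```

The negative values in the dplyr::lag output above come from the argument order (previous date minus current date). Swapping the arguments, as in difftime(df$date_of_prodrome, dplyr::lag(df$date_of_prodrome), units = "days"), should give positive gaps comparable to the diff() version in this sketch.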