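My season runs from October 1 through March 31 of the following year. How can I create a dummy variable for season that shows when each person moves in and out of it?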
df <- data.frame(ID= c(1:6),
Drug = c("A","C","A","A","B","A"),
Start = c("01/01/2009","07/10/2010","10/10/2009","03/01/2011","03/01/2012","04/12/2010"),
End=c("09/10/2009","04/20/2011","07/20/1010","01/01/2012","04/01/2013","09/30/2011"))
My desired output:
ID Drug Start End Season
1 1 A 01/01/2009 09/10/2009 1
2 1 A 01/01/2009 09/10/2009 0
3 2 C 07/10/2010 04/20/2011 0
4 2 C 07/10/2010 04/20/2011 1
5 2 C 07/10/2010 04/20/2011 0
6 3 A 10/10/2009 07/20/1010 1
7 3 A 10/10/2009 07/20/1010 0
8 3 A 10/10/2009 07/20/1010 1
9 4 B 03/01/2011 01/01/2012 1
10 4 B 03/01/2011 01/01/2012 0
11 4 B 03/01/2011 01/01/2012 1
12 5 A 03/01/2012 04/01/2013 1
13 5 A 03/01/2012 04/01/2013 0
14 5 A 03/01/2012 04/01/2013 1
15 5 A 03/01/2012 04/01/2013 0
16 6 A 04/12/2010 09/30/2011 0
ID 1: she starts on 01/01 and ends on 09/10.
[01/01, 03/31] =1
[03/31,09/10] = 0
ID 2: she starts on 07/10 and ends on 04/20. I checked:
[07/10, 10/01] = 0
[10/01,03/31] = 1
[03/31, 04/20] = 0
ID 5: she starts on 03/01 and ends on 04/01 of the following year.
[03/01, 03/31]= 1
[03/31, 10/01] = 0
[10/01, 03/31] = 1
[03/31, 04/01] = 0
Answer 0 (score: 1)
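I think the code below gets ExposedIn and ExposedOut right (note: you need to add 'stringsAsFactors = FALSE' when creating the data frame). However, I didn't have enough time to work out the extra rows for the whole seasons covered between Start and End. I would handle that by adding another column, built with the date/time functions, that accounts for the total treatment time.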
df$Start <- as.Date(df$Start, format = '%m/%d/%Y')
df$End <- as.Date(df$End, format = '%m/%d/%Y')
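# POSIXlt months are 0-based (January = 0), so October is 9 and March is 2.
# Comparing whole months instead of day-of-year avoids the leap-year shift.
df$SeasonIn <- 9   # season opens in October
df$SeasonOut <- 2  # season closes at the end of March
df$ExposedIn <- as.integer(as.POSIXlt(df$Start)$mon >= df$SeasonIn |
                           as.POSIXlt(df$Start)$mon <= df$SeasonOut)
df$ExposedOut <- as.integer(as.POSIXlt(df$End)$mon >= df$SeasonIn |
                            as.POSIXlt(df$End)$mon <= df$SeasonOut)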
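For the part I did not get to (one row per in-season or out-of-season stretch), here is a rough sketch rather than a tested solution. It assumes the season boundaries are exactly October 1 and April 1 (the day after March 31), and that End is never earlier than Start. The helper name split_by_season, the columns SegStart and SegEnd, and the demo data frame are my own inventions. The demo rows are IDs 1, 2 and 5 from the question, with the dates already converted.

```r
# Hypothetical helper: cut one [start, end] interval at every Oct 1 and Apr 1
# that falls inside it, then flag each piece as in season (1) or not (0).
split_by_season <- function(start, end) {
  years <- as.integer(format(start, "%Y")):as.integer(format(end, "%Y"))
  # Apr 1 is the first out-of-season day; Oct 1 is the first in-season day.
  cuts <- sort(as.Date(c(paste0(years, "-04-01"), paste0(years, "-10-01"))))
  cuts <- cuts[cuts > start & cuts <= end]
  seg_start <- c(start, cuts)
  seg_end <- c(cuts - 1, end)
  mon <- as.POSIXlt(seg_start)$mon  # 0-based month of each piece's first day
  data.frame(SegStart = seg_start, SegEnd = seg_end,
             Season = as.integer(mon >= 9 | mon <= 2))
}

# IDs 1, 2 and 5 from the question, dates given in ISO format
demo <- data.frame(ID = c(1, 2, 5),
                   Drug = c("A", "C", "B"),
                   Start = as.Date(c("2009-01-01", "2010-07-10", "2012-03-01")),
                   End = as.Date(c("2009-09-10", "2011-04-20", "2013-04-01")),
                   stringsAsFactors = FALSE)

# Split every row and stack the pieces into one long data frame
pieces <- lapply(seq_len(nrow(demo)), function(i) {
  seg <- split_by_season(demo$Start[i], demo$End[i])
  data.frame(ID = demo$ID[i], Drug = demo$Drug[i], seg)
})
result <- do.call(rbind, pieces)
print(result)
```

On these three IDs the Season column comes out as 1, 0 for ID 1, then 0, 1, 0 for ID 2, then 1, 0, 1, 0 for ID 5, which matches the pattern described in the question. Unlike the question's bracket notation, each piece here ends the day before the next one starts, so no day is counted twice.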
Hope this helps at least a little.