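I'd like to plot lifelines for my data so readers can get a sense of how the data are formed and what effect right censoring has on them.
Ideally, I'd like it to look like [this] [1].
I need one horizontal line per participant, starting on the date observation began and ending on the last day we observed them. Participants observed on the last day should be drawn in a different color (or flagged in some other way).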
The data look like this:
regdate              lastlogindate        censor  duration
2010-02-24 02:30:43  2010-05-27 07:58:17  0       92
2007-12-23 11:16:37  2008-03-07 10:36:29  1       75
2009-01-19 04:23:28  2009-01-24 06:33:38  1       5
2010-07-25 10:24:39  2010-08-11 07:13:25  0       17
2009-08-23 07:18:06  2009-08-24 06:25:35  1       1
2007-08-12 07:24:55  2010-06-01 06:53:57  0       1024
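The `duration` column appears to be the length of follow-up in days. Here is a quick base-R check, assuming that interpretation. For the six rows above, the difference between calendar dates matches `duration` exactly, while truncating the exact elapsed time would give 74 instead of 75 for the second row. Rounding the exact elapsed time also matches these rows, so the sample alone does not settle which rule was used. The variable names are hypothetical.

```r
# Example rows from the table above
reg  <- c("2010-02-24 02:30:43", "2007-12-23 11:16:37", "2009-01-19 04:23:28",
          "2010-07-25 10:24:39", "2009-08-23 07:18:06", "2007-08-12 07:24:55")
last <- c("2010-05-27 07:58:17", "2008-03-07 10:36:29", "2009-01-24 06:33:38",
          "2010-08-11 07:13:25", "2009-08-24 06:25:35", "2010-06-01 06:53:57")
duration <- c(92, 75, 5, 17, 1, 1024)

# Difference between calendar dates, ignoring the time of day
date_diff <- as.numeric(as.Date(substr(last, 1, 10)) - as.Date(substr(reg, 1, 10)))

# Exact elapsed time in days (timestamps assumed to be UTC), for comparison
exact <- as.numeric(difftime(as.POSIXct(last, tz = "UTC"),
                             as.POSIXct(reg, tz = "UTC"), units = "days"))

all(date_diff == duration)  # TRUE for these six rows
floor(exact)                # second element is 74, not 75
```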
(UCLA has an example of how it's done in Stata. I told my advisor I could match in R anything he does in Stata, so I need some help.)
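There is a possibly simpler route to the plot described above. It is sketched here as an alternative to reshaping, not as the only way: keep one row per participant and draw each lifeline with ggplot2's `geom_segment()`, which takes the start (`x`, `y`) and end (`xend`, `yend`) of a line directly. The sketch assumes the timestamps are UTC and reuses the six example rows from the table above. The names `lifelines`, `start`, `end` and `ID` are hypothetical. A point at the end of each line gives a second censoring cue besides color.

```r
library(ggplot2)

# The six example rows from the question, one row per participant
lifelines <- data.frame(
  regdate = c("2010-02-24 02:30:43", "2007-12-23 11:16:37", "2009-01-19 04:23:28",
              "2010-07-25 10:24:39", "2009-08-23 07:18:06", "2007-08-12 07:24:55"),
  lastlogindate = c("2010-05-27 07:58:17", "2008-03-07 10:36:29", "2009-01-24 06:33:38",
                    "2010-08-11 07:13:25", "2009-08-24 06:25:35", "2010-06-01 06:53:57"),
  censor = c(0, 1, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Parse the timestamps (assumed UTC) and give each participant a y position
lifelines$start  <- as.POSIXct(lifelines$regdate, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
lifelines$end    <- as.POSIXct(lifelines$lastlogindate, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
lifelines$ID     <- seq_len(nrow(lifelines))
lifelines$censor <- factor(lifelines$censor)

# One horizontal segment per participant, colored by the censoring indicator,
# with a point (shape also mapped to censor) marking where each line ends
p <- ggplot(lifelines, aes(y = ID, color = censor)) +
  geom_segment(aes(x = start, xend = end, yend = ID)) +
  geom_point(aes(x = end, shape = censor)) +
  labs(x = "Date", y = "Participant")

p
```

Because no reshaping is needed, the censoring variable stays in the same row as the dates and no `merge()` step is required.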
EDIT: I finally got it working. Here is a sample of the data, produced with dput:
data <- structure(list(users_id = c(1747516, 913136, 921278, 1654913,
782364, 1371798, 1174461, 1493894, 1124186, 1249310),
regdate = c("2010-08-15 05:50:09", "2009-01-04 13:47:46", "2009-01-07 13:34:53", "2010-06-30 11:19:08", "2008-08-13 06:46:28", "2010-01-26 12:58:20", "2009-08-18 15:13:12", "2010-04-04 11:33:47", "2009-07-10 12:33:41", "2009-10-19 13:30:49" ),
lastlogindate = c("2010-09-01 05:51:34", "2010-09-17 05:25:00", "2009-05-15 07:55:30", "2010-07-02 07:34:02", "2008-10-25 14:29:50", "2010-03-17 05:04:58", "2010-07-06 03:48:48", "2010-04-09 19:44:42", "2010-09-03 04:18:18", "2009-10-20 06:26:55"),
censor6 = c(0, 0, 1, 0, 1, 1, 0, 0, 0, 1)),
.Names = c("users_id", "regdate", "lastlogindate", "censor6"),
row.names = c(1L, 2L, 4L, 5L, 7L, 9L, 10L, 11L, 12L, 14L),
class = "data.frame")
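What I did was melt the data with the reshape2 package so that each observation has two rows, one for the start date and one for the end date. I then added the censoring variable back with merge.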
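The same long format (two rows per participant, one per date) can also be built in base R without reshape2. In this sketch, `ID` and `censor6` are attached before stacking, so the later `merge()` step is not needed. It uses the first three users from the `dput` sample; `wide` and `long` are hypothetical names.

```r
# One row per participant (first three users from the dput sample)
wide <- data.frame(
  users_id = c(1747516, 913136, 921278),
  regdate = c("2010-08-15 05:50:09", "2009-01-04 13:47:46", "2009-01-07 13:34:53"),
  lastlogindate = c("2010-09-01 05:51:34", "2010-09-17 05:25:00", "2009-05-15 07:55:30"),
  censor6 = c(0, 0, 1),
  stringsAsFactors = FALSE
)
wide$ID <- seq_len(nrow(wide))

# Stack the start and end dates: two rows per participant.
# ID and censor6 travel along as ordinary columns, so no merge() is needed.
keep <- wide[c("users_id", "ID", "censor6")]
long <- rbind(
  data.frame(keep, variable = "regdate", value = wide$regdate,
             stringsAsFactors = FALSE),
  data.frame(keep, variable = "lastlogindate", value = wide$lastlogindate,
             stringsAsFactors = FALSE)
)
long <- long[order(long$ID, long$value), ]
long$value   <- as.POSIXct(long$value, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
long$censor6 <- factor(long$censor6)
```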
library(reshape2)  # for melt()
# Create a random subset of (up to) 25 observations
sampData1 <- data[c("users_id", "regdate", "lastlogindate")]
sampData1 <- sampData1[sample(nrow(sampData1), min(25, nrow(sampData1))), ]
# Create two rows per observation: one for the start date, one for the end date
sampData1 <- melt(sampData1, id.vars = "users_id")
sampData1 <- sampData1[order(sampData1$users_id, sampData1$value), ]
# Add a grouping variable: equivalent to users_id, but gives tidier y-axis values
sampData1$ID <- rep(seq_len(nrow(sampData1) / 2), each = 2)
# Put back the censoring variable
sampData1 <- merge(sampData1, data[, c("users_id", "censor6")])
sampData1$censor6 <- as.factor(sampData1$censor6)
sampData1$value <- as.POSIXct(sampData1$value, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
Now let's create the plot:
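library(ggplot2)
# Base plot plus the horizontal lines (this is the big deal).
# The line width is a fixed setting, not a data mapping, so it goes outside aes()
gp <- ggplot(sampData1, aes(x = value, y = ID, group = ID, color = censor6)) +
  geom_line(linewidth = 1)  # use size = 1 on ggplot2 versions before 3.4.0
# Declutter the x-axis labels (one break every 3 months)
gp <- gp + scale_x_datetime(date_breaks = "3 months")
# Rotate the x-axis labels
gp <- gp + theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Change the legend title and colors
gp <- gp + scale_color_manual(name = "Censoring", values = c("red", "blue"))
gp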
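One thing to keep in mind when building a ggplot2 plot in steps: `+` returns a new plot object and does not modify `gp` in place. Each addition must therefore be reassigned (or chained with `+`) to accumulate; a bare `gp + ...` line only prints a one-off variant and then discards it. The toy data frame `df` below is hypothetical.

```r
library(ggplot2)

df <- data.frame(x = 1:3, y = c(2, 1, 3))
gp <- ggplot(df, aes(x, y))

# `+` builds a new ggplot object; gp itself is left unchanged
with_line <- gp + geom_line()
n_before  <- length(gp$layers)         # 0: gp still has no layers
n_after   <- length(with_line$layers)  # 1

# To build the plot up step by step, reassign the result each time
gp <- gp + geom_line()
gp <- gp + theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

Theme changes do not add layers, so after the two reassignments `gp` has exactly one layer.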