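I tried running this code on 2.5 million rows and my computer blew up... The code works, but it is slow on large datasets. The code, input, and output are below. I originally got a solution from my first post; the code below adds the part of the logic that was missing from it. Can anyone suggest improvements?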
df <- read.table(header=TRUE,text='DATE ID
12/31/2009 1
12/31/2010 1
12/31/2011 1
12/31/2012 1
12/31/2013 1
12/31/2011 2
12/31/2011 2
12/31/2012 2
12/31/2012 2
12/31/2013 2
12/31/2013 2
12/31/2008 3
12/31/2009 3
12/31/2009 3
12/31/2009 3
12/31/2010 3
12/31/2010 3
12/31/2010 3
12/31/2011 3
12/31/2011 3
12/31/2012 3
12/31/2008 4
12/31/2008 4
12/31/2009 4
12/31/2009 4
12/31/2010 4
12/31/2010 4
12/31/2011 4
12/31/2011 4
12/31/2011 4
12/31/2011 4
12/31/2011 4
12/31/2012 4
12/31/2012 4
12/31/2012 4
12/31/2012 4
12/31/2012 4
12/31/2012 4
12/31/2013 4
12/31/2013 4
12/31/2013 4
12/31/2013 4
12/31/2013 4
12/31/2013 4
12/31/2013 4
12/31/2013 4
12/31/2013 4
12/31/2012 5
12/31/2009 5
12/31/2010 5
12/31/2011 5
12/31/2011 5
12/31/2012 5
12/31/2012 5
12/31/2013 5')
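df$DATE <- as.Date(df$DATE, "%m/%d/%Y")
# Pre-create the result columns so the per-ID assignments below have a target
df$GROUP <- NA_integer_
df$GROUPDATE <- NA_character_
# Row indices of df, split by ID
split.rows <- split(seq_len(nrow(df)), df$ID, drop = TRUE)
invisible(lapply(split.rows, function(x) {
  split_df <- df[x, ]
  group <- vector('integer', length(x))
  group_date <- vector('character', length(x))
  # The first row of each ID starts group 1
  group[1] <- 1L
  group_date[1] <- as.character(split_df[1, 'DATE'])
  # seq_len(...)[-1] is empty for a single-row ID, unlike 2:nrow(split_df)
  for (i in seq_len(nrow(split_df))[-1]) {
    # New group when more than 365 days from the previous row or from the group's start date
    if (split_df[i, 'DATE'] - split_df[i - 1, 'DATE'] > 365 ||
        split_df[i, 'DATE'] - as.Date(group_date[i - 1]) > 365) {
      group[i] <- group[i - 1] + 1L
      group_date[i] <- as.character(split_df[i, 'DATE'])
    } else {
      group[i] <- group[i - 1]
      group_date[i] <- group_date[i - 1]
    }
  }
  # Write the results back into the global df
  df$GROUP[x] <<- group
  df$GROUPDATE[x] <<- group_date
  NULL
}))
df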
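One direction that might help, offered only as a sketch: the loop above indexes a data frame row by row and converts dates to and from character on every step, which is typically costly in R. The same rule can be run on a plain numeric vector of day counts instead. `assign_groups` and its `window` argument are hypothetical names introduced for illustration, not part of the original code, and the small data frame only repeats IDs 1 and 2 from the post.

```r
# Hypothetical helper (not from the original code): applies the grouping rule
# to ONE ID, given its dates as a numeric vector of days.
assign_groups <- function(days, window = 365) {
  n <- length(days)
  group  <- integer(n)
  anchor <- numeric(n)          # start date of the current group, in days
  if (n == 0L) return(list(group = group, anchor = anchor))
  group[1]  <- 1L
  anchor[1] <- days[1]
  for (i in seq_len(n)[-1]) {
    # Same test as the original loop: gap to previous row OR to group start
    if (days[i] - days[i - 1] > window || days[i] - anchor[i - 1] > window) {
      group[i]  <- group[i - 1] + 1L
      anchor[i] <- days[i]
    } else {
      group[i]  <- group[i - 1]
      anchor[i] <- anchor[i - 1]
    }
  }
  list(group = group, anchor = anchor)
}

# IDs 1 and 2 from the post, already parsed as dates
df <- data.frame(
  DATE = as.Date(c("2009-12-31", "2010-12-31", "2011-12-31", "2012-12-31",
                   "2013-12-31", "2011-12-31", "2011-12-31", "2012-12-31",
                   "2012-12-31", "2013-12-31", "2013-12-31")),
  ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L)
)

days <- as.numeric(df$DATE)                 # convert once, outside the loop
idx  <- split(seq_len(nrow(df)), df$ID)     # row indices per ID
df$GROUP     <- NA_integer_
df$GROUPDATE <- as.Date(NA)
for (rows in idx) {
  res <- assign_groups(days[rows])
  df$GROUP[rows]     <- res$group
  df$GROUPDATE[rows] <- as.Date(res$anchor, origin = "1970-01-01")
}
df
```

On these two IDs the sketch gives groups 1, 1, 2, 3, 3 and 1, 1, 2, 2, 2, 2, which matches the expected output; whether it is fast enough on 2.5 million rows is untested.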
=================
Input:
DATE ID
12/31/2009 1
12/31/2010 1
12/31/2011 1
12/31/2012 1
12/31/2013 1
12/31/2011 2
12/31/2011 2
12/31/2012 2
12/31/2012 2
12/31/2013 2
12/31/2013 2
12/31/2008 3
12/31/2009 3
12/31/2009 3
12/31/2009 3
12/31/2010 3
12/31/2010 3
12/31/2010 3
12/31/2011 3
12/31/2011 3
12/31/2012 3
12/31/2008 4
12/31/2008 4
12/31/2009 4
12/31/2009 4
12/31/2010 4
12/31/2010 4
12/31/2011 4
12/31/2011 4
12/31/2011 4
12/31/2011 4
12/31/2011 4
12/31/2012 4
12/31/2012 4
12/31/2012 4
12/31/2012 4
12/31/2012 4
12/31/2012 4
12/31/2013 4
12/31/2013 4
12/31/2013 4
12/31/2013 4
12/31/2013 4
12/31/2013 4
12/31/2013 4
12/31/2013 4
12/31/2013 4
12/31/2012 5
12/31/2009 5
12/31/2010 5
12/31/2011 5
12/31/2011 5
12/31/2012 5
12/31/2012 5
12/31/2013 5
Output:
DATE ID group groupdate
12/31/2009 1 1 31-Dec-09
12/31/2010 1 1 31-Dec-09
12/31/2011 1 2 31-Dec-11
12/31/2012 1 3 31-Dec-12
12/31/2013 1 3 31-Dec-12
12/31/2011 2 1 31-Dec-11
12/31/2011 2 1 31-Dec-11
12/31/2012 2 2 31-Dec-12
12/31/2012 2 2 31-Dec-12
12/31/2013 2 2 31-Dec-12
12/31/2013 2 2 31-Dec-12
12/31/2008 3 1 31-Dec-08
12/31/2009 3 1 31-Dec-08
12/31/2009 3 1 31-Dec-08
12/31/2009 3 1 31-Dec-08
12/31/2010 3 2 31-Dec-10
12/31/2010 3 2 31-Dec-10
12/31/2010 3 2 31-Dec-10
12/31/2011 3 2 31-Dec-10
12/31/2011 3 2 31-Dec-10
12/31/2012 3 3 31-Dec-12
12/31/2008 4 1 31-Dec-08
12/31/2008 4 1 31-Dec-08
12/31/2009 4 1 31-Dec-08
12/31/2009 4 1 31-Dec-08
12/31/2010 4 2 31-Dec-10
12/31/2010 4 2 31-Dec-10
12/31/2011 4 2 31-Dec-10
12/31/2011 4 2 31-Dec-10
12/31/2011 4 2 31-Dec-10
12/31/2011 4 2 31-Dec-10
12/31/2011 4 2 31-Dec-10
12/31/2012 4 3 31-Dec-12
12/31/2012 4 3 31-Dec-12
12/31/2012 4 3 31-Dec-12
12/31/2012 4 3 31-Dec-12
12/31/2012 4 3 31-Dec-12
12/31/2012 4 3 31-Dec-12
12/31/2013 4 3 31-Dec-12
12/31/2013 4 3 31-Dec-12
12/31/2013 4 3 31-Dec-12
12/31/2013 4 3 31-Dec-12
12/31/2013 4 3 31-Dec-12
12/31/2013 4 3 31-Dec-12
12/31/2013 4 3 31-Dec-12
12/31/2013 4 3 31-Dec-12
12/31/2013 4 3 31-Dec-12
12/31/2012 5 1 31-Dec-12
12/31/2009 5 1 31-Dec-09
12/31/2010 5 1 31-Dec-09
12/31/2011 5 2 31-Dec-11
12/31/2011 5 2 31-Dec-11
12/31/2012 5 3 31-Dec-12
12/31/2012 5 3 31-Dec-12
12/31/2013 5 3 31-Dec-12