How can I get the code below to handle 2.5 million rows without blowing up?

Time: 2016-04-23 04:46:08

Tags: r data.table

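I tried running this code on 2.5 million rows and my computer blew up... The code works, but it runs slowly on large datasets. The code, input, and output are below. I originally got the solution from my first post; the code below adds the part of the logic that was missing there. Can anyone suggest improvements?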

Thanks for your help!

df <- read.table(header=TRUE,text='DATE    ID
12/31/2009  1
12/31/2010  1
12/31/2011  1
12/31/2012  1
12/31/2013  1
12/31/2011  2
12/31/2011  2
12/31/2012  2
12/31/2012  2
12/31/2013  2
12/31/2013  2
12/31/2008  3
12/31/2009  3
12/31/2009  3
12/31/2009  3
12/31/2010  3
12/31/2010  3
12/31/2010  3
12/31/2011  3
12/31/2011  3
12/31/2012  3
12/31/2008  4
12/31/2008  4
12/31/2009  4
12/31/2009  4
12/31/2010  4
12/31/2010  4
12/31/2011  4
12/31/2011  4
12/31/2011  4
12/31/2011  4
12/31/2011  4
12/31/2012  4
12/31/2012  4
12/31/2012  4
12/31/2012  4
12/31/2012  4
12/31/2012  4
12/31/2013  4
12/31/2013  4
12/31/2013  4
12/31/2013  4
12/31/2013  4
12/31/2013  4
12/31/2013  4
12/31/2013  4
12/31/2013  4
12/31/2012  5
12/31/2009  5
12/31/2010  5
12/31/2011  5
12/31/2011  5
12/31/2012  5
12/31/2012  5
12/31/2013  5')
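df$DATE <- as.Date(df$DATE, "%m/%d/%Y")

# Pre-create the result columns so the per-ID assignments below fill them by index
df$GROUP <- NA_integer_
df$GROUPDATE <- NA_character_

# Row indices for each ID (trimws() is base R; the original trim() is not)
split.rows <- split(seq_len(nrow(df)), trimws(df$ID), drop = TRUE)

invisible(lapply(split.rows, function(x) {
  split_df <- df[x, ]

  group <- vector('integer', length(x))
  group_date <- vector('character', length(x))

  # The first row of each ID starts group 1
  group[1] <- 1L
  group_date[1] <- as.character(split_df[1, 'DATE'])

  # seq_len(n)[-1] is empty when an ID has a single row, unlike 2:n
  for (i in seq_len(nrow(split_df))[-1]) {
    # Start a new group when the gap to the previous row, or to the
    # start date of the current group, exceeds 365 days
    if (split_df[i, 'DATE'] - split_df[i - 1, 'DATE'] > 365 ||
        split_df[i, 'DATE'] - as.Date(group_date[i - 1]) > 365) {
      group[i] <- group[i - 1] + 1L
      group_date[i] <- as.character(split_df[i, 'DATE'])
    } else {
      group[i] <- group[i - 1]
      group_date[i] <- group_date[i - 1]
    }
  }

  # Write the results for this ID back into the global data frame
  df$GROUP[x] <<- group
  df$GROUPDATE[x] <<- group_date

  return(NULL)
}))

df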

=================
Input:
DATE    ID
12/31/2009  1
12/31/2010  1
12/31/2011  1
12/31/2012  1
12/31/2013  1
12/31/2011  2
12/31/2011  2
12/31/2012  2
12/31/2012  2
12/31/2013  2
12/31/2013  2
12/31/2008  3
12/31/2009  3
12/31/2009  3
12/31/2009  3
12/31/2010  3
12/31/2010  3
12/31/2010  3
12/31/2011  3
12/31/2011  3
12/31/2012  3
12/31/2008  4
12/31/2008  4
12/31/2009  4
12/31/2009  4
12/31/2010  4
12/31/2010  4
12/31/2011  4
12/31/2011  4
12/31/2011  4
12/31/2011  4
12/31/2011  4
12/31/2012  4
12/31/2012  4
12/31/2012  4
12/31/2012  4
12/31/2012  4
12/31/2012  4
12/31/2013  4
12/31/2013  4
12/31/2013  4
12/31/2013  4
12/31/2013  4
12/31/2013  4
12/31/2013  4
12/31/2013  4
12/31/2013  4
12/31/2012  5
12/31/2009  5
12/31/2010  5
12/31/2011  5
12/31/2011  5
12/31/2012  5
12/31/2012  5
12/31/2013  5


Output:
DATE    ID  group   groupdate
12/31/2009  1   1   31-Dec-09
12/31/2010  1   1   31-Dec-09
12/31/2011  1   2   31-Dec-11
12/31/2012  1   3   31-Dec-12
12/31/2013  1   3   31-Dec-12
12/31/2011  2   1   31-Dec-11
12/31/2011  2   1   31-Dec-11
12/31/2012  2   2   31-Dec-12
12/31/2012  2   2   31-Dec-12
12/31/2013  2   2   31-Dec-12
12/31/2013  2   2   31-Dec-12
12/31/2008  3   1   31-Dec-08
12/31/2009  3   1   31-Dec-08
12/31/2009  3   1   31-Dec-08
12/31/2009  3   1   31-Dec-08
12/31/2010  3   2   31-Dec-10
12/31/2010  3   2   31-Dec-10
12/31/2010  3   2   31-Dec-10
12/31/2011  3   2   31-Dec-10
12/31/2011  3   2   31-Dec-10
12/31/2012  3   3   31-Dec-12
12/31/2008  4   1   31-Dec-08
12/31/2008  4   1   31-Dec-08
12/31/2009  4   1   31-Dec-08
12/31/2009  4   1   31-Dec-08
12/31/2010  4   2   31-Dec-10
12/31/2010  4   2   31-Dec-10
12/31/2011  4   2   31-Dec-10
12/31/2011  4   2   31-Dec-10
12/31/2011  4   2   31-Dec-10
12/31/2011  4   2   31-Dec-10
12/31/2011  4   2   31-Dec-10
12/31/2012  4   3   31-Dec-12
12/31/2012  4   3   31-Dec-12
12/31/2012  4   3   31-Dec-12
12/31/2012  4   3   31-Dec-12
12/31/2012  4   3   31-Dec-12
12/31/2012  4   3   31-Dec-12
12/31/2013  4   3   31-Dec-12
12/31/2013  4   3   31-Dec-12
12/31/2013  4   3   31-Dec-12
12/31/2013  4   3   31-Dec-12
12/31/2013  4   3   31-Dec-12
12/31/2013  4   3   31-Dec-12
12/31/2013  4   3   31-Dec-12
12/31/2013  4   3   31-Dec-12
12/31/2013  4   3   31-Dec-12
12/31/2012  5   1   31-Dec-12
12/31/2009  5   1   31-Dec-09
12/31/2010  5   1   31-Dec-09
12/31/2011  5   2   31-Dec-11
12/31/2011  5   2   31-Dec-11
12/31/2012  5   3   31-Dec-12
12/31/2012  5   3   31-Dec-12
12/31/2013  5   3   31-Dec-12
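
One direction that might help, given only as a sketch and not timed on the full 2.5 million rows: the loop above indexes a data frame row by row, parses the group date from a character string on every iteration, and rewrites `df` with `<<-` once per ID. The same two rules (a new group when the gap to the previous row or to the current group's start date exceeds 365 days) can instead be applied in a single pass over plain vectors. The function name `assign_groups`, its `max_gap` argument, and the small example data frame are hypothetical, and the sketch assumes that all rows of an ID are contiguous and already in the order in which they should be evaluated.

```r
# Hypothetical helper: one pass over plain vectors instead of data frame rows.
# Assumes rows belonging to the same ID are adjacent and already ordered.
assign_groups <- function(id, date, max_gap = 365) {
  d <- as.numeric(date)        # dates as days since 1970-01-01
  n <- length(d)
  group <- integer(n)
  anchor <- numeric(n)         # start date (in days) of each row's group
  for (i in seq_len(n)) {
    if (i == 1L || id[i] != id[i - 1L]) {
      # First row of an ID starts group 1
      group[i] <- 1L
      anchor[i] <- d[i]
    } else if (d[i] - d[i - 1L] > max_gap || d[i] - anchor[i - 1L] > max_gap) {
      # Gap to the previous row or to the group start is too large
      group[i] <- group[i - 1L] + 1L
      anchor[i] <- d[i]
    } else {
      group[i] <- group[i - 1L]
      anchor[i] <- anchor[i - 1L]
    }
  }
  data.frame(GROUP = group,
             GROUPDATE = as.Date(anchor, origin = "1970-01-01"))
}

# Small example using the first rows of IDs 1 and 2 from the input
ex <- data.frame(
  ID = c(1, 1, 1, 1, 1, 2, 2, 2),
  DATE = as.Date(c("2009-12-31", "2010-12-31", "2011-12-31", "2012-12-31",
                   "2013-12-31", "2011-12-31", "2011-12-31", "2012-12-31"))
)
res <- assign_groups(ex$ID, ex$DATE)
ex$GROUP <- res$GROUP
ex$GROUPDATE <- res$GROUPDATE
```

For these eight rows the groups come out as 1, 1, 2, 3, 3 for ID 1 and 1, 1, 2 for ID 2, matching the expected output above. The loop is still sequential, because each row depends on the group start date carried over from the previous row, but it touches only atomic vectors and builds the result columns once.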

0 answers:

No answers