R - Eliminating duplicate values with a while loop

Posted: 2016-11-30 13:34:02

Tags: r duplicates time-series

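I have a time-series dataset whose original observations are monthly. I converted the dates to daily resolution and placed each value on the first day of its month. Now I want to add one day to each repeated date until no dates in the dataset overlap. This step is essential for my later analysis and plotting.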

Here is code that generates a dataset similar to mine:

sample <- rbind("2007-01-01","2007-02-01","2007-03-01","2007-05-01",
           "2007-06-01","2007-07-01","2007-09-01","2007-10-01",
           "2007-11-01","2007-12-01","2008-01-01","2008-02-01",
           "2008-03-01","2008-05-01","2008-06-01","2008-07-01",
           "2008-09-01","2008-10-01","2008-11-01","2008-12-01",
           "2009-02-01","2009-04-01","2009-05-01","2009-06-01",
           "2009-07-01","2009-09-01","2009-10-01","2009-11-01",
           "2009-12-01","2010-01-01","2010-02-01","2010-03-01",
           "2010-04-01","2010-05-01","2010-05-01","2010-05-01",
           "2010-05-01","2010-05-01","2010-06-01","2010-06-01",
           "2010-06-01","2010-06-01","2010-07-01","2010-07-01",
           "2010-07-01","2010-07-01","2010-07-01","2010-08-01",
           "2010-08-01","2010-08-01","2010-08-01","2010-09-01",
           "2010-09-01","2010-09-01","2010-09-01","2010-09-01",
           "2010-10-01","2010-10-01","2010-10-01","2010-10-01",
           "2010-10-01","2010-11-01","2010-11-01","2010-11-01",
           "2010-11-01","2010-11-01","2010-12-01","2010-12-01",
           "2010-12-01","2010-12-01","2010-12-01","2011-01-01",
           "2011-01-01","2011-01-01","2011-01-01","2011-02-01",
           "2011-02-01","2011-02-01","2011-02-01","2011-03-01",
           "2011-03-01","2011-03-01","2011-03-01","2011-04-01",
           "2011-04-01","2011-04-01","2011-04-01","2011-04-01",
           "2011-05-01","2011-05-01","2011-05-01","2011-05-01",
           "2011-05-01","2011-06-01","2011-06-01","2011-06-01",
           "2011-06-01","2011-06-01","2011-07-01","2011-07-01",
           "2011-07-01","2011-07-01","2011-08-01","2011-08-01",
           "2011-08-01","2011-09-01","2011-09-01","2011-09-01",
           "2011-09-01","2011-10-01","2011-10-01","2011-10-01",
           "2011-10-01","2011-10-01","2011-11-01","2011-11-01",
           "2011-11-01","2011-11-01","2011-11-01","2011-12-01",
           "2011-12-01","2011-12-01","2011-12-01","2011-12-01",
           "2012-01-01","2012-01-01","2012-01-01","2012-01-01",
           "2012-01-01","2012-02-01","2012-02-01","2012-02-01",
           "2012-02-01","2012-02-01","2012-03-01","2012-03-01",
           "2012-03-01","2012-03-01","2012-03-01","2012-04-01",
           "2012-04-01","2012-04-01","2012-04-01","2012-05-01",
           "2012-05-01","2012-05-01","2012-05-01","2012-05-01",
           "2012-06-01","2012-06-01","2012-06-01","2012-06-01",
           "2012-06-01","2012-07-01","2012-07-01","2012-07-01",
           "2012-07-01","2012-07-01","2012-08-01","2012-08-01",
           "2012-08-01","2012-09-01","2012-09-01","2012-09-01",
           "2012-09-01","2012-09-01","2012-10-01","2012-10-01",
           "2012-10-01","2012-10-01","2012-10-01","2012-11-01",
           "2012-11-01","2012-11-01","2012-11-01","2012-11-01",
           "2012-12-01","2012-12-01","2012-12-01","2013-01-01",
           "2013-01-01","2013-01-01","2013-01-01","2013-01-01",
           "2013-02-01","2013-02-01","2013-02-01","2013-02-01",
           "2013-02-01","2013-03-01","2013-03-01","2013-03-01",
           "2013-03-01","2013-03-01","2013-04-01","2013-04-01",
           "2013-04-01","2013-04-01","2013-04-01","2013-05-01",
           "2013-05-01","2013-05-01","2013-05-01","2013-05-01",
           "2013-06-01","2013-06-01","2013-06-01","2013-06-01",
           "2013-07-01","2013-07-01","2013-07-01","2013-07-01",
           "2013-08-01","2013-08-01","2013-08-01","2013-09-01",
           "2013-09-01","2013-09-01","2013-09-01","2013-09-01",
           "2013-10-01","2013-10-01","2013-10-01","2013-10-01",
           "2013-10-01","2013-11-01","2013-11-01","2013-11-01",
           "2013-11-01","2013-11-01","2013-12-01","2013-12-01",
           "2013-12-01","2013-12-01","2013-12-01","2014-01-01",
           "2014-01-01","2014-01-01","2014-01-01","2014-01-01",
           "2014-02-01","2014-02-01","2014-02-01","2014-02-01",
           "2014-02-01","2014-03-01","2014-03-01","2014-03-01",
           "2014-03-01","2014-03-01","2014-05-01","2014-05-01",
           "2014-05-01","2014-05-01","2014-05-01","2014-06-01",
           "2014-06-01","2014-06-01","2014-07-01","2014-07-01",
           "2014-07-01","2014-07-01","2014-08-01","2014-08-01",
           "2014-09-01","2014-09-01","2014-09-01","2014-09-01",
           "2014-09-01","2014-10-01","2014-10-01","2014-10-01",
           "2014-10-01","2014-11-01","2014-11-01","2014-11-01",
           "2014-11-01","2014-12-01","2014-12-01","2014-12-01",
           "2015-01-01","2015-01-01","2015-01-01","2015-01-01",
           "2015-02-01","2015-02-01","2015-02-01","2015-02-01",
           "2015-03-01","2015-03-01","2015-03-01","2015-03-01",
           "2015-04-01","2015-04-01","2015-04-01","2015-04-01",
           "2015-05-01","2015-05-01","2015-06-01","2015-06-01",
           "2015-06-01","2015-07-01","2015-07-01","2015-08-01",
           "2015-08-01","2015-09-01","2015-09-01","2015-09-01",
           "2015-10-01","2015-10-01","2015-11-01","2015-11-01",
           "2015-12-01","2016-01-01","2016-01-01","2016-01-01",
           "2016-01-01","2016-02-01","2016-02-01","2016-02-01",
           "2016-02-01","2016-03-01","2016-04-01","2016-04-01",
           "2016-04-01","2016-04-01","2016-05-01","2016-05-01",
           "2016-06-01","2016-06-01","2016-06-01","2016-06-01",
           "2016-07-01","2016-07-01","2016-07-01","2016-07-01",
           "2016-08-01","2016-08-01","2016-08-01","2016-08-01",
           "2016-08-01","2016-08-01","2016-08-01","2016-08-01",
           "2016-08-01","2016-08-01","2016-09-01","2016-09-01",
           "2016-09-01","2016-09-01","2016-10-01","2016-10-01",
           "2016-10-01","2016-11-01","2016-11-01")
sample <- as.data.frame(sample)
sample$Value <- (1:355)
colnames(sample)[1] <- c("Date")
View(sample)

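After reading up on this, I concluded that I need a while loop that runs through the Date column and, whenever a value is a duplicate, adds one day to it. Using the lubridate package I tried something like this: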

library(lubridate)    
while(sample$Date==sample$Date[-1]) {sample$Date <- sample$Date+days(1); print(sample$Date);}

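However, the loop does not run and produces a lot of warnings. Do you know how to fix this? I think it is a very simple problem; I'm just not familiar with loops.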
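A likely reading of the warnings, offered as a hedged diagnosis: `sample$Date == sample$Date[-1]` compares a 355-element vector with a 354-element one, which triggers a recycling warning, and `while()` needs a single TRUE/FALSE. Older R versions warn and use only the first element (here FALSE, so the body never runs), while R 4.2.0 and later raise an error instead. The body would also shift every date, not just the repeats. Below is a minimal base-R sketch of the while-loop idea, assuming the dates are already of class `Date` (so `+ 1` adds one day without lubridate); the helper name `shift_duplicates` and the toy data are hypothetical.

```r
# Hypothetical helper: repeatedly move every repeated date one day later
# until all dates are unique. Assumes `dates` is of class "Date".
shift_duplicates <- function(dates) {
  # anyDuplicated() returns 0 when all values are unique, so the
  # condition is a single value, as while() requires
  while (anyDuplicated(dates) > 0) {
    dup <- duplicated(dates)       # TRUE for the 2nd, 3rd, ... occurrence
    dates[dup] <- dates[dup] + 1   # shift only those entries by one day
  }
  dates
}

# Hypothetical toy data, not the full dataset from the question
df <- data.frame(
  Date  = as.Date(c("2010-04-01", "2010-05-01", "2010-05-01",
                    "2010-05-01", "2010-06-01")),
  Value = 1:5
)
df$Date <- shift_duplicates(df$Date)
print(df)
```

Because every pass re-checks the whole vector, a shifted date that lands on another existing date is also moved on; the cost is one pass per extra repeat in the largest group.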

Thanks!

1 Answer:

Answer 0 (score: 2)

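We can do this with data.table. First, some setup, including converting the dates from factor class to Date (in R 4.0.0 and later, `as.data.frame()` leaves the column as character by default; `as.Date()` handles either):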

library( data.table )
setDT( sample )
sample[ , Date := as.Date( Date ) ]

Then we perform your transformation:

sample[ , Date := Date + ( seq_len( .N ) - 1L ), by = Date ]

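This splits the data into groups of matching Date values (`by = Date`) and adds a sequence vector to each group. `.N` is the number of rows in the current group, so `seq_len(.N) - 1L` is 0, 1, ..., .N - 1. For example, a group of 4 identical dates gets c(0, 1, 2, 3) days added: the first value stays unchanged and each subsequent value is one day later than the previous, as you described.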
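For readers without data.table, the same per-group offset can be sketched in base R. This swaps in `ave()` to number the rows within each date group; it is not the answer's code, and the toy dates are hypothetical. Like the data.table version, it assumes no group is long enough to run into the next observed date, which holds for the sample data since each group is well under a month long.

```r
# Base-R sketch of the grouped offset, using ave() instead of data.table.
dates <- as.Date(c("2010-05-01", "2010-05-01", "2010-05-01",
                   "2010-05-01", "2010-06-01", "2010-06-01"))

# Position of each row within its group of identical dates: 1, 2, 3, 4, 1, 2
pos <- ave(seq_along(dates), dates, FUN = seq_along)

# Subtract 1 so the first row of each group keeps its date
# (offsets 0, 1, 2, 3 for the May group and 0, 1 for the June group)
shifted <- dates + (pos - 1L)
print(shifted)
```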