How to generate a date sequence by group

Time: 2016-01-21 11:58:54

Tags: r data.table zoo

Suppose we have the following data.table:

set.seed(7)
library(data.table)
library(zoo)
dt <- data.table(ID=c('a','a','a','b','b'), Tag=c(1,2,3,1,2), Begin=c('2015-01-01', '2014-05-07', '2014-08-02', '2015-02-03','2013-08-09'), x=rnorm(5), y = rnorm(5), z = rnorm(5))
dt[,Begin:=as.Date(Begin, '%Y-%m-%d')]

which returns:

   ID Tag      Begin          x          y         z
1:  a   1 2015-01-01  2.2872472 -0.9472799 0.3569862
2:  a   2 2014-05-07 -1.1967717  0.7481393 2.7167518
3:  a   3 2014-08-02 -0.6942925 -0.1169552 2.2814519
4:  b   1 2015-02-03 -0.4122930  0.1526576 0.3240205
5:  b   2 2013-08-09 -0.9706733  2.1899781 1.8960671

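The Begin column holds dates, and I want to expand each Begin value into a monthly sequence covering the following 2 months. I ran the following code: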

dt[, Date := seq(from = Begin, to = Begin+months(2), by = '1 months'), by = .(ID, Tag)]

But I get the following warnings:

Warning messages:
1: In `[.data.table`(dt, , `:=`(Date, seq(from = Begin,  :
  RHS 1 is length 3 (greater than the size (1) of group 1). The last 2 element(s) will be discarded.
2: In `[.data.table`(dt, , `:=`(Date, seq(from = Begin,  :
  RHS 1 is length 3 (greater than the size (1) of group 2). The last 2 element(s) will be discarded.
3: In `[.data.table`(dt, , `:=`(Date, seq(from = Begin,  :
  RHS 1 is length 3 (greater than the size (1) of group 3). The last 2 element(s) will be discarded.
4: In `[.data.table`(dt, , `:=`(Date, seq(from = Begin,  :
  RHS 1 is length 3 (greater than the size (1) of group 4). The last 2 element(s) will be discarded.
5: In `[.data.table`(dt, , `:=`(Date, seq(from = Begin,  :
  RHS 1 is length 3 (greater than the size (1) of group 5). The last 2 element(s) will be discarded.

The result I expect is:

    ID Tag       Date          x          y         z
 1:  a   1 2015-01-01  2.2872472 -0.9472799 0.3569862
 2:  a   1 2015-02-01  2.2872472 -0.9472799 0.3569862
 3:  a   1 2015-03-01  2.2872472 -0.9472799 0.3569862
 4:  a   2 2014-05-07 -1.1967717  0.7481393 2.7167518
 5:  a   2 2014-06-07 -1.1967717  0.7481393 2.7167518
 6:  a   2 2014-07-07 -1.1967717  0.7481393 2.7167518
 7:  a   3 2014-08-02 -0.6942925 -0.1169552 2.2814519
 8:  a   3 2014-09-02 -0.6942925 -0.1169552 2.2814519
 9:  a   3 2014-10-02 -0.6942925 -0.1169552 2.2814519
10:  b   1 2015-02-03 -0.4122930  0.1526576 0.3240205
11:  b   1 2015-03-03 -0.4122930  0.1526576 0.3240205
12:  b   1 2015-04-03 -0.4122930  0.1526576 0.3240205
13:  b   2 2013-08-09 -0.9706733  2.1899781 1.8960671
14:  b   2 2013-09-09 -0.9706733  2.1899781 1.8960671
15:  b   2 2013-10-09 -0.9706733  2.1899781 1.8960671

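I guess the problem occurs because I may not have a unique key.

Note that my example data has only x, y and z, but my real data set has more than 10 columns.

Could you give me some suggestions?

1 Answer:

Answer 0: (score: 2)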

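The warnings appear because `:=` adds a column to the existing rows and cannot add new rows, so the 3-element sequence is truncated to the 1 row in each group. Instead, we can group by the sequence of rows, rather than by "ID" and "Tag", and return a new table: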

dt[, list(Date = seq(Begin, length.out=3, by = '1 month'), x,y,z), by = 1:nrow(dt)]

Or, as @David Arenburg mentioned, we can replicate each row "N" times (here 3) and then, grouped by "ID" and "Tag", build the sequence from only the first observation of "Begin":

 dt[rep(1:.N, each = 3)][, Begin := seq(Begin[1L],
      length.out=3, by = '1 month'), by = .(ID, Tag)][]
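
Since the real data set has more than 10 columns, listing x, y, z by hand may be impractical. Below is a minimal sketch of the row-replication approach that carries every column along and stores the result in a Date column, as in the expected output. It assumes that each (ID, Tag) pair identifies exactly one row; the names `n_months` and `out` and the small sample table are only illustrative and are not part of the original post.

```r
library(data.table)

# Small illustrative table (assumption: one row per ID/Tag pair)
dt <- data.table(
  ID    = c("a", "a", "b"),
  Tag   = c(1, 2, 1),
  Begin = as.Date(c("2015-01-01", "2014-05-07", "2015-02-03")),
  x     = c(0.1, 0.2, 0.3)
)

n_months <- 3L  # start month plus the following 2 months

# Repeat every row n_months times; all columns are copied along
out <- dt[rep(seq_len(.N), each = n_months)]

# Within each ID/Tag group, build a monthly sequence from the first Begin
out[, Date := seq(Begin[1L], by = "1 month", length.out = .N), by = .(ID, Tag)]

# Drop the original start date so only the expanded Date column remains
out[, Begin := NULL]

print(out)
```

The new Date column is appended at the end; `setcolorder()` can move it next to Tag if the column order matters.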