employee <- c("John", "Adi", "Sam")
salary <- c(21000, 22000, 23000)
startdate <- as.Date(c("2014-11-01","2014-01-01","2014-10-01"))
enddate <- as.Date(c("2015-10-31","2014-12-31","2015-10-31"))
N<- c(2,1,2)
df<- data.frame(employee,salary, startdate, enddate, N)
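I want to repeat each entire row the number of times given in column N. In the original row, enddate should be replaced by a fixed date, for example "31/12/2014", and in the repeated row that same fixed date should become startdate. Run the code below to see an example of the result (the expected output) in df2:
employee <- c(rep("John",2), "Adi", rep("Sam",2))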
salary <- c(21000,21000, 22000, 23000,23000)
startdate <- as.Date(c("2014-11-01","2014-12-31", "2014-01-01","2014-10-01","2014-12-31"))
enddate <- as.Date(c("2014-12-31","2015-10-31","2014-12-31","2014-12-31","2015-10-31"))
N<- c(2,2,1,2,2)
df2<- data.frame(employee,salary, startdate, enddate, N)
Answer 0 (score: 0)
We can do this using data.table. We convert the 'data.frame' to a 'data.table' (setDT(df)) and expand the rows by replicating each row index according to the 'N' variable. Grouped by 'employee', we get the numeric index ('i1') of the first observation (.I[1L]) and use it to assign (:=) '2014-12-31' to 'enddate'. Similarly, we get the row index ('i2') of the second through last elements (.I[seq_len(.N)>1L]) for each 'employee' and set 'startdate' to '2014-12-31'.
DT <- setDT(df)[rep(seq_len(.N), N)]
i1 <- DT[, .I[1L] , by = employee]$V1
DT[i1, enddate:= as.Date('2014-12-31')]
i2 <- DT[, .I[seq_len(.N)>1L], employee]$V1
DT[i2, startdate:= as.Date('2014-12-31')]
identical(as.data.table(df2), DT)
#[1] TRUE
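To make the role of .I in the steps above easier to see, here is a minimal sketch on a toy table. The names toy, g, v, first_rows and later_rows are hypothetical and not part of the original answer; the sketch assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical toy table: group "a" has two rows, group "b" has one
toy <- data.table(g = c("a", "a", "b"), v = 1:3)

# .I holds the row numbers of the current group within the whole table,
# so .I[1L] is the table-wide row number of each group's first row
first_rows <- toy[, .I[1L], by = g]$V1

# Row numbers of every row after the first in each group;
# a single-row group contributes nothing
later_rows <- toy[, .I[seq_len(.N) > 1L], by = g]$V1

print(first_rows)
print(later_rows)
```

These index vectors play the same role as i1 and i2 in the answer: they are passed in the i position of DT[i, col := value] so that only those rows are updated.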
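Or we can do this with an if condition: grouped by 'employee', we concatenate '2014-12-31' onto 'startdate' and 'enddate' and assign the output back to the columns 'startdate' and 'enddate'. The if(.N>1L) test restricts the change to employees that have more than one row.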
DT[, c('startdate', 'enddate') := if(.N>1L)
       list(c(startdate[1L], as.Date('2014-12-31')),
            c(as.Date('2014-12-31'), enddate[.N])), by = employee]
identical(DT, as.data.table(df2))
#[1] TRUE
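For comparison, the same steps can be sketched in base R without data.table. This is not part of the original answer: the names cutoff, out and pos are hypothetical, and the sketch assumes, like the example data, that N is at most 2 and that the rows stay in their original order.

```r
# Rebuild the question's input so the sketch is self-contained
employee  <- c("John", "Adi", "Sam")
salary    <- c(21000, 22000, 23000)
startdate <- as.Date(c("2014-11-01", "2014-01-01", "2014-10-01"))
enddate   <- as.Date(c("2015-10-31", "2014-12-31", "2015-10-31"))
N         <- c(2, 1, 2)
df <- data.frame(employee, salary, startdate, enddate, N)

cutoff <- as.Date("2014-12-31")  # the fixed date from the question

# Repeat each row N times, then reset the row names
out <- df[rep(seq_len(nrow(df)), df$N), ]
rownames(out) <- NULL

# Position of each row inside its employee's block: 1, 2, 1, 1, 2
pos <- sequence(df$N)

# First row of each block ends at the cutoff (as in the answer, this also
# touches single-row employees); later rows start at the cutoff
out$enddate[pos == 1]  <- cutoff
out$startdate[pos > 1] <- cutoff

print(out)
```

Because pos is derived from N rather than from grouping on employee, this version relies on each employee occupying one input row.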