R: combining date columns

Time: 2013-09-06 16:14:22

Tags: r date

I have a data.frame like the one shown below.

toolid          startdate       enddate         stage
abc                 1-Jan-13    5-Jan-13    production
abc                 6-Jan-13    10-Jan-13   down
xyz                 3-Jan-13    8-Jan-13    production
xyz                 9-Jan-13    15-Jan-13   down

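I want to convert the data.frame to the format below: the 'startdate' and 'enddate' columns of the data.frame above should be combined into a single column named 'date', with one row for every day in each range. My real data has several thousand rows covering many toolids and many stages. I have already found a way to do this in SQL, but I would prefer an R solution. I have started by melting the data, as shown in the code below.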

toolid  date            stage
abc     1-Jan-13    production
abc     2-Jan-13    production
abc     3-Jan-13    production
abc     4-Jan-13    production
abc     5-Jan-13    production
abc     6-Jan-13    down
abc     7-Jan-13    down
abc     8-Jan-13    down
abc     9-Jan-13    down
abc     10-Jan-13   down
xyz     3-Jan-13    production
xyz     4-Jan-13    production
xyz     5-Jan-13    production
xyz     6-Jan-13    production
xyz     7-Jan-13    production
xyz     8-Jan-13    production
xyz     9-Jan-13    down
xyz     10-Jan-13   down
xyz     11-Jan-13   down
xyz     12-Jan-13   down
xyz     13-Jan-13   down
xyz     14-Jan-13   down
xyz     15-Jan-13   down

R code

startdate=c('1-Jan-13','6-Jan-13','3-Jan-13','9-Jan-13')
enddate=c('5-Jan-13',    '10-Jan-13',   '8-Jan-13', '15-Jan-13')
toolid=c('abc',     'abc',  'xyz',  'xyz')
stage=c('production',    'down',    'production',   'down')
data=data.frame(toolid,startdate,enddate,stage)
require(reshape2)
newdata=melt(data,id.vars=c('toolid','stage'))

Update: the code from @Ananda Mahto's answer, plus a few extra lines to produce pivot-table-style output

## Convert "startdate" and "enddate" to date objects
data$startdate <- as.Date(data$startdate, format="%d-%b-%y")
data$enddate <- as.Date(data$enddate, format="%d-%b-%y")


## Use `seq` to create the date sequence, and manually recreate
##   your dataframe. Use `do.call(rbind, ...)` to put it back together
ddd=do.call(rbind, lapply(sequence(nrow(data)), function(x) {
  data.frame(toolid = data$toolid[x], 
             date = seq(data$startdate[x], data$enddate[x], by = 1),
             stage = data$stage[x])
}))

ddd


   toolid       date      stage
1     abc 2013-01-01 production
2     abc 2013-01-02 production
3     abc 2013-01-03 production
4     abc 2013-01-04 production
5     abc 2013-01-05 production
6     abc 2013-01-06       down
7     abc 2013-01-07       down
8     abc 2013-01-08       down
9     abc 2013-01-09       down
10    abc 2013-01-10       down
11    xyz 2013-01-03 production
12    xyz 2013-01-04 production
13    xyz 2013-01-05 production
14    xyz 2013-01-06 production
15    xyz 2013-01-07 production
16    xyz 2013-01-08 production
17    xyz 2013-01-09       down
18    xyz 2013-01-10       down
19    xyz 2013-01-11       down
20    xyz 2013-01-12       down
21    xyz 2013-01-13       down
22    xyz 2013-01-14       down
23    xyz 2013-01-15       down

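## Count rows per date and stage; naming value.var and fun.aggregate
##   explicitly avoids dcast's "defaulting to length" message
ddd1=dcast(ddd,date~stage,value.var="toolid",fun.aggregate=length)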


ddd1
         date down production
1  2013-01-01    0          1
2  2013-01-02    0          1
3  2013-01-03    0          2
4  2013-01-04    0          2
5  2013-01-05    0          2
6  2013-01-06    1          1
7  2013-01-07    1          1
8  2013-01-08    1          1
9  2013-01-09    2          0
10 2013-01-10    2          0
11 2013-01-11    1          0
12 2013-01-12    1          0
13 2013-01-13    1          0
14 2013-01-14    1          0
15 2013-01-15    1          0

1 answer:

Answer 0 (score: 4)

I'm sure there are more "proper" ways to do this, but this is what quickly came to mind.

First, convert "startdate" and "enddate" to date objects:

data$startdate <- as.Date(data$startdate, format="%d-%b-%y")
data$enddate <- as.Date(data$enddate, format="%d-%b-%y")
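
One thing worth checking at this step: as.Date() returns NA instead of raising an error when a string does not match the format, and %b matches month abbreviations in the current locale. The sketch below uses made-up sample values (x is not part of the original data) and assumes an English or C locale.

```r
# Hypothetical sample values in the same format as the question's data
x <- c("1-Jan-13", "15-Jan-13")

# %d = day of month, %b = abbreviated month name (locale-dependent),
# %y = two-digit year
d <- as.Date(x, format = "%d-%b-%y")

# A failed parse yields NA rather than an error, so check explicitly
stopifnot(!anyNA(d))

# Print the parsed dates in ISO form
print(format(d, "%Y-%m-%d"))
```

If the check fails on a non-English system, the month abbreviations are the likely cause.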

Then, use seq to create the date sequence for each row and manually rebuild the data.frame. Use `do.call(rbind, ...)` to put the pieces back together.

ddd <- do.call(rbind, lapply(sequence(nrow(data)), function(x) {
  data.frame(toolid = data$toolid[x], 
             date = seq(data$startdate[x], data$enddate[x], by = 1),
             stage = data$stage[x])
}))
ddd
#    toolid       date      stage
# 1     abc 2013-01-01 production
# 2     abc 2013-01-02 production
# 3     abc 2013-01-03 production
# 4     abc 2013-01-04 production
# 5     abc 2013-01-05 production
# 6     abc 2013-01-06       down
# 7     abc 2013-01-07       down
# 8     abc 2013-01-08       down
# 9     abc 2013-01-09       down
# 10    abc 2013-01-10       down
# 11    xyz 2013-01-03 production
# 12    xyz 2013-01-04 production
# 13    xyz 2013-01-05 production
# 14    xyz 2013-01-06 production
# 15    xyz 2013-01-07 production
# 16    xyz 2013-01-08 production
# 17    xyz 2013-01-09       down
# 18    xyz 2013-01-10       down
# 19    xyz 2013-01-11       down
# 20    xyz 2013-01-12       down
# 21    xyz 2013-01-13       down
# 22    xyz 2013-01-14       down
# 23    xyz 2013-01-15       down
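
With several thousand rows, building one small data.frame per row and binding them all can get slow. As a possible alternative (a sketch, not part of the original answer), the same expansion can be done with vectorized indexing. The names n_days, idx and expanded are illustrative, and the data is rebuilt here with the dates already converted so the block runs on its own.

```r
# The question's sample data, with dates already parsed
data <- data.frame(
  toolid    = c("abc", "abc", "xyz", "xyz"),
  startdate = as.Date(c("2013-01-01", "2013-01-06", "2013-01-03", "2013-01-09")),
  enddate   = as.Date(c("2013-01-05", "2013-01-10", "2013-01-08", "2013-01-15")),
  stage     = c("production", "down", "production", "down"),
  stringsAsFactors = FALSE
)

# Number of days covered by each row, counting both endpoints
n_days <- as.integer(data$enddate - data$startdate) + 1L

# Repeat each row index once per day it covers
idx <- rep(seq_len(nrow(data)), times = n_days)

# sequence(n_days) counts 1..n within each row, so subtracting 1 gives
# the day offset to add to that row's start date
expanded <- data.frame(
  toolid = data$toolid[idx],
  date   = data$startdate[idx] + (sequence(n_days) - 1L),
  stage  = data$stage[idx],
  stringsAsFactors = FALSE
)

print(nrow(expanded))
```

This assumes every enddate is on or after its startdate; a reversed range would give a negative count and make rep() fail.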

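Finally, looking at the end result you say you want, you can stay in base R the whole way and use table. I have wrapped it in as.data.frame.matrix() because I assume you want a data.frame as the result: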

as.data.frame.matrix(table(ddd[-1]))
#            down production
# 2013-01-01    0          1
# 2013-01-02    0          1
# 2013-01-03    0          2
# 2013-01-04    0          2
# 2013-01-05    0          2
# 2013-01-06    1          1
# 2013-01-07    1          1
# 2013-01-08    1          1
# 2013-01-09    2          0
# 2013-01-10    2          0
# 2013-01-11    1          0
# 2013-01-12    1          0
# 2013-01-13    1          0
# 2013-01-14    1          0
# 2013-01-15    1          0
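
Note that the table() result keeps the dates as row names, whereas the dcast output in the question has a real date column. If that column is needed, one way to get it is sketched below on a small made-up data.frame (the four rows here are illustrative, not the question's data).

```r
# Small hypothetical input in the same shape as ddd
ddd <- data.frame(
  toolid = c("abc", "xyz", "abc", "xyz"),
  date   = as.Date(c("2013-01-05", "2013-01-05", "2013-01-06", "2013-01-06")),
  stage  = c("production", "production", "down", "production"),
  stringsAsFactors = FALSE
)

# Cross-tabulate date by stage, then convert the table to a data.frame
counts <- as.data.frame.matrix(table(ddd[c("date", "stage")]))

# Move the dates from the row names into a proper Date column
counts$date <- as.Date(rownames(counts))
rownames(counts) <- NULL

# Put the date column first, followed by the stage counts
counts <- counts[c("date", "down", "production")]
print(counts)
```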