How to merge two rows in R

Asked: 2017-05-10 13:22:54

Tags: r csv event-log

Below is my table (an event log) in R:

Case.ID | Activity | Timestamp |    Resource   |   State
------------------------------------------------------------
   0    |Take order| 00:12:04  |     Waiter    |   Assign
------------------------------------------------------------
   0    |Take order| 00:18:02  |               |  Complete
------------------------------------------------------------
   1    |Bring food| 00:47:23  |  Cook helper  |   Assign
------------------------------------------------------------
   1    |Bring food| 00:52:41  |               |  Complete
------------------------------------------------------------
   1    |Bring food| 00:54:52  | Cook helper   |   Assign
------------------------------------------------------------
   1    |Bring food| 00:59:11  |               |  Complete

A row with a value in the Resource column marks the start of an activity; the row that ends the activity has no value in that cell.

I tried:

assign <- subset(csv, select=c(Case.ID,Activity,Timestamp,State), State=="assign")
complete <- subset(csv, select=c(Case.ID,Activity,Timestamp,State), State=="complete")
merge <- merge(assign, complete, by=c("Case.ID", "Activity"))

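But it produces some wrong rows, as shown below, because every start of an activity is paired with every end of the same Case.ID and Activity.

The second and third rows should be removed, but I am not sure whether that can be done.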

Case.ID | Activity | Start.Timestamp | End.Timestamp |
------------------------------------------------------
   1    |Bring food|    00:47:23     |    00:52:41   |
------------------------------------------------------
   1    |Bring food|    00:47:23     |    00:59:11   | 
------------------------------------------------------
   1    |Bring food|    00:54:52     |    00:52:41   |
------------------------------------------------------
   1    |Bring food|    00:54:52     |    00:59:11   |

1 Answer:

Answer 0 (score: 0)

Create an additional grouping variable that numbers each assign/complete pair, then reshape to wide format, as follows:

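# Number each activity instance: the counter goes up at every row with a non-empty Resource
df$grp <- cumsum(df$Resource != '')

library(reshape2)
# One row per Case.ID/Activity/grp and one Timestamp column per State; [, -3] drops grp again
df2 <- dcast(df, Case.ID + Activity + grp ~ State, value.var = 'Timestamp')[, -3]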

which gives:

> df2
  Case.ID   Activity   assign complete
1       0 Take order 00:12:04 00:18:02
2       1 Bring food 00:47:23 00:52:41
3       1 Bring food 00:54:52 00:59:11
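
To see why the grouping variable separates the two "Bring food" instances, here is a minimal sketch of what cumsum does with the Resource column. The vector is typed in by hand from the question's table, and it assumes that the end rows hold an empty string rather than NA or whitespace.

```r
# Resource column copied from the question's table (assumption: end rows are "")
resource <- c("Waiter", "", "Cook helper", "", "Cook helper", "")

# TRUE on every row that starts an activity
is_start <- resource != ""

# The running count of start rows gives each start/end pair its own number
grp <- cumsum(is_start)
print(grp)
# [1] 1 1 2 2 3 3
```

If the real column contains NA or blanks instead of empty strings, the comparison would need to be adapted, for example with !is.na() or trimws().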

An alternative with data.table:

library(data.table)
df2 <- dcast(setDT(df)[, grp := cumsum(Resource!='')], 
             Case.ID + Activity + grp ~ State, value.var = 'Timestamp')[, grp := NULL][]
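
The same pairing idea can also be sketched in base R without extra packages, by adding grp to the merge keys of the attempt from the question. This is only an illustration under stated assumptions: the data frame is rebuilt by hand from the question's table, State is assumed to be lower case (as in the question's code and the output above), and the names starts, ends and paired are made up for this example.

```r
# Event log rebuilt from the question (assumptions: lower-case State, "" for missing Resource)
df <- data.frame(
  Case.ID   = c(0, 0, 1, 1, 1, 1),
  Activity  = c("Take order", "Take order", rep("Bring food", 4)),
  Timestamp = c("00:12:04", "00:18:02", "00:47:23",
                "00:52:41", "00:54:52", "00:59:11"),
  Resource  = c("Waiter", "", "Cook helper", "", "Cook helper", ""),
  State     = rep(c("assign", "complete"), 3),
  stringsAsFactors = FALSE
)

# Each non-empty Resource starts a new activity instance
df$grp <- cumsum(df$Resource != "")

keep   <- c("Case.ID", "Activity", "grp", "Timestamp")
starts <- df[df$State == "assign",   keep]
ends   <- df[df$State == "complete", keep]

# Merging on grp as well pairs each start with its own end only
paired <- merge(starts, ends, by = c("Case.ID", "Activity", "grp"),
                suffixes = c(".start", ".end"))
paired <- paired[order(paired$grp), ]
paired$grp <- NULL
names(paired)[3:4] <- c("Start.Timestamp", "End.Timestamp")
print(paired)
```

Without grp in the by argument, merge pairs every start with every end that shares the same Case.ID and Activity, which is where the extra rows in the question come from.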