Below is my table in R (an event log):
Case.ID | Activity | Timestamp | Resource | State
------------------------------------------------------------
0 |Take order| 00:12:04 | Waiter | Assign
------------------------------------------------------------
0 |Take order| 00:18:02 | | Complete
------------------------------------------------------------
1 |Bring food| 00:47:23 | Cook helper | Assign
------------------------------------------------------------
1 |Bring food| 00:52:41 | | Complete
------------------------------------------------------------
1 |Bring food| 00:54:52 | Cook helper | Assign
------------------------------------------------------------
1 |Bring food| 00:59:11 | | Complete
Rows with a value in the Resource column mark the start of an activity; the rows where an activity ends have no value in that cell.
I tried:
assign <- subset(csv, select=c(Case.ID,Activity,Timestamp,State), State=="assign")
complete <- subset(csv, select=c(Case.ID,Activity,Timestamp,State), State=="complete")
merge <- merge(assign, complete, by=c("Case.ID", "Activity"))
But the result contains some wrong rows, as shown below.
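The second and third rows, which pair a start time with the wrong end time, should be removed, but I am not sure how to do that.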
Case.ID | Activity | Start.Timestamp | End.Timestamp |
------------------------------------------------------
1 |Bring food| 00:47:23 | 00:52:41 |
------------------------------------------------------
1 |Bring food| 00:47:23 | 00:59:11 |
------------------------------------------------------
1 |Bring food| 00:54:52 | 00:52:41 |
------------------------------------------------------
1 |Bring food| 00:54:52 | 00:59:11 |
Answer 0 (score: 0)
Create an extra grouping variable, then reshape to wide format, as follows:
df$grp <- cumsum(df$Resource!='')
library(reshape2)
df2 <- dcast(df, Case.ID + Activity + grp ~ State, value.var = 'Timestamp')[,-3]
which gives:
> df2
Case.ID Activity assign complete
1 0 Take order 00:12:04 00:18:02
2 1 Bring food 00:47:23 00:52:41
3 1 Bring food 00:54:52 00:59:11
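To see why the grouping variable works, it may help to look at it on its own. The sketch below rebuilds the question's table by hand. It assumes the State values are lowercase, as in the output above, and that empty Resource cells are empty strings rather than NA. It then prints the running count that becomes grp:

```r
# Example event log rebuilt from the question's table.
# Assumption: State is lowercase and empty Resource cells are "" (not NA).
df <- data.frame(
  Case.ID   = c(0, 0, 1, 1, 1, 1),
  Activity  = c("Take order", "Take order", rep("Bring food", 4)),
  Timestamp = c("00:12:04", "00:18:02", "00:47:23",
                "00:52:41", "00:54:52", "00:59:11"),
  Resource  = c("Waiter", "", "Cook helper", "", "Cook helper", ""),
  State     = rep(c("assign", "complete"), 3),
  stringsAsFactors = FALSE
)

# Every non-empty Resource starts a new activity instance, so the running
# count of non-empty cells gives each assign/complete pair its own label.
df$grp <- cumsum(df$Resource != "")
print(df$grp)
# [1] 1 1 2 2 3 3

# Sanity check: each group holds exactly one assign and one complete row.
pairs_ok <- all(table(df$grp, df$State) == 1)
print(pairs_ok)
# [1] TRUE
```

If the CSV import turns empty cells into NA instead, the comparison would need to become something like `!is.na(df$Resource) & df$Resource != ""`, since `cumsum` would otherwise propagate NA.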
An alternative with data.table:
library(data.table)
df2 <- dcast(setDT(df)[, grp := cumsum(Resource!='')],
Case.ID + Activity + grp ~ State, value.var = 'Timestamp')[, grp := NULL][]
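The same grouping variable can also repair the merge() attempt from the question, which is probably the smallest change to the original code. The duplicated rows appear because Case.ID and Activity do not identify a single activity instance, so every start is matched with every end. The base R sketch below is one possible approach, not the only one. The data frame is rebuilt by hand, with lowercase State values assumed, and the names assign_rows, complete_rows and paired are illustrative only:

```r
# Example event log rebuilt from the question's table.
# Assumption: State is lowercase and empty Resource cells are "" (not NA).
df <- data.frame(
  Case.ID   = c(0, 0, 1, 1, 1, 1),
  Activity  = c("Take order", "Take order", rep("Bring food", 4)),
  Timestamp = c("00:12:04", "00:18:02", "00:47:23",
                "00:52:41", "00:54:52", "00:59:11"),
  Resource  = c("Waiter", "", "Cook helper", "", "Cook helper", ""),
  State     = rep(c("assign", "complete"), 3),
  stringsAsFactors = FALSE
)

# Label each assign/complete pair, as in the answer above.
df$grp <- cumsum(df$Resource != "")

cols <- c("Case.ID", "Activity", "grp", "Timestamp")
assign_rows   <- df[df$State == "assign", cols]
complete_rows <- df[df$State == "complete", cols]

# With grp in the key, each start row matches exactly one end row,
# so the many-to-many pairing from the question no longer occurs.
paired <- merge(assign_rows, complete_rows,
                by = c("Case.ID", "Activity", "grp"),
                suffixes = c(".start", ".end"))
paired$grp <- NULL  # the helper column is no longer needed
print(paired)
```

This should leave three rows, one per activity instance, with the columns Case.ID, Activity, Timestamp.start and Timestamp.end.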