Question

Consider the following excerpt of a data frame:

       ID        ActivityName     Time         Type    Shape 
1       1             Request    0.000       Type_1      767           
2       1             Request  600.000       Type_1      767           
3       1               Start  600.000       Type_1     1376           
4       1               Start  600.000       Type_1     1376           
5       1 Schedule Activities  600.000       Type_1       15           
6       1 Schedule Activities 2062.295       Type_1       15
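To experiment with the excerpt, it can be rebuilt as an R data frame roughly like this. The column types are guessed from the printed values. Storing ActivityName and Type as factors is an assumption, based on the question's mention of ActivityName's levels.

```r
# Rebuild the printed excerpt (column types are assumed from the printout)
df <- data.frame(
  ID = c(1, 1, 1, 1, 1, 1),
  ActivityName = c("Request", "Request", "Start", "Start",
                   "Schedule Activities", "Schedule Activities"),
  Time = c(0, 600, 600, 600, 600, 2062.295),
  Type = "Type_1",                      # recycled to all six rows
  Shape = c(767, 767, 1376, 1376, 15, 15),
  stringsAsFactors = TRUE               # ActivityName and Type become factors
)
```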

What I want to do is create two new columns based on the duplicated entries in ActivityName.

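Specifically, I want to merge two consecutive rows of the same activity into a single row that has a start and a complete timestamp (both taken from Time, in seconds).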

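Since not every entry in ActivityName has a matching second entry (although at most two consecutive entries are ever identical), I also want to remove such "single" rows.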

P.S. Although it isn't visible in the excerpt, every level of ActivityName recurs throughout the data, both in consecutive pairs and elsewhere.
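One way to meet the "consecutive" requirement, including the P.S. case where an activity recurs later in the data, is to number the runs of consecutive rows that share ID and ActivityName, drop runs of length one, and keep the first and last Time of each run. Below is a minimal base R sketch, assuming rows are already sorted by time within each ID; the helper name merge_consecutive and the output columns Start and Complete are hypothetical.

```r
# Sample data from the question
df <- data.frame(
  ID = c(1, 1, 1, 1, 1, 1),
  ActivityName = c("Request", "Request", "Start", "Start",
                   "Schedule Activities", "Schedule Activities"),
  Time = c(0, 600, 600, 600, 600, 2062.295),
  Type = "Type_1",
  Shape = c(767, 767, 1376, 1376, 15, 15),
  stringsAsFactors = TRUE
)

# Hypothetical helper: collapse each run of consecutive rows sharing ID and
# ActivityName into one row with Start (first Time) and Complete (last Time).
# Runs of length 1 ("single" rows) are dropped. Assumes time-ordered rows.
merge_consecutive <- function(d) {
  n <- nrow(d)
  name <- as.character(d$ActivityName)
  # A new run starts at row 1 and wherever ID or ActivityName changes
  new_run <- c(TRUE, name[-1] != name[-n] | d$ID[-1] != d$ID[-n])
  run_id <- cumsum(new_run)
  # Length of the run each row belongs to
  run_len <- ave(run_id, run_id, FUN = length)
  keep <- run_len > 1
  d <- d[keep, , drop = FALSE]
  run_id <- run_id[keep]
  first <- !duplicated(run_id)                  # first row of each run
  last <- !duplicated(run_id, fromLast = TRUE)  # last row of each run
  out <- d[first, c("ID", "ActivityName", "Type", "Shape")]
  out$Start <- d$Time[first]
  out$Complete <- d$Time[last]
  rownames(out) <- NULL
  out
}

res <- merge_consecutive(df)
res
```

Because runs are detected from row order rather than by grouping on ActivityName alone, a later, separate pair of the same activity becomes its own row instead of being merged with the earlier one.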

Any ideas on how to achieve this would be highly appreciated.

Answer 1

Assuming ID is the variable that indicates which entries in ActivityName should be grouped together, this should work:

library(tidyverse)

df %>%
  # Group by ID and ActivityName
  group_by(ID, ActivityName) %>%
  # Keep only groups with more than one row (drops the "single" rows)
  filter(n() > 1) %>%
  # Use the smallest Time as Start and the largest as Timestamp;
  # .groups = "drop" returns an ungrouped result (dplyr >= 1.0.0)
  summarize(Start = min(Time),
            Timestamp = max(Time),
            .groups = "drop")
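One caveat, given the question's P.S.: grouping by ID and ActivityName puts every occurrence of an activity within an ID into one group, so two separate pairs of the same activity would collapse into a single row spanning both. A possible variant, assuming dplyr >= 1.1.0 (which provides consecutive_id()), numbers the runs before grouping. The events data frame is hypothetical and only illustrates the recurring case.

```r
library(dplyr)  # consecutive_id() requires dplyr >= 1.1.0

# Hypothetical data: ID 1 performs "Request" twice, separated by "Start"
events <- data.frame(
  ID = 1,
  ActivityName = c("Request", "Request", "Start", "Start", "Request", "Request"),
  Time = c(0, 600, 600, 900, 1200, 1500)
)

merged <- events %>%
  # Number each run of identical consecutive ActivityName values per ID
  group_by(ID) %>%
  mutate(run = consecutive_id(ActivityName)) %>%
  group_by(ID, run, ActivityName) %>%
  # Drop "single" rows, as in the answer above
  filter(n() > 1) %>%
  summarize(Start = min(Time), Complete = max(Time), .groups = "drop") %>%
  select(-run)

merged
```

With the original group_by(ID, ActivityName), the two Request pairs in this example would instead merge into one Request row running from 0 to 1500.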

Filtering two identical consecutive entries in a column