Collapse regular time period data based on class while maintaining chronological order

Date: 2018-07-30 09:45:23

Tags: r dplyr timestamp vectorization

This question is closely related to my previous post, linked here: Expand periods to regularly occuring timestamps

Essentially, this is the reverse step of that question.

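I now have a dataset of regularly spaced time periods (1-minute periods), and I need to collapse them so that each row represents one continuous time period during which the class stays the same.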

The sample input data frame is:

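library(dplyr)      # tibble(), %>%, mutate(), group_by(), summarise()
library(lubridate)  # ymd_hms()

df_in <- tibble(st = seq(ymd_hms("2016-01-01 00:35:00"),
                         ymd_hms("2016-01-01 00:58:00") - 60, 60),
                en = st + 59)
df_in$cl <- c("a", rep("c", 3), rep("a", 6), rep("c", 9), rep("a", 1), "c", rep("b", 2))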

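I managed to do this with the loop shown in the code below, but it is extremely inefficient and slow (my data source runs to millions of rows). I'm sure there is a vectorized way to do this with dplyr, and I'm hoping someone can point me in the right direction: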

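# flag = 1 where the class differs from the previous row, 0 otherwise
df_in$flag <- 1
df_in %>% 
  mutate(flag = ifelse(lag(cl)==cl,0,1)) -> df_in

# lag() yields NA for the first row, so mark it as the start of a period
df_in$flag[1] <- 1
df_in$flag2 <- 0
df_in$flag2[1] <- 1

# flag2 = running period id, incremented each time the class changes (the slow part)
for (i in 2:nrow(df_in)) {
  if (df_in$flag[i] == 0) {
    df_in$flag2[i] = df_in$flag2[i-1]
  } else {
    df_in$flag2[i] = df_in$flag2[i-1] + 1
  }
}

# One row per period: earliest start, latest end, and the period's class
df_in %>% 
  group_by(flag2) %>%
  summarise(st = min(st),
            en = max(en),
            cl = unique(cl)) %>% 
View()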

Thanks again...

1 Answer:

Answer 0: (score: 1)

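Here is one option using rleid from data.table together with dplyr. We create a run-length-type ID column, idx = rleid(cl), which gets a new value every time cl changes. Grouping by cl and idx, we then set st to min(st) and en to max(en):

library(dplyr)
df_in %>%
  mutate(idx = data.table::rleid(cl)) %>%
  group_by(cl, idx) %>%
  summarise(st = min(st), en = max(en)) %>%
  arrange(idx) %>%
  select(-idx)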
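To make the run-length ID idea concrete, here is a small sketch of what rleid produces for the class column of the question's sample data, cross-checked against base R's rle(). The variable names idx_base and runs are illustrative only and do not come from the original answer.

```r
library(data.table)

# Class labels from the question's sample data (23 one-minute rows)
cl <- c("a", rep("c", 3), rep("a", 6), rep("c", 9), rep("a", 1), "c", rep("b", 2))

# rleid() gives every run of consecutive identical values its own integer id
idx <- rleid(cl)

# Base-R cross-check: rle() reports the run lengths, which are expanded back into ids
runs <- rle(cl)
idx_base <- rep(seq_along(runs$lengths), times = runs$lengths)

identical(idx, idx_base)  # both approaches should label the rows identically
runs$lengths              # length of each run, in chronological order
```

Because the id increases with every change of class, a class that reappears later (such as "a" here) gets a different id each time, which is what keeps the collapsed periods separate and in chronological order.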


With …, you can do

{{1}}
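Separately from the options above, the period id that the question's loop builds row by row is just a cumulative count of class changes, so it can be sketched with dplyr alone. This is not part of the original answer: the column name run is hypothetical, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(lubridate)

# Sample data from the question: 23 one-minute periods
df_in <- tibble(st = seq(ymd_hms("2016-01-01 00:35:00"),
                         ymd_hms("2016-01-01 00:57:00"), by = 60),
                en = st + 59,
                cl = c("a", rep("c", 3), rep("a", 6), rep("c", 9), "a", "c", rep("b", 2)))

df_out <- df_in %>%
  # TRUE wherever the class differs from the previous row; the first row is
  # compared with itself, so the running count starts at 1
  mutate(run = cumsum(cl != lag(cl, default = first(cl))) + 1L) %>%
  # run (hypothetical name) identifies each period; cl is constant within a run
  group_by(run, cl) %>%
  summarise(st = min(st), en = max(en), .groups = "drop") %>%
  select(-run)

df_out
```

Grouping on run first means the summarised rows come out ordered by period, so the chronological order of the input is kept without an explicit sort.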