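This question is closely related to my previous post at the following link: Expand periods to regularly occuring timestamps
In essence, it is the reverse operation.
I now have a dataset laid out at regular time intervals (1-minute periods), and I need to collapse these periods so that each row represents one continuous time period during which the class stays the same, as shown below:
The sample input data frame is: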
library(tibble)
library(lubridate)

df_in <- tibble(st = seq(ymd_hms("2016-01-01 00:35:00"),
                         ymd_hms("2016-01-01 00:58:00") - 60, 60),
                en = st + 59)
df_in$cl <- c("a", rep("c", 3), rep("a", 6), rep("c", 9), rep("a", 1), "c", rep("b", 2))
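I managed to do this with the loop shown below, but it is very inefficient and slow (my real data runs to millions of rows). I'm sure there is a proper dplyr way to do this, and I hope someone can point me in the right direction: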
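library(dplyr)

# flag = 1 where the class differs from the previous row, 0 otherwise
df_in %>%
  mutate(flag = ifelse(lag(cl) == cl, 0, 1)) -> df_in
df_in$flag[1] <- 1   # lag() gives NA on the first row, so start a run there

# flag2 = running count of class changes, i.e. one ID per run
df_in$flag2 <- 0
df_in$flag2[1] <- 1
for (i in 2:nrow(df_in)) {
  if (df_in$flag[i] == 0) {
    df_in$flag2[i] = df_in$flag2[i-1]
  } else {
    df_in$flag2[i] = df_in$flag2[i-1] + 1
  }
}

# Collapse each run into a single row
df_in %>%
  group_by(flag2) %>%
  summarise(st = min(st),
            en = max(en),
            cl = unique(cl)) %>%
  View()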
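The `for` loop above only builds a running count of class changes: `flag2` goes up by one wherever `flag` is 1. That is exactly a cumulative sum, so the loop can be replaced by a single vectorised `cumsum()` call. The sketch below is not from either post; it rebuilds the sample `df_in` so it runs on its own, and `df_out` is a hypothetical name for the result. It also uses `first(cl)` instead of `unique(cl)`, since each run holds one class by construction.

```r
library(dplyr)
library(lubridate)
library(tibble)

# Rebuild the sample input from the question: 23 one-minute periods
df_in <- tibble(st = seq(ymd_hms("2016-01-01 00:35:00"),
                         ymd_hms("2016-01-01 00:58:00") - 60, 60),
                en = st + 59)
df_in$cl <- c("a", rep("c", 3), rep("a", 6), rep("c", 9), rep("a", 1), "c", rep("b", 2))

# df_out is a hypothetical name for the collapsed result
df_out <- df_in %>%
  # TRUE where the class differs from the previous row; the first row has
  # no previous row (NA), so coalesce() marks it as the start of a run too
  mutate(flag = coalesce(cl != lag(cl), TRUE),
         # Running count of class changes, equivalent to the loop's flag2
         flag2 = cumsum(flag)) %>%
  group_by(flag2) %>%
  summarise(st = min(st),
            en = max(en),
            cl = first(cl))
```

On the sample data this gives seven rows, one per run of identical classes.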
Thanks again...
Answer 0 (score: 1)
Here is a dplyr approach that uses rleid from data.table: group by cl and idx = rleid(cl), then set st equal to min(st) and en equal to max(en).
library(dplyr)
df_in %>%
  mutate(idx = data.table::rleid(cl)) %>%
  group_by(cl, idx) %>%
  summarise(st = min(st),
            en = max(en)) %>%
  arrange(idx) %>%
  select(-idx)
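rleid(cl) creates a "run-length type ID column": the ID increases by one each time cl changes, so every run of consecutive identical classes gets its own idx, and grouping by it keeps separate runs of the same class apart.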
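If you would rather not depend on data.table, the same run IDs can be derived from base R's `rle()`, and the run lengths can also be used to pick each run's first and last row directly, without any grouping. This is an alternative technique, not the answerer's code, and it assumes the rows are already sorted by `st` with no gaps; `runs`, `idx`, `starts`, `ends` and `df_out` are illustrative names. Whether it is actually faster on millions of rows is untested here.

```r
library(lubridate)
library(tibble)

# Rebuild the sample input from the question
df_in <- tibble(st = seq(ymd_hms("2016-01-01 00:35:00"),
                         ymd_hms("2016-01-01 00:58:00") - 60, 60),
                en = st + 59)
df_in$cl <- c("a", rep("c", 3), rep("a", 6), rep("c", 9), rep("a", 1), "c", rep("b", 2))

# rle() returns the value and the length of each run of identical classes
runs <- rle(df_in$cl)

# One ID per row: 1 for the first run, 2 for the second, and so on
# (the same numbering that data.table::rleid(cl) produces)
idx <- rep(seq_along(runs$lengths), runs$lengths)

# Row positions where each run ends and where it starts
# (assumes df_in is sorted by st)
ends <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# One row per run: start of its first period, end of its last period
df_out <- tibble(st = df_in$st[starts],
                 en = df_in$en[ends],
                 cl = runs$values)
```

Because `st` increases row by row, the first row of a run holds its minimum `st` and the last row its maximum `en`, which is why plain indexing can stand in for `min()`/`max()` here.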
With … you can do it:
{{1}}