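So I have a table with 3 columns: A (ID), B (timestamp) and C (binary). For each ID, I want the total duration spent going from 0 to 1 (I don't count the time from 1 to 0).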
The table looks like this:
A B C
x t1 0 #(t1=1528362158)
y t2 1 #(t2=1534675468)
x t3 1 #(t3=1534675492)
x t4 0 #(t4=1534675748)
y t5 0 #(t5=1534675939)
y t6 1 #(t6=1534676003)
x t7 1 #(t7=1534676067)
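A minimal sketch of the calculation with dplyr, assuming that only a 0 immediately followed by a 1 (in timestamp order, within the same ID) counts as a 0-to-1 duration; a 1 that follows another 1 is ignored. That reading of "from 0 to 1" is an assumption. The data frame re-types the table above using the timestamps from the comments, and the names `durations`, `prev_B` and `prev_C` are illustrative only.

```r
library(dplyr)

# Sample data from the table above (timestamps taken from the comments)
df <- tibble(
  A = c("x", "y", "x", "x", "y", "y", "x"),
  B = c(1528362158, 1534675468, 1534675492, 1534675748,
        1534675939, 1534676003, 1534676067),
  C = c(0, 1, 1, 0, 0, 1, 1)
)

durations <- df %>%
  arrange(A, B) %>%                              # order each ID's rows by timestamp
  group_by(A) %>%
  mutate(prev_C = lag(C), prev_B = lag(B)) %>%   # previous state within the same ID
  filter(C == 1, prev_C == 0) %>%                # keep only 0 -> 1 transitions (NA rows drop out)
  summarise(Duration = sum(B - prev_B))          # total 0 -> 1 time per ID

durations
```

With the sample data this gives 6313653 for x, i.e. (t3 - t1) + (t7 - t4), and 64 for y, i.e. t6 - t5.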
Answer 0 (score: 0)
You can use the following.
However, you will need to decide how to handle the NA values; here I filled them with 0.
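library(dplyr)
library(tidyr)
df %>%
  group_by(A) %>%
  # one column per value of B (t1 ... t7); note these columns hold C (0/1), not timestamps
  tidyr::spread(B, C) %>%
  # replace the NA cells left by spread() with 0 (funs() is deprecated, so use a lambda)
  mutate_at(vars(contains("t")), ~ ifelse(is.na(.), 0, .)) %>%
  # hard-coded difference per ID
  mutate(Duration = ifelse(A == "x", (t3 - t1) + (t7 - t4), t6 - t5)) %>%
  rename(ID = A) %>%
  select(ID, Duration) %>%
  ungroup()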
Result:
# A tibble: 2 x 2
ID Duration
<chr> <dbl>
1 x 2
2 y 1
Answer 1 (score: 0)
Is this what you are looking for?
library(tidyverse)
df <-
tibble(
ID = c(1, 2, 1, 1, 2, 2, 1),
Timestamp = c(1528362158, 1534675468, 1534675492, 1534675748, 1534675939, 1534676003, 1534676067),
Binary = c(0, 1, 1, 0, 0, 1, 1)
)
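df %>%
  group_by(ID) %>%
  mutate(rn = row_number()) %>%        # row index within each ID
  spread(Binary, Timestamp) %>%        # columns `0` and `1` hold the timestamps
  fill(`0`, .direction = 'down') %>%   # carry the last 0-timestamp forward within each ID
  drop_na() %>%                        # keep rows that have both a 0 and a 1 timestamp
  mutate(Duration = `1` - `0`) %>%
  summarise(Duration = sum(Duration))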
Result:
ID Duration
<dbl> <dbl>
1 1 6313653
2 2 64
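Since tidyr 1.0.0, spread() is superseded by pivot_wider(). A sketch of the same pipeline using pivot_wider(), assuming tidyr 1.0.0 or later; the name `res` is illustrative. Grouping is dropped before the wide step and re-applied after it, a cautious choice so the result does not depend on how pivot_wider() treats grouped data. fill() is applied within each ID group, so rows from different IDs never borrow each other's timestamps.

```r
library(tidyverse)

df <- tibble(
  ID = c(1, 2, 1, 1, 2, 2, 1),
  Timestamp = c(1528362158, 1534675468, 1534675492, 1534675748,
                1534675939, 1534676003, 1534676067),
  Binary = c(0, 1, 1, 0, 0, 1, 1)
)

res <- df %>%
  group_by(ID) %>%
  mutate(rn = row_number()) %>%        # row index within each ID
  ungroup() %>%
  # one column per Binary value ("0" and "1"), holding the timestamps
  pivot_wider(names_from = Binary, values_from = Timestamp) %>%
  group_by(ID) %>%
  fill(`0`, .direction = "down") %>%   # carry the last 0-timestamp forward within each ID
  drop_na() %>%                        # keep rows that have both a 0 and a 1 timestamp
  mutate(Duration = `1` - `0`) %>%
  summarise(Duration = sum(Duration))

res
```

This should reproduce the result above: 6313653 for ID 1 and 64 for ID 2.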