Calculate differences based on two columns in R

Time: 2019-01-04 00:12:53

Tags: r difference

I have a tricky problem. Here is my data:

df <- structure(list(seconds = c(689, 689.25, 689.5, 689.75, 690, 690.25, 690.5, 690.75, 691, 691.25, 691.5, 691.75, 692, 692.25, 692.5), threat = c(NA, NA, NA, NA, NA, NA, 1L, 1L, 0L, 0L, 1L, NA, NA, 1L, 1L), bins = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L)), .Names = c("seconds", "threat", "bins"), class = "data.frame", row.names = c(NA, -15L))

   seconds threat bins
1   689.00     NA    1
2   689.25     NA    1
3   689.50     NA    1
4   689.75     NA    1
5   690.00     NA    1
6   690.25     NA    2
7   690.50      1    2
8   690.75      1    2
9   691.00      0    2
10  691.25      0    2
11  691.50      1    3
12  691.75     NA    3
13  692.00     NA    3
14  692.25      1    3
15  692.50      1    3

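Within each bin, I'm trying to calculate how much time was spent in each "threat" type in the threat column. So whenever the threat value changes within a bin, I need to compute a difference. Here is an example of what I'm hoping to get: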

  bin threat seconds
   1     NA    1.25
   1      1    0.00
   1      0    0.00
   2     NA    0.25
   2      1    0.50
   2      0    0.50
   3     NA    0.50
   3      1    0.75
   3      0    0.00
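For comparison, here is a minimal base-R sketch that reproduces the desired table above. It assumes every row stands for one 0.25-second sample (read off the evenly spaced `seconds` column; the question does not state it). The idea is to count the rows for each `bins`/`threat` combination, keeping `NA` as its own category, and multiply by the sampling interval. The names `sample_interval` and `counts` are purely illustrative.

```r
# Sample data from the question (same values as the dput() output).
df <- data.frame(
  seconds = seq(689, 692.5, by = 0.25),
  threat  = c(NA, NA, NA, NA, NA, NA, 1L, 1L, 0L, 0L, 1L, NA, NA, 1L, 1L),
  bins    = rep(1:3, each = 5)
)

# Assumption: samples are evenly spaced, so each row represents 0.25 s.
sample_interval <- 0.25

# Count rows per bins/threat pair. useNA = "ifany" keeps NA threat as a
# category, and combinations that never occur get a count of 0.
counts <- as.data.frame(
  table(bins = df$bins, threat = df$threat, useNA = "ifany")
)
counts$seconds <- counts$Freq * sample_interval
counts
```

Because each row is credited with a full 0.25 s (including the very last one), bin 3 / threat 1 comes out as 0.75, matching the desired output. The approach gives wrong totals if the sampling interval is not constant.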

1 answer:

Answer 0 (score: 4)

Here's a tidyverse solution:

library(tidyverse)

df %>%
  arrange(seconds) %>%
  mutate(duration = lead(seconds) - seconds) %>%
  complete(bins, threat, fill = list(duration = 0)) %>%
  group_by(bins, threat) %>%
  summarize(seconds = sum(duration, na.rm = TRUE))
# A tibble: 9 x 3
# Groups:   bins [?]
#    bins threat seconds
#   <int>  <int>   <dbl>
# 1     1      0    0
# 2     1      1    0
# 3     1     NA    1.25
# 4     2      0    0.5
# 5     2      1    0.5
# 6     2     NA    0.25
# 7     3      0    0
# 8     3      1    0.5
# 9     3     NA    0.5

If you don't need the rows with 0 added, you can remove complete(bins, threat, fill = list(duration = 0)).

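So, first we arrange by seconds to be safe. Then we define a new variable, duration, as the time until the next observation (the difference between consecutive seconds values). Next, we add new rows with duration == 0 for the bins/threat combinations that don't occur. Finally, we group by bins and threat and sum up the durations. The last row has no following observation, so its duration is NA; that is why na.rm = TRUE is needed, and it is also why bin 3 / threat 1 comes out as 0.5 here rather than the 0.75 in the desired output.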
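To see the duration step on its own, here is a base-R sketch of it. It relies on the fact that, on sorted data, dplyr's `lead(seconds) - seconds` is the same as `c(diff(seconds), NA)`: each row gets the gap to the next timestamp, and the final row, which has no successor, gets `NA`. The name `steps` is illustrative only.

```r
# Sample data from the question.
df <- data.frame(
  seconds = seq(689, 692.5, by = 0.25),
  threat  = c(NA, NA, NA, NA, NA, NA, 1L, 1L, 0L, 0L, 1L, NA, NA, 1L, 1L),
  bins    = rep(1:3, each = 5)
)

# Base-R equivalent of arrange(seconds) + mutate(duration = lead(seconds) - seconds).
steps <- df[order(df$seconds), ]
steps$duration <- c(diff(steps$seconds), NA)

# The last row has no next timestamp, so its duration is NA.
tail(steps, 3)

# Bin 3 / threat 1: rows at 691.50, 692.25 and 692.50 contribute 0.25 + 0.25 + NA,
# so with na.rm = TRUE the total is 0.5.
with(steps, sum(duration[bins == 3 & threat %in% 1], na.rm = TRUE))
```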