I want to find the difference between the observed and the non-observed cases within each type:
set.seed(42)
df <- data.frame(type = factor(rep(c("A", "B", "C"), 2)), observed = rep(c(TRUE, FALSE), 3),
                 val1 = sample(5:1, 6, replace = TRUE), val2 = sample(1:5, 6, replace = TRUE),
                 val3 = sample(letters[1:5], 6, replace = TRUE))
# type observed val1 val2 val3
# 1 A TRUE 1 4 e
# 2 B FALSE 1 1 b
# 3 C TRUE 4 4 c
# 4 A FALSE 1 4 e
# 5 B TRUE 2 3 e
# 6 C FALSE 3 4 a
The following code works when there are only two distinct types (e.g. levels(df$type) == c("A", "B")), but not for the df provided above:
df %>%
group_by(type, observed) %>%
summarise_if(is.numeric, funs(diff(., 1)))
The desired output is:
# type val1 val2
# A 0 0
# B -1 -2
# C -1 0
Answer 0 (score: 5)
This does it:
df %>%
group_by(type) %>%
arrange(type, desc(observed)) %>%
mutate_if(is.numeric,funs(. - lag(., default=0))) %>%
summarise_if(is.numeric, tail, 1)
# # A tibble: 3 x 3
# type val1 val2
# <fctr> <dbl> <dbl>
# 1 A -1 0
# 2 B -2 0
# 3 C 3 1
One of the dplyr wizards might come up with a more elegant approach.
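As one possible sketch along those lines, the per-type difference (non-observed minus observed) can be computed by indexing each numeric column with the observed flag instead of relying on row order and lag(). This is an illustration under a few assumptions, not the answer author's method: it assumes dplyr 1.0 or later (for across() and where()), it assumes each type has exactly one observed and one non-observed row, and it types in the data frame from the table printed in the question rather than regenerating it with set.seed(), since sample() may return different values across R versions.

```r
library(dplyr)

# Data copied by hand from the table printed in the question (assumption:
# the printed values are the intended input).
df <- data.frame(
  type     = factor(c("A", "B", "C", "A", "B", "C")),
  observed = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  val1     = c(1, 1, 4, 1, 2, 3),
  val2     = c(4, 1, 4, 4, 3, 4),
  val3     = c("e", "b", "c", "e", "e", "a")
)

res <- df %>%
  group_by(type) %>%
  # For every numeric column: value of the non-observed row minus
  # value of the observed row within the current type.
  summarise(across(where(is.numeric), ~ .x[!observed] - .x[observed]))

print(res)
```

With the data above this should reproduce the desired output from the question (A: 0, 0; B: -1, -2; C: -1, 0). If a type had several observed or non-observed rows, the subtraction would no longer yield a single value per type and the rows would need to be aggregated first.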