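I am trying to compute the difference between each row and the previous row, within each ID, using group_by and lag on the following data frame: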
ID DATE Value
555 1/9/2018 10
555 2/9/2018 20
555 3/9/2018 50
555 4/9/2018 70
000 1/9/2018 0
000 2/9/2018 5
000 3/9/2018 15
111 1/9/2018 0
111 2/9/2018 15
111 3/9/2018 20
111 4/9/2018 25
The differences should look like this:
ID DATE Value Diff
555 1/9/2018 10 0
555 2/9/2018 20 10
555 3/9/2018 50 30
555 4/9/2018 70 20
000 1/9/2018 0 0
000 2/9/2018 5 5
000 3/9/2018 15 10
111 1/9/2018 0 0
111 2/9/2018 15 15
111 3/9/2018 20 5
111 4/9/2018 25 5
I am using this code:
data %>%
  group_by(ID) %>%
  arrange(DATE) %>%
  mutate(Diff = Value - lag(Value, default = first(Value)))
But it ignores the grouping by ID and computes the difference across all rows, like this:
ID DATE Value Diff
555 1/9/2018 10 0
555 2/9/2018 20 10
555 3/9/2018 50 30
555 4/9/2018 70 20
000 1/9/2018 0 -70
000 2/9/2018 5 5
000 3/9/2018 15 10
111 1/9/2018 0 -15
111 2/9/2018 15 15
111 3/9/2018 20 5
111 4/9/2018 25 5
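Answer 0 (score: 0)
Your code works for me with a small tweak: I dropped the arrange(DATE) step. The output, the code, and the data I used are below.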
> data_new
# A tibble: 11 x 4
# Groups: ID [3]
ID DATE Value Diff
<chr> <fct> <int> <int>
1 555 1/9/2018 10 0
2 555 2/9/2018 20 10
3 555 3/9/2018 50 30
4 555 4/9/2018 70 20
5 000 1/9/2018 0 0
6 000 2/9/2018 5 5
7 000 3/9/2018 15 10
8 111 1/9/2018 0 0
9 111 2/9/2018 15 15
10 111 3/9/2018 20 5
11 111 4/9/2018 25 5
data_new <- data %>%
  group_by(ID) %>%
  mutate(Diff = Value - lag(Value, default = first(Value)))
data <- structure(list(ID = c("555", "555", "555", "555", "000", "000",
"000", "111", "111", "111", "111"), DATE = structure(c(1L, 2L,
3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 4L), .Label = c("1/9/2018", "2/9/2018",
"3/9/2018", "4/9/2018"), class = "factor"), Value = c(10L, 20L,
50L, 70L, 0L, 5L, 15L, 0L, 15L, 20L, 25L)), row.names = c(NA,
-11L), class = "data.frame")
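
For anyone who still sees the grouping being ignored, here is a minimal self-contained sketch of the same approach with every dplyr call namespace-qualified. The assumption behind it (not confirmed by the question) is that another attached package, for example plyr loaded after dplyr, masks mutate so that the groups are dropped. The data frame is rebuilt by hand from the question's table, with DATE kept as a plain character column.

```r
library(dplyr)

# Same rows as in the question; DATE is left as character for simplicity.
data <- data.frame(
  ID    = c("555", "555", "555", "555", "000", "000", "000",
            "111", "111", "111", "111"),
  DATE  = c("1/9/2018", "2/9/2018", "3/9/2018", "4/9/2018",
            "1/9/2018", "2/9/2018", "3/9/2018",
            "1/9/2018", "2/9/2018", "3/9/2018", "4/9/2018"),
  Value = c(10L, 20L, 50L, 70L, 0L, 5L, 15L, 0L, 15L, 20L, 25L),
  stringsAsFactors = FALSE
)

# Qualify each verb with dplyr:: so a masking package cannot replace it
# (assumption: masking is what caused the ungrouped result).
data_new <- data %>%
  dplyr::group_by(ID) %>%
  dplyr::mutate(
    # Within each ID, subtract the previous Value; the first row of a
    # group is compared with itself, which gives 0.
    Diff = Value - dplyr::lag(Value, default = dplyr::first(Value))
  ) %>%
  dplyr::ungroup()

print(data_new$Diff)
# 0 10 30 20 0 5 10 0 15 5 5
```

If the qualified version gives the expected result while the unqualified one does not, checking conflicts() or the package load order is probably the next step.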