Get differences between grouped values in a tall dataset

Time: 2019-09-17 18:42:33

Tags: r dataframe diff

My data is set up like the example below:

Name     df Value
A         1   .5
A         2    2
A         3    3
B         1    1
B         2    .5

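I want to get the difference between values until the "Name" column changes, at which point it should stop and start computing differences afresh for the next name. The expected result (each Value minus the first Value of its group) looks like this: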

Name     df Value   Diff
A         1   .5      NA
A         2    2      1.5
A         3    3      2.5
B         1    1       NA
B         2    .5     -.5

Is there a way to do this? I tried reshaping the dataset to wide format, but I could not find a way to make that work either.

2 answers:

Answer 0: (score: 3)

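One option is to group by 'Name' and take the cumulative sum of diff(Value), padding the first row of each group with NA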

library(dplyr)
df1 %>%
  group_by(Name) %>%
  mutate(Diff = c(NA, cumsum(diff(Value))))
# A tibble: 5 x 4
# Groups:   Name [2]
#  Name     df Value  Diff
#  <chr> <int> <dbl> <dbl>
#1 A         1   0.5  NA  
#2 A         2   2     1.5
#3 A         3   3     2.5
#4 B         1   1    NA  
#5 B         2   0.5  -0.5

Data

df1 <- structure(list(Name = c("A", "A", "A", "B", "B"), df = c(1L, 
2L, 3L, 1L, 2L), Value = c(0.5, 2, 3, 1, 0.5)), 
   class = "data.frame", row.names = c(NA, 
-5L))
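
Because cumsum(diff(Value)) telescopes to each Value minus the first Value of its group, the same column can probably be built without dplyr. The following is only a minimal base R sketch under that assumption; the helper name diff_from_first is hypothetical and not part of the original answer.

```r
# Same sample data as in the answer above
df1 <- data.frame(
  Name  = c("A", "A", "A", "B", "B"),
  df    = c(1L, 2L, 3L, 1L, 2L),
  Value = c(0.5, 2, 3, 1, 0.5)
)

# Hypothetical helper: difference from the first element of a vector,
# with NA in the first position (matching the requested output)
diff_from_first <- function(v) {
  out <- v - v[1]
  out[1] <- NA
  out
}

# ave() applies the helper within each Name group and keeps the row order
df1$Diff <- ave(df1$Value, df1$Name, FUN = diff_from_first)
df1
```

ave() returns a vector aligned with the original rows, so no explicit sort or merge step should be needed as long as the grouping column is present in the data frame.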

Answer 1: (score: 2)

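@akrun's answer is the way to go, but as a puzzle, this also works (note that it returns 0 rather than NA in the first row of each group):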

library(dplyr)
df1 %>%
  group_by(Name) %>% 
  mutate(Diff = cumsum(Value - lag(Value, default = Value[1])))
# # A tibble: 5 x 4
# # Groups:   Name [2]
#   Name     df Value  Diff
#   <chr> <int> <dbl> <dbl>
# 1 A         1   0.5   0  
# 2 A         2   2     1.5
# 3 A         3   3     2.5
# 4 B         1   1     0  
# 5 B         2   0.5  -0.5
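
If the NA in the first row of each group matters, a variant of this approach could subtract the group's first Value directly and then blank out the first position. This is a sketch rather than the answerer's method; it assumes dplyr is installed, and the name result is purely illustrative.

```r
library(dplyr)

# Same sample data as in the question
df1 <- data.frame(
  Name  = c("A", "A", "A", "B", "B"),
  df    = c(1L, 2L, 3L, 1L, 2L),
  Value = c(0.5, 2, 3, 1, 0.5)
)

result <- df1 %>%
  group_by(Name) %>%
  # Value - first(Value) is the difference from the group's first row;
  # replace(..., 1, NA) sets that first row to NA instead of 0
  mutate(Diff = replace(Value - first(Value), 1, NA)) %>%
  ungroup()

result
```

Calling ungroup() at the end is a design choice so that later operations on result are not silently applied per Name.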