Calculate the difference between a value and the first value in its group

Time: 2020-02-25 07:37:06

Tags: r

I have a data frame df containing population data grouped by city, gender, year, and age:

df <- data.frame(City=c("New York", "New York", "New York", "New York", "New York", 
             "Boston","Boston", "Boston", "Boston"),
             Gender=c("m","m","m", "f","f","m","m","f","f"),
             Year=c("2020","2021", "2022", "2020", "2021","2020","2021", "2020", "2021"),
             Age=c("1","1","1", "2","2","1","1","2","2"),
             Population=c("100", "105","110", "105", "110", "200","201", "220", "222"))

For each row, I need to compute the difference from the first value in its group (i.e., the 2020 value), to get the following result:

df2 <- data.frame(City=c("New York", "New York", "New York", "New York", "New York", "Boston","Boston", "Boston", "Boston"),
              Gender=c("m","m","m", "f","f","m","m","f","f"),
              Year=c("2020","2021", "2022", "2020", "2021","2020","2021", "2020", "2021"),
              Age=c("1","1","1", "2","2","1","1","2","2"),
              Population=c("100", "105","110", "105", "110", "200","201", "220", "222"),
              PopulationGrowth=c("0", "5","10", "0","5","0","1","0","2"))

Thanks!

2 answers:

Answer 0: (score: 4)

library(dplyr)

df %>%
  group_by(City, Gender) %>%
  # sort by Year within each group so the first row is the earliest year
  arrange(Year, .by_group = TRUE) %>%
  mutate(Population = as.numeric(as.character(Population)),
         PopulationGrowth = Population - first(Population))

# # A tibble: 9 x 6
# # Groups:   City, Gender [4]
#   City     Gender Year  Age   Population PopulationGrowth
#   <fct>    <fct>  <fct> <fct>      <dbl>            <dbl>
# 1 Boston   f      2020  2            220                0
# 2 Boston   f      2021  2            222                2
# 3 Boston   m      2020  1            200                0
# 4 Boston   m      2021  1            201                1
# 5 New York f      2020  2            105                0
# 6 New York f      2021  2            110                5
# 7 New York m      2020  1            100                0
# 8 New York m      2021  1            105                5
# 9 New York m      2022  1            110               10
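
The as.numeric(as.character(...)) step matters when Population is a factor, which data.frame() produced by default for strings before R 4.0.0 (the <fct> column types in the output suggest that is the case here). A minimal sketch, assuming a hypothetical three-value factor pop, shows what goes wrong if the as.character() step is skipped:

```r
# Hypothetical factor holding population counts as text labels
pop <- factor(c("100", "105", "110"))

# Converting the factor directly returns the internal level codes
codes <- as.numeric(pop)
print(codes)   # 1 2 3

# Going through character first recovers the actual numbers
values <- as.numeric(as.character(pop))
print(values)  # 100 105 110
```

If the column is already character (the default since R 4.0.0), as.numeric() alone would be enough, and the extra as.character() is harmless.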

arrange changes the order of the rows. If you want to keep the original order, try the following:

df %>%
  group_by(City, Gender) %>%
  mutate(Population = as.numeric(as.character(Population)),
         # order_by takes the vector to sort by, so pass Year itself rather than order(Year)
         PopulationGrowth = Population - first(Population, order_by = Year))

# # A tibble: 9 x 6
# # Groups:   City, Gender [4]
#   City     Gender Year  Age   Population PopulationGrowth
#   <fct>    <fct>  <fct> <fct>      <dbl>            <dbl>
# 1 New York m      2020  1            100                0
# 2 New York m      2021  1            105                5
# 3 New York m      2022  1            110               10
# 4 New York f      2020  2            105                0
# 5 New York f      2021  2            110                5
# 6 Boston   m      2020  1            200                0
# 7 Boston   m      2021  1            201                1
# 8 Boston   f      2020  2            220                0
# 9 Boston   f      2021  2            222                2
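
The sample data happens to list 2020 first in every group, so the effect of order_by is not visible in the output above. As a rough sketch, assuming a hypothetical data frame shuffled (with an invented 2022 row for Boston) where the 2020 row is not first, passing Year to order_by still picks the 2020 value as the baseline:

```r
library(dplyr)

# Hypothetical input: rows within the group are not sorted by Year
shuffled <- data.frame(
  City = c("Boston", "Boston", "Boston"),
  Gender = c("m", "m", "m"),
  Year = c("2022", "2020", "2021"),
  Population = c(210, 200, 201),
  stringsAsFactors = FALSE
)

result <- shuffled %>%
  group_by(City, Gender) %>%
  # first() returns the Population value belonging to the smallest Year
  mutate(PopulationGrowth = Population - first(Population, order_by = Year)) %>%
  ungroup()

print(result$PopulationGrowth)  # 10 0 1
```

The rows stay in their original order; only the choice of baseline depends on Year.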

Answer 1: (score: 1)

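Here is a base R solution using ave:

df2 <- within(df, PopulationGrowth <- ave(
  as.numeric(as.character(Population)),  # convert factor/character to numeric
  City, Gender,                          # grouping variables
  FUN = function(v) v - head(v, 1)       # subtract each group's first value
))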

which gives:

> df2
      City Gender Year Age Population PopulationGrowth
1 New York      m 2020   1        100                0
2 New York      m 2021   1        105                5
3 New York      m 2022   1        110               10
4 New York      f 2020   2        105                0
5 New York      f 2021   2        110                5
6   Boston      m 2020   1        200                0
7   Boston      m 2021   1        201                1
8   Boston      f 2020   2        220                0
9   Boston      f 2021   2        222                2
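
Note that head(v, 1) takes whichever row comes first in each group, so the ave approach relies on the 2020 row appearing first. A possible variant, sketched here under the assumption of a hypothetical shuffled data frame df_shuffled, runs ave over row indices so the baseline can be chosen by the smallest Year instead:

```r
# Hypothetical input where the 2020 row is not first in its group
df_shuffled <- data.frame(
  City = c("New York", "New York", "New York", "Boston", "Boston"),
  Gender = c("m", "m", "m", "f", "f"),
  Year = c("2021", "2020", "2022", "2021", "2020"),
  Population = c("105", "100", "110", "222", "220"),
  stringsAsFactors = FALSE
)

pop <- as.numeric(as.character(df_shuffled$Population))
year <- as.numeric(as.character(df_shuffled$Year))

# ave() splits the row indices by group; inside FUN, i holds one group's rows,
# and which.min(year[i]) locates that group's earliest year
df_shuffled$PopulationGrowth <- ave(
  seq_len(nrow(df_shuffled)), df_shuffled$City, df_shuffled$Gender,
  FUN = function(i) pop[i] - pop[i][which.min(year[i])]
)

print(df_shuffled$PopulationGrowth)  # 5 0 10 2 0
```

Passing indices rather than values is what lets FUN see a second column (Year) for the same rows.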