I have a data frame df containing population data grouped by city, gender, year, and age:
df <- data.frame(City=c("New York", "New York", "New York", "New York", "New York",
"Boston","Boston", "Boston", "Boston"),
Gender=c("m","m","m", "f","f","m","m","f","f"),
Year=c("2020","2021", "2022", "2020", "2021","2020","2021", "2020", "2021"),
Age=c("1","1","1", "2","2","1","1","2","2"),
Population=c("100", "105","110", "105", "110", "200","201", "220", "222"))
For each row, I need to calculate the difference from the first value of its group (i.e., the 2020 value), to get the following result:
df2 <- data.frame(City=c("New York", "New York", "New York", "New York", "New York", "Boston","Boston", "Boston", "Boston"),
Gender=c("m","m","m", "f","f","m","m","f","f"),
Year=c("2020","2021", "2022", "2020", "2021","2020","2021", "2020", "2021"),
Age=c("1","1","1", "2","2","1","1","2","2"),
Population=c("100", "105","110", "105", "110", "200","201", "220", "222"),
PopulationGrowth=c("0", "5","10", "0","5","0","1","0","2"))
Thanks!
Answer 0 (score: 4)
library(dplyr)
df %>%
  group_by(City, Gender) %>%
  arrange(Year, .by_group = TRUE) %>%
  mutate(Population = as.numeric(as.character(Population)),
         PopulationGrowth = Population - first(Population))
# # A tibble: 9 x 6
# # Groups: City, Gender [4]
# City Gender Year Age Population PopulationGrowth
# <fct> <fct> <fct> <fct> <dbl> <dbl>
# 1 Boston f 2020 2 220 0
# 2 Boston f 2021 2 222 2
# 3 Boston m 2020 1 200 0
# 4 Boston m 2021 1 201 1
# 5 New York f 2020 2 105 0
# 6 New York f 2021 2 110 5
# 7 New York m 2020 1 100 0
# 8 New York m 2021 1 105 5
# 9 New York m 2022 1 110 10
arrange() changes the order of the rows. If you want to keep the original order, try the following:
df %>%
  group_by(City, Gender) %>%
  # order_by takes the vector to sort by, so pass Year itself rather than order(Year)
  mutate(Population = as.numeric(as.character(Population)),
         PopulationGrowth = Population - first(Population, order_by = Year))
# # A tibble: 9 x 6
# # Groups: City, Gender [4]
# City Gender Year Age Population PopulationGrowth
# <fct> <fct> <fct> <fct> <dbl> <dbl>
# 1 New York m 2020 1 100 0
# 2 New York m 2021 1 105 5
# 3 New York m 2022 1 110 10
# 4 New York f 2020 2 105 0
# 5 New York f 2021 2 110 5
# 6 Boston m 2020 1 200 0
# 7 Boston m 2021 1 201 1
# 8 Boston f 2020 2 220 0
# 9 Boston f 2021 2 222 2
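Both snippets wrap Population in as.numeric(as.character(...)) because the column holds numbers as text, and the <fct> headers in the output show it was read as a factor. Here is a minimal sketch of why the intermediate as.character() matters; the vector pop is made up for illustration and is not part of the question's data:

```r
# Hypothetical factor holding numbers as text, similar to df$Population
pop <- factor(c("100", "105", "110", "200"))

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(pop)

# Converting to character first recovers the actual numbers
values <- as.numeric(as.character(pop))
```

If the column is a plain character vector (the default from R 4.0 on, where stringsAsFactors = FALSE), as.numeric(Population) alone should be enough, and the extra as.character() is harmless.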
Answer 1 (score: 1)
Here is a base R option using ave():
df2 <- within(df,PopulationGrowth <- ave(as.numeric(as.character(Population)),City,Gender, FUN = function(v) v-head(v,1)))
which gives
> df2
City Gender Year Age Population PopulationGrowth
1 New York m 2020 1 100 0
2 New York m 2021 1 105 5
3 New York m 2022 1 110 10
4 New York f 2020 2 105 0
5 New York f 2021 2 110 5
6 Boston m 2020 1 200 0
7 Boston m 2021 1 201 1
8 Boston f 2020 2 220 0
9 Boston f 2021 2 222 2
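Note that head(v, 1) takes the first row of each group as it appears in df, which is the 2020 row only because the data is already sorted by Year within each group. Below is a rough sketch of a base R variant that does not rely on row order. It assumes Year converts cleanly to numeric, and the small unsorted data frame and the names yr and first_idx are made up for illustration:

```r
# Small example with rows deliberately out of year order
df <- data.frame(
  City = c("New York", "New York", "New York", "Boston", "Boston"),
  Gender = c("m", "m", "m", "f", "f"),
  Year = c("2021", "2020", "2022", "2021", "2020"),
  Population = c("105", "100", "110", "222", "220"),
  stringsAsFactors = FALSE
)

df$Population <- as.numeric(df$Population)
yr <- as.numeric(df$Year)

# For each City/Gender group, find the row index of the earliest year;
# ave() recycles that single index across all rows of the group
first_idx <- ave(seq_len(nrow(df)), df$City, df$Gender,
                 FUN = function(i) i[which.min(yr[i])])

# Subtract each group's earliest-year population from every row
df$PopulationGrowth <- df$Population - df$Population[first_idx]
```

Here the New York rows get 5, 0 and 10 and the Boston rows get 2 and 0, with the original row order left untouched.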