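I'd like a hint on the following: I want to compute population estimates for each city in the NAME column for every year in the YEAR column. The growth column gives the growth factor relative to 2020, so the formula is:
Population[Lucknow, 2030] = Population[Lucknow, 2020] * growth[2030]
and so on. Here is the df: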
df <- data.frame(YEAR=c(2020,2020,2020,2030,2040,2050), NAME=c("Lucknow","Delhi","Hyderadabad",NA,NA,NA), POPULATION=c(3704, 29274,10275,NA,NA,NA), growth=c(1.0,1.0,1.0,1.10,1.18,1.24))
Year Name Population growth
2020 Lucknow 3704 1.0000000
2020 Delhi 29274 1.0000000
2020 Hyderabad 10275 1.0000000
2030 <NA> NA 1.10
2040 <NA> NA 1.18
2050 <NA> NA 1.24
Edit: here is the input as Dom (thanks!) wrote it:
library(tibble)
df <- tibble( year = rep(c(2020,2030,2040,2050), each = 3), city =rep(c("Lucknow","Delhi","Hyderadabad"), times = 4), pop = c(3704, 29274,10275, rep(NA_integer_, times = 9)), growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3) )
year city pop growth
<dbl> <chr> <dbl> <dbl>
1 2020 Lucknow 3704 1
2 2020 Delhi 29274 1
3 2020 Hyderadabad 10275 1
4 2030 Lucknow NA 1.1
5 2030 Delhi NA 1.1
6 2030 Hyderadabad NA 1.1
7 2040 Lucknow NA 1.18
8 2040 Delhi NA 1.18
9 2040 Hyderadabad NA 1.18
10 2050 Lucknow NA 1.24
11 2050 Delhi NA 1.24
12 2050 Hyderadabad NA 1.24
The output should look like this:
Year Name Population growth
2020 Lucknow 3704 1.0000000
2020 Delhi 29274 1.0000000
2020 Hyderabad 10275 1.0000000
2030 Lucknow 4074.4 1.1000000
2030 Delhi 32201.4 1.1000000
2030 Hyderabad 11302.5 1.1000000
....
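Because every estimate is just a city's 2020 population times that year's growth factor, the whole expected table is an outer product of two vectors. A quick base-R arithmetic check (a sketch, with the three base populations and four growth factors typed in by hand rather than read from df):

```r
# 2020 base populations and the cumulative growth factor for each year (typed in by hand)
base_pop <- c(Lucknow = 3704, Delhi = 29274, Hyderabad = 10275)
growth   <- c(`2020` = 1.00, `2030` = 1.10, `2040` = 1.18, `2050` = 1.24)

# est[city, year] = base_pop[city] * growth[year]; outer() keeps the names as dimnames
est <- outer(base_pop, growth)
print(round(est, 1))
# The "2030" column is 4074.4, 32201.4 and 11302.5, matching the expected output
```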
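How can I fill in the NAs in the tibble?
I've tried various things with merge and dplyr::mutate, but because this is a vector operation I can't work out what I need to do here. I'd be glad to be pointed to the right command for such a basic operation.
Thanks!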
Answer 0 (score: 2)
Using dplyr:
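library(dplyr)
df %>%
  arrange(city, year) %>%          # each city's rows in year order, so 2020 comes first
  group_by(city) %>%
  mutate(pop = pop[1] * growth)    # 2020 population times the cumulative growth factor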
# A tibble: 12 x 4
# Groups: city [3]
year city pop growth
<dbl> <chr> <dbl> <dbl>
1 2020 Delhi 29274 1
2 2030 Delhi 32201. 1.1
3 2040 Delhi 34543. 1.18
4 2050 Delhi 36300. 1.24
5 2020 Hyderadabad 10275 1
6 2030 Hyderadabad 11303. 1.1
7 2040 Hyderadabad 12124. 1.18
8 2050 Hyderadabad 12741 1.24
9 2020 Lucknow 3704 1
10 2030 Lucknow 4074. 1.1
11 2040 Lucknow 4371. 1.18
12 2050 Lucknow 4593. 1.24
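pop[1] relies on arrange() having put each city's 2020 row first. As a sketch, assuming each city has exactly one 2020 row, the base year can be picked explicitly instead, which makes the sort unnecessary and keeps the original row order:

```r
library(dplyr)

df <- data.frame(
  year   = rep(c(2020, 2030, 2040, 2050), each = 3),
  city   = rep(c("Lucknow", "Delhi", "Hyderadabad"), times = 4),
  pop    = c(3704, 29274, 10275, rep(NA, times = 9)),
  growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3)
)

res <- df %>%
  group_by(city) %>%
  # pop[year == 2020] is the city's single base-year value; R recycles it across the group
  mutate(pop = pop[year == 2020] * growth) %>%
  ungroup()
```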
Using base R:
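# sort so each city's rows are contiguous and in year order
df <- df[order(df[["city"]], df[["year"]]), ]
# for each city, scale its first (2020) population by growth;
# unlist() returns the results in the same city order as the sorted rows
df[["pop"]] <-
  unlist(
    lapply(
      unique(df[["city"]]),
      function(x) with(df[df[["city"]] == x, ], pop[1] * growth)
    )
  )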
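The lapply() version depends on the rows being sorted by city. A vectorised base-R alternative (a sketch, again assuming exactly one 2020 row per city) looks up each row's base population with match(), so no sorting is needed and the original row order is kept:

```r
df <- data.frame(
  year   = rep(c(2020, 2030, 2040, 2050), each = 3),
  city   = rep(c("Lucknow", "Delhi", "Hyderadabad"), times = 4),
  pop    = c(3704, 29274, 10275, rep(NA, times = 9)),
  growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3)
)

# One row per city holding its 2020 population
base <- df[df$year == 2020, c("city", "pop")]

# match() gives, for every row of df, the position of its city in base
df$pop <- base$pop[match(df$city, base$city)] * df$growth
```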
Using data.table:
library(data.table)
setDT(df)[order(city, year), pop := pop[1] * growth, city]
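Another data.table option, sketched here as an update join rather than a grouped :=, is to build a small table of 2020 populations (base and base_pop are names chosen for this example) and join it back on city:

```r
library(data.table)

dt <- data.table(
  year   = rep(c(2020, 2030, 2040, 2050), each = 3),
  city   = rep(c("Lucknow", "Delhi", "Hyderadabad"), times = 4),
  pop    = c(3704, 29274, 10275, rep(NA, times = 9)),
  growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3)
)

# One row per city with its 2020 population
base <- dt[year == 2020, .(city, base_pop = pop)]

# Update join: every row of dt whose city appears in base gets pop overwritten by reference;
# i.base_pop is the column from base, growth is the column from dt
dt[base, on = "city", pop := i.base_pop * growth]
```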
Data:
df <- tibble(
year = rep(c(2020, 2030, 2040, 2050), each = 3),
city = rep(c("Lucknow", "Delhi", "Hyderadabad"), times = 4),
pop = c(3704, 29274, 10275, rep(NA, times = 9)),
growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3)
)
Answer 1 (score: 1)
Is the base year always 2020? If so, you can do the following:
library(tidyverse)
df <- tibble( year = rep(c(2020, 2030, 2040, 2050), each = 3),
city = rep(c("Lucknow", "Delhi", "Hyderadabad"), times = 4),
pop = c(3704, 29274, 10275, rep(NA_integer_, times = 9)),
growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3) )
base_pop <- df$pop[df$year == 2020]                # the 2020 populations, one per city
df$pop <- rep(base_pop, length(unique(df$year)))   # copy them to every year (rows must be city-within-year)
df <- df %>%
mutate(pop2 = pop * growth)
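The approach above copies the 2020 values by position, so it relies on the cities appearing in the same order within every year. Since the question asks how to fill the NAs, tidyr's fill() can carry each city's 2020 value down directly; a sketch, assuming 2020 is each city's earliest year:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  year   = rep(c(2020, 2030, 2040, 2050), each = 3),
  city   = rep(c("Lucknow", "Delhi", "Hyderadabad"), times = 4),
  pop    = c(3704, 29274, 10275, rep(NA, times = 9)),
  growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3)
)

res <- df %>%
  arrange(city, year) %>%               # each city's rows in year order, 2020 first
  group_by(city) %>%
  fill(pop, .direction = "down") %>%    # carry the 2020 population down into the NA rows
  ungroup() %>%
  mutate(pop2 = pop * growth)           # apply the cumulative growth factor
```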
Answer 2 (score: 0)
library(tidyverse)
NAME <- c("Lucknow","Delhi","Hyderadabad")
YEAR <- seq(2020,2050,10)
POPULATION=rep(c(3704, 29274,10275),4)
pop_df <- bind_cols(expand.grid(Name=NAME,Year=YEAR),Population=POPULATION)
growth_df <- data.frame(Year=seq(2020,2050,10),growth=c(1,1.1,1.18,1.24))  # 2050 factor is 1.24, as in the question
pop_df <- left_join(pop_df,growth_df,by="Year") %>%
  mutate(Population=round(Population*growth))
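The answers above work from the expanded 12-row input. As a sketch, the six-row data.frame from the start of the question can also be used directly: split it into the base populations and one growth factor per year, then pair every city with every year using merge(by = NULL), which in base R returns the Cartesian product:

```r
df <- data.frame(YEAR = c(2020, 2020, 2020, 2030, 2040, 2050),
                 NAME = c("Lucknow", "Delhi", "Hyderadabad", NA, NA, NA),
                 POPULATION = c(3704, 29274, 10275, NA, NA, NA),
                 growth = c(1.0, 1.0, 1.0, 1.10, 1.18, 1.24))

# Base-year populations: the rows that actually name a city
cities <- df[!is.na(df$NAME), c("NAME", "POPULATION")]
# One growth factor per year (the three identical 2020 rows collapse into one)
rates <- unique(df[, c("YEAR", "growth")])

# by = NULL pairs every city with every year (3 x 4 = 12 rows)
est <- merge(cities, rates, by = NULL)
est$POPULATION <- est$POPULATION * est$growth
est <- est[order(est$NAME, est$YEAR), ]
```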