Computing new rows in an R data frame from existing rows and columns

Time: 2018-12-07 10:54:36

Tags: r dataframe dplyr

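I would appreciate a hint about which command to use for the following: I want to compute a population estimate for each city in the "Name" column and each year in the "Year" column. The "growth" column gives the growth factor relative to 2020. The formula is therefore: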

Population[Lucknow,2030] = Population[Lucknow, 2020] * growth[2030]

and so on. This is the df I am starting from:

df <- data.frame(YEAR=c(2020,2020,2020,2030,2040,2050), NAME=c("Lucknow","Delhi","Hyderadabad",NA,NA,NA), POPULATION=c(3704, 29274,10275,NA,NA,NA), growth=c(1.0,1.0,1.0,1.10,1.18,1.24))
Year                Name           Population        growth
2020             Lucknow                 3704     1.0000000
2020               Delhi                29274     1.0000000
2020           Hyderabad                10275     1.0000000
2030                <NA>                   NA          1.10
2040                <NA>                   NA          1.18
2050                <NA>                   NA          1.24

Edit: following what Dom wrote below (thanks!), the input is:

df <- tibble( year = rep(c(2020,2030,2040,2050), each = 3), city =rep(c("Lucknow","Delhi","Hyderadabad"), times = 4), pop = c(3704, 29274,10275, rep(NA_integer_, times = 9)), growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3) )
    year city          pop growth
   <dbl> <chr>       <dbl>  <dbl>
 1  2020 Lucknow      3704   1   
 2  2020 Delhi       29274   1   
 3  2020 Hyderadabad 10275   1   
 4  2030 Lucknow        NA   1.1 
 5  2030 Delhi          NA   1.1 
 6  2030 Hyderadabad    NA   1.1 
 7  2040 Lucknow        NA   1.18
 8  2040 Delhi          NA   1.18
 9  2040 Hyderadabad    NA   1.18
10  2050 Lucknow        NA   1.24
11  2050 Delhi          NA   1.24
12  2050 Hyderadabad    NA   1.24

The output should look like this:

Year                Name           Population        growth
2020             Lucknow                 3704     1.0000000
2020               Delhi                29274     1.0000000
2020           Hyderabad                10275     1.0000000
2030             Lucknow               4074.4     1.1000000
2030               Delhi              32201.4     1.1000000
2030           Hyderabad              11302.5     1.1000000
....

How do I fill in the NAs in the tibble?

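I have tried various things with merge and dplyr::mutate, but because these are vectorized operations I cannot figure out what I need to do here. I would be grateful to be pointed to the right command for such a basic operation.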

Thanks!

3 answers:

Answer 0 (score: 2)

Using dplyr

library(dplyr)
df %>%
  arrange(city, year) %>%
  group_by(city) %>%
  mutate(pop = pop[1] * growth)

# A tibble: 12 x 4
# Groups:   city [3]
    year city           pop growth
   <dbl> <chr>        <dbl>  <dbl>
 1  2020 Delhi       29274    1   
 2  2030 Delhi       32201.   1.1 
 3  2040 Delhi       34543.   1.18
 4  2050 Delhi       36300.   1.24
 5  2020 Hyderadabad 10275    1   
 6  2030 Hyderadabad 11303.   1.1 
 7  2040 Hyderadabad 12124.   1.18
 8  2050 Hyderadabad 12741    1.24
 9  2020 Lucknow      3704    1   
10  2030 Lucknow      4074.   1.1 
11  2040 Lucknow      4371.   1.18
12  2050 Lucknow      4593.   1.24
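
Note that `pop[1]` only picks up the 2020 value because `arrange(city, year)` puts the base year first within each city. As a minimal sketch of a variant that does not depend on row order, the base-year row can be selected explicitly. The name `base_year` is my own, and the sketch assumes every city has exactly one row for that year:

```r
library(dplyr)

df <- tibble(
  year   = rep(c(2020, 2030, 2040, 2050), each = 3),
  city   = rep(c("Lucknow", "Delhi", "Hyderadabad"), times = 4),
  pop    = c(3704, 29274, 10275, rep(NA, times = 9)),
  growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3)
)

# Assumption: every city has exactly one row for this year
base_year <- 2020

result <- df %>%
  group_by(city) %>%
  # Look up the base-year population by year rather than by position
  mutate(pop = pop[year == base_year] * growth) %>%
  ungroup()
```

The rows of `result` stay in their original order, since no `arrange()` is needed for the lookup.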

Using base R

df <- df[order(df[["city"]], df[["year"]]), ]
df[["pop"]] <-
  unlist(
    lapply(
      unique(df[["city"]]), 
      function(x) with(df[df[["city"]] == x, ], pop[1] * growth)
    )
  )

Using data.table

library(data.table)
setDT(df)[order(city, year), pop := pop[1] * growth, city]

Data:

df <- tibble(
  year   = rep(c(2020, 2030, 2040, 2050), each = 3), 
  city   = rep(c("Lucknow", "Delhi", "Hyderadabad"), times = 4), 
  pop    = c(3704, 29274, 10275, rep(NA, times = 9)), 
  growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3)
)

Answer 1 (score: 1)

Is the base year always 2020? If so, you can do the following:

library(tidyverse)

df <- tibble( year = rep(c(2020, 2030, 2040, 2050), each = 3), 
              city = rep(c("Lucknow", "Delhi", "Hyderadabad"), times = 4), 
              pop = c(3704, 29274, 10275, rep(NA_integer_, times = 9)), 
              growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3) )

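# Collect the known base-year populations (this relies on them being distinct)
uniq <- unique(df$pop)
uniq <- uniq[!is.na(uniq)]

# Repeat them once per year; this relies on the cities appearing in the same order within every year
df$pop <- rep(uniq, length(unique(df$year)))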

df <- df %>% 
  mutate(pop2 = pop * growth)
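
If the base populations might not be distinct, or the rows might not be in a regular order, `unique()` and `rep()` can misalign. A rough sketch of the same idea using `tidyr::fill()` to carry each city's 2020 value forward is shown here. The name `filled` is my own, and the sketch assumes 2020 is the earliest year for every city:

```r
library(dplyr)
library(tidyr)

df <- tibble(
  year   = rep(c(2020, 2030, 2040, 2050), each = 3),
  city   = rep(c("Lucknow", "Delhi", "Hyderadabad"), times = 4),
  pop    = c(3704, 29274, 10275, rep(NA, times = 9)),
  growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3)
)

filled <- df %>%
  arrange(city, year) %>%            # base year first within each city
  group_by(city) %>%
  fill(pop, .direction = "down") %>% # carry the 2020 value into the NA rows
  ungroup() %>%
  mutate(pop2 = pop * growth)        # same final step as above
```

As in the code above, `pop` ends up holding the base population on every row and `pop2` holds the estimate.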

Answer 2 (score: 0)

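library(tidyverse)
NAME <- c("Lucknow", "Delhi", "Hyderadabad")
YEAR <- seq(2020, 2050, 10)
# Repeat the 2020 base populations once per year
POPULATION <- rep(c(3704, 29274, 10275), 4)
# expand.grid varies Name fastest, which matches the order of POPULATION
pop_df <- bind_cols(expand.grid(Name = NAME, Year = YEAR), Population = POPULATION)
# Growth factors as given in the question (1.24 for 2050)
growth_df <- data.frame(Year = seq(2020, 2050, 10), growth = c(1, 1.1, 1.18, 1.24))
pop_df <- left_join(pop_df, growth_df, by = "Year") %>%
  mutate(Population = round(Population * growth))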