Data transformation - transform data from columns and interpolate missing values

Time: 2018-06-09 20:45:55

Tags: r interpolation data-transform

I have data on the number of people living in Wrocław:

Pop <- data.frame(Year = c(850, 1000, 1200,  1300,  1350,  1318,  1327,  1329), 
                    Pop  = c(800, 2250, 5000, 13500, 14000, 13600, 12000, 15950))

Is there a way to transform it so that every year is a separate row and the missing values are interpolated?

Pop_long <- data.frame(Year = 850:1329, Pop = c(800, ...))

The interpolation is linear. I have already done it, but I bet there is a better way:

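library(dplyr)

# For each known year, compute the per-year increment up to the next known year
Pop <- Pop %>%
  arrange(Year) %>%   # lead() takes the next row, so rows must be in chronological order
  mutate(Year_lead = lead(Year),
         Pop_lead  = lead(Pop),
         Year_diff = Year_lead - Year,
         Pop_diff  = Pop_lead  - Pop,
         Pop_add   = Pop_diff / Year_diff) %>%
  select(Year, Pop, Pop_add)

# One row per year; years without data get NA in Pop and Pop_add
Pop_long <- data.frame(Year = 850:1329) %>%
  merge(Pop, all.x = TRUE)

# Fill each missing year from the previous row: add the increment, then carry it forward
for (i in 2:nrow(Pop_long)) {
  if (is.na(Pop_long[i, "Pop"])) {
    Pop_long[i, "Pop"]     <- Pop_long[i - 1, "Pop_add"] + Pop_long[i - 1, "Pop"]
    Pop_long[i, "Pop_add"] <- Pop_long[i - 1, "Pop_add"]
  }
}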
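The loop carries the per-year increment Pop_add forward and adds it once for every missing year, so for a year y between two consecutive known years y_i and y_{i+1} it reproduces (up to floating-point rounding from the repeated additions) the usual piecewise-linear interpolation formula:

```latex
\mathrm{Pop}(y) = \mathrm{Pop}_i + \frac{\mathrm{Pop}_{i+1} - \mathrm{Pop}_i}{y_{i+1} - y_i}\,(y - y_i),
\qquad y_i \le y \le y_{i+1}
```

The fraction is Pop_add for that interval. For example, between 850 (800 people) and 1000 (2250 people) it is 1450/150 ≈ 9.67 per year, which gives about 809.67 for 851 and 819.33 for 852.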

1 answer:

Answer 0 (score: 3)

You can use complete from tidyr and na.approx from zoo:

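library(tidyr)
library(dplyr)
library(zoo)
Pop_long <- Pop %>%
  complete(Year = 850:1329) %>%                # add a row for every year; Pop is NA where unknown
  # complete(Year = min(Year):max(Year)) %>%   # or use the range of the data itself
  mutate(Pop = na.approx(Pop))                 # fill the NAs by linear interpolation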

Pop_long
# A tibble: 480 x 2
#    Year   Pop
#   <dbl> <dbl>
# 1  850.  800.
# 2  851.  810.
# 3  852.  819.
# 4  853.  829.
# 5  854.  839.
# 6  855.  848.
# 7  856.  858.
# 8  857.  868.
# 9  858.  877.
#10  859.  887.
# ... with 470 more rows

library(ggplot2)
# Interpolated series as a line, the original observations as red points
ggplot(data = Pop_long, aes(Year, Pop)) +
  geom_line() +
  geom_point(data = Pop, col = "red")
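For comparison, the same linear interpolation can also be done in base R, without tidyr or zoo, using stats::approx(), which evaluates the piecewise-linear function through the known (Year, Pop) points at the years given in xout. This is a minimal sketch, assuming the Pop data frame from the question and the target range 850:1329; the name Pop_long_base is only illustrative:

```r
# Data from the question (the years are not in increasing order)
Pop <- data.frame(Year = c(850, 1000, 1200,  1300,  1350,  1318,  1327,  1329),
                  Pop  = c(800, 2250, 5000, 13500, 14000, 13600, 12000, 15950))

# approx() sorts the known points by x internally, then evaluates the
# piecewise-linear interpolant at every year in xout
interp <- approx(x = Pop$Year, y = Pop$Pop, xout = 850:1329, method = "linear")

# approx() returns a list with components x (= xout) and y (interpolated values)
Pop_long_base <- data.frame(Year = interp$x, Pop = interp$y)
head(Pop_long_base)
```

The first values (800, 809.67, 819.33, ...) agree with the rounded 800, 810, 819 in the tibble output above. approx() needs no extra packages, while complete() plus na.approx() fits more naturally into an existing dplyr pipeline.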
