I have data on the number of people living in Wrocław:
Pop <- data.frame(Year = c(850, 1000, 1200, 1300, 1350, 1318, 1327, 1329),
Pop = c(800, 2250, 5000, 13500, 14000, 13600, 12000, 15950))
Is there a way to reshape it so that every year gets its own row and the missing values are interpolated?
Pop_long <- data.frame(Year = 850:1329, Pop = 850, ....)
The interpolation should be linear. I have already done it this way, but I bet there is a better approach:
library(dplyr)
# Per-interval yearly increment: change in population / change in year
Pop <- Pop %>%
  mutate(Year_lead = lead(Year),
         Pop_lead = lead(Pop),
         Year_diff = Year_lead - Year,
         Pop_diff = Pop_lead - Pop,
         Pop_add = Pop_diff / Year_diff) %>%
  select(Year, Pop, Pop_add)
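# One row per year; years without a record get NA
Pop_long <- data.frame(Year = 850:1329) %>%
  merge(Pop, all.x = TRUE)
# Fill each NA from the previous row plus that interval's yearly increment
for (i in 1:nrow(Pop_long)) {
  if (is.na(Pop_long[i, "Pop"])) {
    Pop_long[i, "Pop"] <- Pop_long[i - 1, "Pop_add"] + Pop_long[i - 1, "Pop"]
    Pop_long[i, "Pop_add"] <- Pop_long[i - 1, "Pop_add"]
  }
}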
Answer 0 (score: 3)
You can use complete from tidyr and na.approx from zoo.
library(tidyr)
library(dplyr)
library(zoo)
Pop_long <- Pop %>%
  complete(., Year = 850:1329) %>%
  # complete(., Year = min(Year):max(Year)) %>%
  mutate(Pop = na.approx(Pop))
Pop_long
# A tibble: 480 x 2
# Year Pop
# <dbl> <dbl>
# 1 850. 800.
# 2 851. 810.
# 3 852. 819.
# 4 853. 829.
# 5 854. 839.
# 6 855. 848.
# 7 856. 858.
# 8 857. 868.
# 9 858. 877.
#10 859. 887.
# ... with 470 more rows
library(ggplot2)
ggplot(data = Pop_long, aes(Year, Pop) ) +
  geom_line() +
  geom_point(data = Pop, col = "red")
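For comparison, the same linear interpolation can probably be done in base R with approx() from the stats package, without tidyr or zoo. This is only a minimal sketch, assuming the Pop data frame exactly as given in the question; the names years and Pop_base are made up for illustration and do not appear in the original post.

```r
# Data as given in the question
Pop <- data.frame(Year = c(850, 1000, 1200, 1300, 1350, 1318, 1327, 1329),
                  Pop  = c(800, 2250, 5000, 13500, 14000, 13600, 12000, 15950))

# Every year that should get its own row (hypothetical helper name)
years <- 850:1329

# approx() interpolates linearly by default and orders the x values itself;
# $y holds the interpolated values at the points given in xout.
# Pop_base is a hypothetical name used only in this sketch.
Pop_base <- data.frame(Year = years,
                       Pop  = approx(x = Pop$Year, y = Pop$Pop, xout = years)$y)

head(Pop_base)
```

The recorded years keep their original values, and each year in between is placed on the straight line joining its two neighbours. For 851, for example, that is 800 + (2250 - 800) / 150, roughly 809.67.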