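I'm writing my first R program and I'm stuck. I need to predict the population for 2018 through 2022 using linear regression, but I get an error when I try to use predict().
This is what I have so far: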
X <- c(2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017)
Y <- c(11539282, 11543332, 11546969, 11567845, 11593741, 11606027, 11622554, 11658609)
model.1 <- lm(Y ~ X)
summary(model.1)
plot(X, Y, ylim=c(10000000,13000000))
lines(sort(X), fitted(model.1)[order(X)])
Answer 0 (score: 2)
# create a data frame to store your variables
df <- data.frame(
X = 2010:2022,
Y = c(11539282, 11543332, 11546969, 11567845, 11593741, 11606027, 11622554, 11658609, rep(NA, 5))
)
# check the data frame
df
#       X        Y
# 1  2010 11539282
# 2  2011 11543332
# 3  2012 11546969
# 4  2013 11567845
# 5  2014 11593741
# 6  2015 11606027
# 7  2016 11622554
# 8  2017 11658609
# 9  2018       NA
# 10 2019       NA
# 11 2020       NA
# 12 2021       NA
# 13 2022       NA
# By default (na.action = na.omit), lm() drops observations with NA values when fitting the model
model.1 <- lm(formula = Y ~ X, data = df)
# get the model summary
summary(model.1)
# broom is an extremely useful package for handling models in R
# install.packages("broom")
# tidy your model and include 95% confidence intervals
broom::tidy(model.1, conf.int = TRUE)
#          term     estimate   std.error statistic      p.value     conf.low    conf.high
# 1 (Intercept) -22799768.60 3272284.123 -6.967539 0.0004342937 -30806759.40 -14792777.80
# 2           X     17077.01    1625.171 10.507824 0.0000436377     13100.36     21053.66
# The model is of the form: Y = - 22799768.60 + 17077.01 * X
# you can get rough predictions for 2018 through 2022 using this formula
# (rough because the coefficients shown above are rounded):
- 22799768.60 + 17077.01 * 2018:2022
# [1] 11661638 11678715 11695792 11712869 11729946
# you can use the predict function as well for precise predictions
# get predictions for every X value
predict(object = model.1, newdata = df)
#        1        2        3        4        5        6        7        8        9       10       11       12       13
# 11525025 11542102 11559179 11576256 11593333 11610410 11627487 11644564 11661641 11678718 11695795 11712872 11729949
# get predictions for 2018 through 2022
predict(object = model.1, newdata = subset(df, X >= 2018))
#        9       10       11       12       13
# 11661641 11678718 11695795 11712872 11729949
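The question does not show the predict() call that failed, so the cause is a guess. A frequent one with a model fitted on bare vectors is passing newdata whose column is not named after the predictor. Here is a minimal sketch under that assumption, fitting on the observed years only and forecasting from a separate data frame. The names fit, new_years, pred and manual are made up for illustration; interval = "prediction" is a standard predict.lm argument that adds lower and upper bounds.

```r
# Fit on the observed years only (same data as in the question).
X <- 2010:2017
Y <- c(11539282, 11543332, 11546969, 11567845,
       11593741, 11606027, 11622554, 11658609)
fit <- lm(Y ~ X)

# Hypothetical name: new_years holds the years to forecast.
# Its column must be called X, matching the predictor in the formula.
new_years <- data.frame(X = 2018:2022)

# Point predictions plus 95% prediction intervals (columns fit, lwr, upr).
pred <- predict(fit, newdata = new_years, interval = "prediction", level = 0.95)
cbind(new_years, pred)

# The same point predictions computed from the unrounded coefficients.
manual <- coef(fit)[["(Intercept)"]] + coef(fit)[["X"]] * new_years$X
all.equal(unname(pred[, "fit"]), manual)
```

Using coef() instead of typing the rounded estimates by hand removes the small gap between the formula-based numbers and the predict() output.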