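I'm trying to use a list to build predictions for missing values and then write those values back into the list. I'm happy with the predictions, but I'm stuck after that: how do I write the newly found values back into my_list?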
# my_list is a list with cars; some are missing MPG
# These cars have no MPG
empty_rows <- subset(my_list, cartable.mpg == '0')
# These have an MPG; we'll use them to build our model
usable_rows <- subset(my_list, cartable.mpg != '0')
# Regress mpg on cylinders and weight
fitted_lm <- lm(as.numeric(cartable.mpg) ~ as.numeric(cartable.cyl) + as.numeric(cartable.wt), usable_rows)
# Predict the missing rows
filled_rows <- predict(fitted_lm, empty_rows)
Answer (score: 1)
Since you didn't provide a minimal reproducible dataset, here is an example using mtcars.
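In short, I split mtcars into a training dataset (used to build the model) and a test dataset from which the response variable (in this case mpg) has been removed. I then build a linear model, lm(mpg ~ wt), and use it to predict mpg for the test dataset.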
# Training sample: half the rows of mtcars
# Set fixed RNG seed for reproducibility
set.seed(2017)
# Note: sample(nrow(mtcars) / 2) is sample(16), a random permutation of
# rows 1-16, so the training set is the first half of mtcars in shuffled
# order; sample(nrow(mtcars), nrow(mtcars) / 2) would draw a random half
idx <- sample(nrow(mtcars) / 2)
# Training sample to build the model
df.train <- mtcars[idx, ]
# Test sample without the response variable (mpg is column 1)
df.test <- mtcars[-idx, -1]
# Linear model
fit <- lm(mpg ~ wt, data = df.train)
# Predict mpg for the test sample
pred <- predict(fit, df.test)
# Add the predictions back as the mpg column
df.test <- cbind.data.frame(
    mpg = pred,
    df.test)
# Bind training and test samples and flag which one is which
df <- rbind.data.frame(
    cbind.data.frame(df.train, train = TRUE),
    cbind.data.frame(df.test, train = FALSE))
df[, c("mpg", "wt", "train")]
#                          mpg    wt train
# Cadillac Fleetwood  10.40000 5.250  TRUE
# Merc 230            22.80000 3.150  TRUE
# Duster 360          14.30000 3.570  TRUE
# Hornet 4 Drive      21.40000 3.215  TRUE
# Merc 280            19.20000 3.440  TRUE
# Lincoln Continental 10.40000 5.424  TRUE
# Mazda RX4           21.00000 2.620  TRUE
# Merc 450SL          17.30000 3.730  TRUE
# Merc 280C           17.80000 3.440  TRUE
# Mazda RX4 Wag       21.00000 2.875  TRUE
# Hornet Sportabout   18.70000 3.440  TRUE
# Merc 450SE          16.40000 4.070  TRUE
# Valiant             18.10000 3.460  TRUE
# Merc 450SLC         15.20000 3.780  TRUE
# Merc 240D           24.40000 3.190  TRUE
# Datsun 710          22.80000 2.320  TRUE
# Chrysler Imperial   10.17314 5.345 FALSE
# Fiat 128            24.32264 2.200 FALSE
# Honda Civic         26.95458 1.615 FALSE
# Toyota Corolla      25.96479 1.835 FALSE
# Toyota Corona       23.13039 2.465 FALSE
# Dodge Challenger    18.38390 3.520 FALSE
# AMC Javelin         18.76632 3.435 FALSE
# Camaro Z28          16.94420 3.840 FALSE
# Pontiac Firebird    16.92171 3.845 FALSE
# Fiat X1-9           25.51488 1.935 FALSE
# Porsche 914-2       24.59258 2.140 FALSE
# Lotus Europa        27.41348 1.513 FALSE
# Ford Pantera L      19.95856 3.170 FALSE
# Ferrari Dino        21.75818 2.770 FALSE
# Maserati Bora       18.15895 3.570 FALSE
# Volvo 142E          21.71319 2.780 FALSE
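If you would rather write the predictions straight back into the original data frame, keeping its row order and all other columns, logical indexing on the left-hand side of an assignment does it in one step. The following is a minimal sketch, assuming the data is a data frame and that missing values are coded as 0 as in the question; blanking rows 1-5 of mtcars and the names dat and is_missing are purely illustrative.

```r
# Sketch: fill missing mpg values in place via logical indexing.
# Assumption (for illustration only): missing values are coded as 0,
# and rows 1-5 of mtcars are blanked to simulate them.
dat <- mtcars
dat$mpg[1:5] <- 0

# Logical index of the rows that need a prediction
is_missing <- dat$mpg == 0

# Fit the model on the complete rows only
fit <- lm(mpg ~ cyl + wt, data = dat[!is_missing, ])

# Predict for the incomplete rows and assign the values back in place;
# the row order and the other columns of dat are unchanged
dat$mpg[is_missing] <- predict(fit, newdata = dat[is_missing, ])

dat[1:5, c("mpg", "cyl", "wt")]
```

Applied to the question's objects, this would be roughly my_list$cartable.mpg[my_list$cartable.mpg == '0'] <- filled_rows, which lines up because subset() keeps rows in their original order. Since the comparison with '0' suggests cartable.mpg is stored as character or factor, it is probably safest to convert the column with as.numeric(as.character(...)) before assigning; note that as.numeric() applied directly to a factor returns the level codes rather than the values.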