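I am trying to implement a bin-smooth in R using glm (I have also seen it called a step function or a regressogram). It works perfectly as long as there are not too many bins. This was my first attempt, but I cannot get predictions from it with more than 15 bins: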
binsmoothfit <- glm(mpg ~ cut(displacement, 20), data=auto)
predict(binsmoothfit, data.frame(displacement=min(displacement):max(displacement)))
# Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
#   factor cut(displacement, 20) has new levels (203,223], (281,300], (320,339]
I think this is because some of the intervals produced by the cut function are empty:
table(cut(displacement,20))
# (67.6,87]  (87,106] (106,126] (126,145] (145,165] (165,184] (184,203] (203,223]
#        30        77        58        31        22         9        13         0
# (223,242] (242,262] (262,281] (281,300] (300,320] (320,339] (339,358] (358,378]
#        32        25         3         0        42         0        27         4
# (378,397] (397,417] (417,436] (436,455]
#         3        13         3         6
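To check that guess in isolation, here is a minimal sketch on synthetic data. The data frame `d`, its columns `x` and `y`, and the breaks are made up for illustration and are not the Auto MPG data. It assumes that glm drops factor levels with no observations when it fits, so a bin that is empty in the training data looks like a "new" level at prediction time.

```r
# Synthetic data with a deliberate gap in x between 3 and 7 (hypothetical)
set.seed(1)
d <- data.frame(x = c(runif(50, 0, 3), runif(50, 7, 10)))
d$y <- 2 * d$x + rnorm(100)

# Five bins with explicit breaks; the bin (4,6] contains no observations
all_levels <- levels(cut(d$x, breaks = seq(0, 10, by = 2)))

fit <- glm(y ~ cut(x, breaks = seq(0, 10, by = 2)), data = d)

# Levels the fitted model actually kept for the binned term
kept_levels <- fit$xlevels[[1]]

# Bins that exist in cut() but were dropped from the model
dropped <- setdiff(all_levels, kept_levels)
print(dropped)

# Predicting at a point inside the empty bin should reproduce the error
res <- tryCatch(predict(fit, newdata = data.frame(x = 5)),
                error = function(e) e)
print(conditionMessage(res))
```

If the assumption holds, `dropped` contains only the empty bin and the predict call fails with the same "has new levels" message.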
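So I tried using quantiles instead, but that does not work either. This time I am less sure why, since every interval contains some data points: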
binsmoothfit <- glm(mpg ~ cut(displacement, breaks=quantile(displacement,
probs=seq(0,1,1.0/20))), data=auto)
predict(binsmoothfit, data.frame(displacement=min(displacement):max(displacement)))
# Error in model.frame.default(Terms, newdata, na.action = na.action, xlev =
# object$xlevels) : factor cut(displacement, breaks = quantile(displacement, probs =
#seq(0, 1, 1/20))) has new levels (68,87.3], (87.3,107], (107,126], (126,145],
#(145,165], (165,184], (184,203], (203,223], (223,242], (242,262], (262,281],
#(281,300], (300,320], (320,339], (339,358], (358,378], (378,397], (397,416],
#(416,436], (436,455]
table(cut(displacement,breaks=quantile(displacement, probs=seq(0,1,1.0/20))))
#   (68,85]   (85,90]   (90,97]   (97,98]  (98,104] (104,112] (112,120] (120,122]
#        25        18        34        19         3        23        24        18
# (122,140] (140,148] (148,168] (168,200] (200,231] (231,250] (250,262] (262,305]
#        27         7        22        19        21        28        10        19
# (305,318] (318,350] (350,400] (400,455]
#        24        19        28         9
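The new levels in that error message are evenly spaced, which suggests the quantile breaks are being recomputed from the prediction grid rather than reused from the fit. The sketch below tries that idea on synthetic data. The names `d`, `grid`, and `brk` are hypothetical, and `include.lowest = TRUE` is my addition so the minimum value falls into a bin. It compares the two sets of breaks, then computes the breaks once outside the formula and reuses them. I have not confirmed that this is the right approach for the real data.

```r
# Hypothetical skewed predictor and response (not the Auto MPG data)
set.seed(42)
d <- data.frame(x = rexp(200, rate = 0.02) + 68)
d$y <- 40 - 0.05 * d$x + rnorm(200)

probs <- seq(0, 1, by = 1 / 10)
grid  <- data.frame(x = seq(min(d$x), max(d$x), length.out = 100))

# Quantile breaks from the training data vs. from the evenly spaced grid
train_breaks <- unname(quantile(d$x, probs = probs))
grid_breaks  <- unname(quantile(grid$x, probs = probs))
breaks_differ <- !isTRUE(all.equal(train_breaks, grid_breaks))
print(breaks_differ)

# Compute the breaks once, so fitting and prediction use the same bins
brk <- train_breaks
fit <- glm(y ~ cut(x, breaks = brk, include.lowest = TRUE), data = d)

# The grid is now cut with the stored breaks, so its levels match the fit
pred <- predict(fit, newdata = grid)
print(head(pred))
```

Prediction points outside the range of `brk` would still fall into no bin and come back as NA, so this only covers grids inside the training range.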
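Does anyone know what to do? Is there a good way to merge the intervals that contain no data points, or is there another way to do this? What does "factor has new levels" mean? I would really like to use glm, because it gives me predictions, cross-validation, and so on automatically.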
The data I am using is the Auto MPG data set from the UCI Machine Learning Repository:
auto <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data",
col.names = c("mpg", "cylinders", "displacement", "horsepower", "weight",
"acceleration", "model_year", "origin", "car_name"))
attach(auto)