The data come from http://www.principlesofeconometrics.com/poe5/poe5rdata.html, in the file collegetown.csv.
The log-linear model has the form: ln(y) = b1 + b2*x
library(ggthemes)
library(ggplot2)
theUrl <- "../poedata/collegetown.csv"
collegetown <- read.csv(theUrl)
g1 <- ggplot(data = collegetown, aes(x = sqft, y = price)) +
  geom_point(col = "blue")
plot(g1)
logLinearModel <- lm(log(price) ~ sqft, data = collegetown)
g1 + geom_smooth(method = "lm", formula = y ~ exp(x), se = FALSE, col = "green") +
  theme_economist()
summary(logLinearModel)
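How do I draw the correct curve? Do I need to store the predicted values explicitly in the data frame?
PS: I want the axes to stay as they are, i.e. on the original scale.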
Answer 0 (score: 1)
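The model y ~ exp(x) is not the same as the model log(y) ~ x, so you are not getting the smoother you expect. You can tell the smoother to fit a generalized linear model with a log link function instead: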
g1 <- ggplot(data = collegetown, aes(x = sqft, y = price)) +
  geom_point(col = "blue")
g1 + geom_smooth(method = "glm", formula = y ~ x, se = FALSE, col = "green",
                 method.args = list(family = gaussian(link = "log"))) +
  theme_economist()
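This gives a curve of the kind you want. Note that the GLM is fitted on the original price scale, so its coefficients will generally differ a little from those of lm(log(price) ~ sqft). If you want the curve from the lm itself, or if the GLM seems unintuitive, you can fit the lm outside of the plot and add its back-transformed predictions: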
logLinearModel <- lm(log(price)~sqft, data = collegetown)
collegetown$pred <- exp(predict(logLinearModel))
ggplot(data = collegetown, aes(x = sqft, y = price)) +
  geom_point(col = "blue") +
  geom_line(aes(y = pred), col = "green") +
  theme_economist()
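Warning: if you want standard errors, the two versions are not the same. The first method gives error bands that are symmetric on the original scale, whereas the standard errors you get from the lm prediction are symmetric on the log scale, and so become asymmetric once back-transformed.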
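As a rough illustration of that warning, here is a sketch on simulated data (the data frame `sim` and its parameter values are hypothetical stand-ins for collegetown, not the real data). It compares the coefficients of the two fits and shows that a confidence interval computed on the log scale is no longer symmetric around the fitted value after exponentiating.

```r
# Simulated stand-in for the collegetown data (hypothetical values).
set.seed(1)
sim <- data.frame(sqft = runif(200, 10, 80))
sim$price <- exp(3 + 0.04 * sim$sqft + rnorm(200, sd = 0.3))

# Fit 1: linear model on the log scale.
fit_lm <- lm(log(price) ~ sqft, data = sim)
# Fit 2: Gaussian GLM with a log link, fitted on the original scale.
fit_glm <- glm(price ~ sqft, data = sim, family = gaussian(link = "log"))

# Similar, but not identical, coefficient estimates.
print(rbind(lm = coef(fit_lm), glm = coef(fit_glm)))

# Confidence interval from the lm, back-transformed to the price scale.
new_x <- data.frame(sqft = 50)
ci <- exp(predict(fit_lm, newdata = new_x, interval = "confidence"))

# Distance from the fitted value to each bound on the price scale.
lower_gap <- ci[, "fit"] - ci[, "lwr"]
upper_gap <- ci[, "upr"] - ci[, "fit"]
print(c(lower = lower_gap, upper = upper_gap))
```

The upper gap comes out larger than the lower one, because exp() stretches the top half of an interval that was symmetric on the log scale.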
Answer 1 (score: 0)
I think a relatively simple way to build the curve is to use stat_function().
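# LOG LINEAR MODEL
library(ggplot2)
library(ggdark)  # assumption: dark_theme_bw() used below comes from the ggdark package
logLinearModel <- lm(log(price) ~ sqft, data = collegetown)
smodloglinear <- summary(logLinearModel)
logLinearModel
names(logLinearModel)
# Squared correlation between the back-transformed fitted values and price
yn <- exp(logLinearModel$fitted.values)
rgloglinear <- cor(yn, collegetown$price)
rgloglinear^2
# Intercept, slope and residual variance taken from the model summary
b1 <- coef(smodloglinear)[[1]]
b2 <- coef(smodloglinear)[[2]]
sighat2 <- smodloglinear$sigma^2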
g2 <- ggplot(data = collegetown, aes(x = sqft, y = price)) +
  geom_point(col = "white") +
  stat_function(fun = function(x) exp(b1 + b2 * x), aes(color = "red")) +
  stat_function(fun = function(x) exp(b1 + b2 * x + sighat2 / 2), aes(color = "green")) +
  dark_theme_bw() +
  scale_color_identity(name = "Model fit",
                       breaks = c("red", "green"),
                       labels = c("yn", "yc"),
                       guide = "legend")
g2
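The second curve above scales the plain back-transform by exp(sighat2/2). A minimal sketch of why that can help, assuming the errors on the log scale are roughly normal: in that case the conditional mean of y is exp(b1 + b2*x + sigma^2/2), so exp(b1 + b2*x) on its own tends to understate the average. The simulated data frame `sim` and its parameter values are hypothetical, and I am reading the legend labels yn and yc as the uncorrected and corrected predictors.

```r
# Simulated log-linear data with normal errors on the log scale (hypothetical).
set.seed(42)
n <- 5000
sim <- data.frame(x = runif(n, 0, 2))
sim$y <- exp(1 + 0.5 * sim$x + rnorm(n, sd = 0.8))

fit <- lm(log(y) ~ x, data = sim)
sighat2 <- summary(fit)$sigma^2   # estimated residual variance on the log scale

yn <- exp(fitted(fit))            # plain back-transform of the fitted values
yc <- yn * exp(sighat2 / 2)       # back-transform with the variance correction

# Compare the average of each predictor with the average observed y.
print(c(observed = mean(sim$y), yn = mean(yn), yc = mean(yc)))
```

With a small residual variance the two curves nearly coincide, which may be why they can be hard to tell apart on the collegetown plot.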