Question

This is my dataset:

As you can see, there are two quantitative variables (X, Y) and one categorical variable (molar, with two levels: M1 and M2).

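I want to show two polynomial regressions, each with its own prediction interval, in a single plot: one for level M1 and one for level M2. Each regression has its own degree (degree 4 for M1, degree 6 for M2).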

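I want to use the ggplot() function (from the R package ggplot2). I have already produced this figure, but with all the data pooled (that is, without distinguishing between the two levels). This is the code I used: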

# Fit a linear model
m <- lm(Y ~ X+I(X^2)+I(X^3)+I(X^4), data = Dataset)
# cbind the predictions to Dataset
mpi <- cbind(Dataset, predict(m, interval = "prediction"))

ggplot(mpi, aes(x = X)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr),
              fill = "blue", alpha = 0.2) +
  geom_point(aes(y = Y)) +
  geom_line(aes(y = fit), colour = "blue", size = 1)

With this result:

So I would like two polynomial regressions of different degrees (one for M1, one for M2), each with its own prediction interval. What is the exact code?

Update - new code! I ran this code without success:

M1=subset(Dataset,Dataset$molar=="M1",select=X:Y)
M2=subset(Dataset,Dataset$molar=="M2",select=X:Y)

M1.R <- lm(Y ~ X +I(X^2)+I(X^3)+I(X^4), 
data=subset(Dataset,Dataset$molar=="M1",select=X:Y))
M2.R <- lm(Y ~ X +I(X^2)+I(X^3)+I(X^4), 
data=subset(Dataset,Dataset$molar=="M2",select=X:Y))


newdf <- data.frame(x = seq(0, 1, c(408,663)))

M1.P <- cbind(data=subset(Dataset,Dataset$molar=="M1",select=X:Y), predict(M1.R, interval = "prediction"))
M2.P <- cbind(data=subset(Dataset,Dataset$molar=="M2",select=X:Y), predict(M2.R, interval = "prediction"))

p = cbind(as.data.frame(rbind(M1.P, M2.P)), f = factor(rep(1:2, c(408,663)), x = rep(newdf$x, 2))

mdf = with(Dataset, data.frame(x = rep(x, 2), y = c(subset(Dataset,Dataset$molar=="M1",select=Y), subset(Dataset,Dataset$molar=="M2",select=Y),
                   f = factor(rep(1:2, c(408,663))))


ggplot(mdf, aes(x = x, y = y, colour = f)) + geom_point() +
geom_ribbon(data = p, aes(x = x, ymin = lwr, ymax = upr,
                    fill = f, y = NULL, colour = NULL),
      alpha = 0.2) +

geom_line(data = p, aes(x = x, y = fit))

These are the messages I get now:

[98] WARNING: Warning in if (n < 0L) stop("wrong sign in 'by' argument") :
the condition has length > 1 and only the first element will be used
Warning in if (n > .Machine$integer.max) stop("'by' argument is much too small") :
the condition has length > 1 and only the first element will be used
Warning in 0L:n :
numerical expression has 2 elements: only the first used
Warning in if (by > 0) pmin(x, to) else pmax(x, to) :
the condition has length > 1 and only the first element will be used
[99] WARNING: Warning in predict.lm(M1.R, interval = "prediction") :
predictions on current data refer to _future_ responses
[100] WARNING: Warning in predict.lm(M2.R, interval = "prediction") :
predictions on current data refer to _future_ responses
[101] ERROR: <text>

I think I'm getting closer, but I still can't see it. Help!

Answer 1

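Here is one way to do it. If you have more than two models/levels in the factor, you should look at code that works over the factor levels and fits the models that way.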

Anyway, first some dummy data:

set.seed(100)
x <- runif(100)
y1 <- 2 + (0.3 * x) + (2.4 * x^2) + (-2.5 * x^3) + (3.4 * x^4) + rnorm(100)
y2 <- -1 + (0.3 * x) + (2.4 * x^2) + (-2.5 * x^3) + (3.4 * x^4) +
  (-0.3 * x^5) + (2.4 * x^6) + rnorm(100)
df <- data.frame(x, y1, y2)

Fit our two models:

m1 <- lm(y1 ~ poly(x, 4), data = df)
m2 <- lm(y2 ~ poly(x, 6), data = df)
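The question writes the polynomial out with I(X^2), I(X^3) and so on, while this answer uses poly(). As a rough check (a sketch that regenerates the dummy data; the names m1_raw and m1_raw2 are made up for illustration), both forms should give the same fitted curve, and only the coefficient parameterisation differs:

```r
# Regenerate the dummy response for the degree-4 model.
set.seed(100)
x <- runif(100)
y1 <- 2 + (0.3 * x) + (2.4 * x^2) + (-2.5 * x^3) + (3.4 * x^4) + rnorm(100)
df <- data.frame(x, y1)

# Orthogonal polynomial basis, as in the answer.
m1 <- lm(y1 ~ poly(x, 4), data = df)

# Raw powers, as in the question (m1_raw is a hypothetical name).
m1_raw <- lm(y1 ~ x + I(x^2) + I(x^3) + I(x^4), data = df)

# Compare the fitted values of the two parameterisations.
all.equal(unname(fitted(m1)), unname(fitted(m1_raw)), tolerance = 1e-6)

# poly(..., raw = TRUE) uses the same raw-power columns as the I() terms.
m1_raw2 <- lm(y1 ~ poly(x, 4, raw = TRUE), data = df)
all.equal(unname(coef(m1_raw)), unname(coef(m1_raw2)))
```

The orthogonal basis is usually the better-conditioned choice for higher degrees, which is presumably why poly() is used here.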

Now predict at some new x locations and bind the predictions together with x and f (a factor that indexes the model) into tidy format:

newdf <- data.frame(x = seq(0, 1, length.out = 100))
p1 <- predict(m1, newdata = newdf, interval = "prediction")
p2 <- predict(m2, newdata = newdf, interval = "prediction")
p <- cbind(as.data.frame(rbind(p1, p2)), f = factor(rep(1:2, each = 100)),
           x = rep(newdf$x, 2))
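Two things in the question's update seem related to this step. The call seq(0, 1, c(408,663)) passes a length-2 vector as the by argument, which matches the 'by' warnings in the log. And calling predict() with interval = "prediction" but without newdata is what triggers the "predictions on current data refer to _future_ responses" warning. A minimal sketch of building the grid from the observed range instead (the name grid is made up, and the data are the dummy data regenerated):

```r
# Regenerate the dummy data and the degree-4 fit.
set.seed(100)
x <- runif(100)
y1 <- 2 + (0.3 * x) + (2.4 * x^2) + (-2.5 * x^3) + (3.4 * x^4) + rnorm(100)
df <- data.frame(x, y1)
m1 <- lm(y1 ~ poly(x, 4), data = df)

# Grid of 100 evenly spaced points over the observed range of x;
# length.out is a single number, unlike the length-2 'by' in the question.
grid <- data.frame(x = seq(min(df$x), max(df$x), length.out = 100))

# With an interval requested, predict() returns a matrix with
# columns fit, lwr and upr; convert it to a data frame for plotting.
p1 <- as.data.frame(predict(m1, newdata = grid, interval = "prediction"))
head(cbind(grid, p1))
```

For real data, a grid over the observed range avoids extrapolating the polynomial beyond the points it was fitted to.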

Melt the original data into tidy form:

mdf <- with(df, data.frame(x = rep(x, 2), y = c(y1, y2),
                           f = factor(rep(1:2, each = 100))))
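The same reshaping can also be sketched with base R's stack(), which avoids hard-coding the number of rows per model. This uses stand-in random responses rather than the dummy data, and the name long is made up:

```r
# Stand-in wide data: one x column and two response columns.
set.seed(100)
df <- data.frame(x = runif(100), y1 = rnorm(100), y2 = rnorm(100))

# stack() puts the selected columns into 'values' and records the
# source column name in the factor 'ind'; x is recycled alongside.
long <- cbind(x = df$x, stack(df, select = c(y1, y2)))
names(long) <- c("x", "y", "f")

str(long)
```

Here f has the levels y1 and y2 instead of 1 and 2, so the plot legend would show the column names.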

Draw the plot, using colour to distinguish the models/data:

library(ggplot2)  # provides ggplot() and the geoms used below

ggplot(mdf, aes(x = x, y = y, colour = f)) +
  geom_point() +
  geom_ribbon(data = p, aes(x = x, ymin = lwr, ymax = upr,
                            fill = f, y = NULL, colour = NULL),
              alpha = 0.2) +
  geom_line(data = p, aes(x = x, y = fit))

This gives us

[Figure: two polynomial regressions in a ggplot() plot]
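For the case mentioned at the start of this answer, where the models are fitted over the levels of a factor, something along these lines might work. It is only a sketch: the data are simulated as a stand-in for the question's Dataset (column names X, Y, molar and the group sizes 408 and 663 are taken from the question), and fit_level, degrees, pred and plt are made-up names.

```r
library(ggplot2)

# Simulated stand-in for the question's Dataset; the real data are not shown.
set.seed(1)
Dataset <- data.frame(
  X = runif(408 + 663),
  molar = factor(rep(c("M1", "M2"), times = c(408, 663)))
)
Dataset$Y <- ifelse(Dataset$molar == "M1", 2, -1) +
  3 * Dataset$X^2 + rnorm(nrow(Dataset))

# Polynomial degree for each level, as stated in the question.
degrees <- c(M1 = 4, M2 = 6)

# Fit one model per level and predict on a grid over that level's X range.
fit_level <- function(lev) {
  d <- Dataset[Dataset$molar == lev, ]
  deg <- degrees[[lev]]
  m <- lm(Y ~ poly(X, deg), data = d)
  grid <- data.frame(X = seq(min(d$X), max(d$X), length.out = 100))
  cbind(grid, predict(m, newdata = grid, interval = "prediction"),
        molar = lev)
}

# Stack the per-level predictions into one tidy data frame.
pred <- do.call(rbind, lapply(names(degrees), fit_level))
pred$molar <- factor(pred$molar, levels = levels(Dataset$molar))

# Points, prediction ribbons and fitted lines, coloured by level.
plt <- ggplot(Dataset, aes(x = X, y = Y, colour = molar)) +
  geom_point(alpha = 0.4) +
  geom_ribbon(data = pred,
              aes(x = X, ymin = lwr, ymax = upr, fill = molar),
              inherit.aes = FALSE, alpha = 0.2) +
  geom_line(data = pred, aes(y = fit))
plt
```

Adding a level then only means adding an entry to degrees, with no change to the fitting or plotting code.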
