DF <- data.frame(factor1=rep(1:4,1000), factor2 = rep(1:4,each=1000),base = rnorm(4000,0,1),dep=rnorm(4000,400,5))
DF$f1_1 = DF$factor1 == 1
DF$f1_2 = DF$factor1 == 2
DF$f1_3 = DF$factor1 == 3
DF$f1_4 = DF$factor1 == 4
DF$f2_1 = DF$factor2 == 1
DF$f2_2 = DF$factor2 == 2
DF$f2_3 = DF$factor2 == 3
DF$f2_4 = DF$factor2 == 4
我想运行以下回归:
Dep = (f1_1 + f1_2 + f1_3 + f1_4)*(f2_1 + f2_2 + f2_3 + f2_4)*(base+base^2+base^3+base^4+base^5)
有更聪明的方法吗?
答案 0 :(得分:2)
您应该将factor1
和factor2
编码为实际因素变量。此外,最好将poly
用于多项式。以下是我们可以做的事情:
DF <- data.frame(factor1=rep(1:4,1000), factor2 = rep(1:4,each=1000),
base = rnorm(4000,0,1), dep = rnorm(4000,400,5))
DF$factor1 <- as.factor(DF$factor1)
DF$factor2 <- as.factor(DF$factor2)
fit <- lm(dep ~ factor1 * factor2 * poly(base, degree = 5))
默认情况下,poly
为数值稳定性生成正交基础。如果您想要base + base ^ 2 + base ^ 3 + ...
之类的普通多项式,请使用poly(base, degree = 5, raw = TRUE)
。
请注意,您将从此模型中获得大量参数,因为您为factor1
和factor2
之间的每对级别拟合了一个五阶多项式。
考虑一个小例子。
set.seed(0)
f1 <- sample(gl(3, 20, labels = letters[1:3])) ## randomized balanced factor
f2 <- sample(gl(3, 20, labels = LETTERS[1:3])) ## randomized balanced factor
x <- runif(3 * 20) ## numerical covariate
y <- rnorm(3 * 20) ## toy response
fit <- lm(y ~ f1 * f2 * poly(x, 2))
#Call:
#lm(formula = y ~ f1 * f2 * poly(x, 2))
#
#Coefficients:
# (Intercept) f1b f1c
# -0.5387 0.8776 0.1572
# f2B f2C poly(x, 2)1
# 0.5113 1.0139 5.8345
# poly(x, 2)2 f1b:f2B f1c:f2B
# 2.4373 1.0666 0.1372
# f1b:f2C f1c:f2C f1b:poly(x, 2)1
# -1.4951 -1.4601 -6.2338
# f1c:poly(x, 2)1 f1b:poly(x, 2)2 f1c:poly(x, 2)2
# -11.0760 -2.3668 1.9708
# f2B:poly(x, 2)1 f2C:poly(x, 2)1 f2B:poly(x, 2)2
# -3.7127 -5.8253 5.6227
# f2C:poly(x, 2)2 f1b:f2B:poly(x, 2)1 f1c:f2B:poly(x, 2)1
# -7.3582 20.9179 11.6270
#f1b:f2C:poly(x, 2)1 f1c:f2C:poly(x, 2)1 f1b:f2B:poly(x, 2)2
# 1.2897 11.2041 12.8096
#f1c:f2B:poly(x, 2)2 f1b:f2C:poly(x, 2)2 f1c:f2C:poly(x, 2)2
# -9.8476 10.6664 4.5582
注意,即使对于每个3因子水平和3阶多项式,我们已经得到了大量的系数。
答案 1 :(得分:0)
使用I()强制公式将+ - ×/视为算术而不是模型运算符。示例:lm(y~I(x1 + x2))