I have many X and Y variables (something like 500 x 500). The following is just a small example data set:
yvars <- data.frame (Yv1 = rnorm(100, 5, 3), Y2 = rnorm (100, 6, 4),
Yv3 = rnorm (100, 14, 3))
xvars <- data.frame (Xv1 = sample (c(1,0, -1), 100, replace = T),
X2 = sample (c(1,0, -1), 100, replace = T),
Xv3 = sample (c(1,0, -1), 100, replace = T),
D = sample (c(1,0, -1), 100, replace = T))
I want to extract the p-values and build a matrix like this:
Yv1 Y2 Yv3
Xv1
X2
Xv3
D
Here is my attempt at looping over the variables:
prob = NULL
anova.pmat <- function (x) {
mydata <- data.frame(yvar = yvars[, x], xvars)
for (i in seq(length(xvars))) {
prob[[i]] <- anova(lm(yvar ~ mydata[, i + 1],
data = mydata))$`Pr(>F)`[1]
}
}
sapply (yvars,anova.pmat)
Error in .subset(x, j) : only 0's may be mixed with negative subscripts
What could be the solution?
Edit:
For the first Y variable:
prob <- NULL
mydata <- data.frame(yvar = yvars[, 1], xvars)
for (i in seq(length(xvars))) {
prob[[i]] <- anova(lm(yvar ~ mydata[, i + 1],
data = mydata))$`Pr(>F)`[1]
}
prob
[1] 0.4995179 0.4067040 0.4181571 0.6291167
Edit again:
for (j in seq(length (yvars))){
prob <- NULL
mydata <- data.frame(yvar = yvars[, j], xvars)
for (i in seq(length(xvars))) {
prob[[i]] <- anova(lm(yvar ~ mydata[, i + 1],
data = mydata))$`Pr(>F)`[1]
}
}
This gives the same result as above!
Answer 0 (score: 4)
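Here is one approach using plyr. It loops over the columns of the data frames (treating each data frame as a list) for every combination of xvars and yvars, returns the corresponding p-value, and arranges the results into a matrix. Adding the row and column names is just an extra.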
library("plyr")
probs <- laply(xvars, function(x) {
laply(yvars, function(y) {
anova(lm(y~x))$`Pr(>F)`[1]
})
})
rownames(probs) <- names(xvars)
colnames(probs) <- names(yvars)
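For comparison, the same nested loop can be sketched in base R without plyr. This is only an illustration under a few assumptions: the fixed seed, the helper name `pval_one`, and the result name `probs_base` are not part of the original post. It also hints at why the call in the question failed: `sapply(yvars, f)` hands each column's values to `f`, not a column index, so `yvars[, x]` ends up being indexed by random numbers.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
yvars <- data.frame(Yv1 = rnorm(100, 5, 3), Y2 = rnorm(100, 6, 4),
                    Yv3 = rnorm(100, 14, 3))
xvars <- data.frame(Xv1 = sample(c(1, 0, -1), 100, replace = TRUE),
                    X2  = sample(c(1, 0, -1), 100, replace = TRUE),
                    Xv3 = sample(c(1, 0, -1), 100, replace = TRUE),
                    D   = sample(c(1, 0, -1), 100, replace = TRUE))

# Hypothetical helper: p-value of the F test for one y ~ x regression
pval_one <- function(y, x) anova(lm(y ~ x))$`Pr(>F)`[1]

# The outer sapply runs over the Y columns (matrix columns),
# the inner sapply over the X columns (matrix rows).
probs_base <- sapply(yvars, function(y) {
  sapply(xvars, function(x) pval_one(y, x))
})
probs_base
```

Because both `sapply` calls iterate over named data frame columns, the row and column names of the resulting matrix come for free.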
Answer 1 (score: 1)
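Here is a solution that consists of generating all combinations of Y and X variables to test (we cannot use combn here) and fitting a linear model in each case: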
dfrm <- data.frame(y=gl(ncol(yvars), ncol(xvars), labels=names(yvars)),
x=gl(ncol(xvars), 1, labels=names(xvars)), pval=NA)
## little helper function to create formula on the fly
fm <- function(x) as.formula(paste(unlist(x), collapse="~"))
## merge both datasets
full.df <- cbind.data.frame(yvars, xvars)
## apply our LM row-wise
dfrm$pval <- apply(dfrm[,1:2], 1,
function(x) anova(lm(fm(x), full.df))$`Pr(>F)`[1])
## arrange everything in a rectangular matrix of p-values
res <- matrix(dfrm$pval, ncol=nlevels(dfrm$y), dimnames=list(levels(dfrm$x), levels(dfrm$y)))
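Side note: With high-dimensional data sets, relying on the QR decomposition to compute the p-value of a linear regression is time-consuming. It is easier to compute the matrix of Pearson linear correlations for all pairwise comparisons and to transform the r statistic into a Fisher-Snedecor F using the relation $F = \nu_a r^2 / (1 - r^2)$, where the degrees of freedom are defined as $\nu_a = (n-2) - \#\{(x_i = \mathrm{NA}), (y_i = \mathrm{NA})\}$, that is, (n-2) minus the number of pairwise missing values. If there are no missing values, this formula is the usual one based on the $R^2$ coefficient in regression.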
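The correlation shortcut described in the side note could be sketched as follows. This is a minimal illustration, not the answerer's own code: it assumes no missing values (so the degrees of freedom are simply n - 2) and that each X is used as a single numeric predictor, as in the models above. The seed and the names `r`, `nu`, `Fstat` and `pvals` are choices made for this sketch.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
yvars <- data.frame(Yv1 = rnorm(100, 5, 3), Y2 = rnorm(100, 6, 4),
                    Yv3 = rnorm(100, 14, 3))
xvars <- data.frame(Xv1 = sample(c(1, 0, -1), 100, replace = TRUE),
                    X2  = sample(c(1, 0, -1), 100, replace = TRUE),
                    Xv3 = sample(c(1, 0, -1), 100, replace = TRUE),
                    D   = sample(c(1, 0, -1), 100, replace = TRUE))

r  <- cor(xvars, yvars)   # matrix of Pearson correlations, X in rows, Y in columns
nu <- nrow(xvars) - 2     # degrees of freedom, assuming no missing values

Fstat <- nu * r^2 / (1 - r^2)   # F = nu * r^2 / (1 - r^2)

# Upper-tail probability of an F(1, nu) distribution; copying r first
# keeps the dimensions and dimnames of the correlation matrix.
pvals <- r
pvals[] <- pf(Fstat, df1 = 1, df2 = nu, lower.tail = FALSE)
pvals
```

For a single numeric predictor these values should agree with `anova(lm(y ~ x))$"Pr(>F)"[1]` up to floating-point error, while needing only one call to `cor` instead of one model fit per pair.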