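I am struggling to adapt some (for me) very complex code to work with my data.
I think the crux of my problem is that some of my variables lose their dimensions once I start working with a two-dimensional matrix, and I need to know how to make those variables keep their dimensions.
I start with two variables: e (a data.frame), part of which looks like this: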
e <-
structure(list(X2hr = c(0.106, 0, 0, 0, 0.01, 0.042), X6hr = c(1,
0.083, 0.006, 0, 1, 0.967), X12hr = c(0.049, 0.057, 0.098, 0.405,
0.046, 0.029), X24hr = c(0.264, 0.301, 0.025, 0.15, 0.58, 0.487
), X36hr = c(0.284, 1, 0.114, 1, 0.671, 1), X48hr = c(0.274,
0.235, 0.299, 0.253, 0.617, 0.636), X72hr = c(0.098, 0.021, 1,
0.325, 0.283, 0.35)), .Names = c("X2hr", "X6hr", "X12hr", "X24hr",
"X36hr", "X48hr", "X72hr"), row.names = c("cgd1_10", "cgd1_100",
"cgd1_1000", "cgd1_1010", "cgd1_1020", "cgd1_1030"), class = "data.frame")
and m (a two-dimensional matrix with one column and 2913 rows), part of which looks like this:
m <-
structure(c(0, 0, 1.174805088, 1.174805088, 0, 0), .Dim = c(6L,
1L), .Dimnames = list(c("cgd1_10", "cgd1_100", "cgd1_1000", "cgd1_1010",
"cgd1_1020", "cgd1_1030"), "X4_1110_2.motif2"))
I load the glmnet package and define two functions, IDC.glmnet and PBM.glmnet.getCoefs:
library(glmnet)
IDC.glmnet <- function(e, m, mode="coef", randomize=F, alpha=0.5) {
  # Keep only the observations where the response is not NA
  nona <- !is.na(e)
  enona <- e[nona]
  mnona <- m[nona,]
  # Subsetting a one-column matrix drops its dimensions, so restore them
  if(ncol(m)==1)
    dim(mnona) <- c(sum(nona),ncol=1)
  e.cv <- cv.glmnet( mnona, enona, nfolds=10)
  l <- e.cv$lambda.min
  #print(l)
  if (randomize == TRUE) {
    enona <- sample(enona)
  }
  e.fits <- glmnet( mnona, enona, family="gaussian", alpha=alpha, nlambda=100)
  if (mode == "predict") {
    cor.test(predict(e.fits, mnona, type="response", s=l), enona)$estimate
  } else {
    # Coefficients at lambda.min, without the intercept
    as.matrix(predict(e.fits, s=l, type="coefficients")[-1,])
  }
}
PBM.glmnet.getCoefs <- function(e, m, alpha=0.05, randomize=F, center=FALSE) {
  # One call to IDC.glmnet per column of e
  e.coef <<- apply(e, 2, IDC.glmnet, m, mode="coefficients",
                   alpha=alpha, randomize=randomize)
  if (dim(e)[2] > 1) {
    e.coef.s <- t(apply(e.coef, 1, scale, center=center))
  } else {
    e.coef.s <- e.coef
  }
  rownames(e.coef.s) <- colnames(m)
  colnames(e.coef.s) <- colnames(e)
  e.coef.s
}
Then I try to run PBM.glmnet.getCoefs on my variables:
coefs <- PBM.glmnet.getCoefs(e, m)
I get the following error message:
Error in t(apply(e.coef, 1, scale, center = center)) :
error in evaluating the argument 'x' in selecting a method for function 't':
Error in apply(e.coef, 1, scale, center = center) :
dim(X) must have a positive length
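The problem occurs when I use a single-column matrix for m. With multiple columns it works fine. However, I cannot use multiple columns because that would bias the results, so I really need to be able to use a single-column m.
From my limited troubleshooting ability, I think this line in the PBM.glmnet.getCoefs function is where the failure starts: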
e.coef <<- apply(e, 2, IDC.glmnet, m, mode="coefficients",
alpha=alpha, randomize=randomize)
When I use a single-column m, e.coef is a vector. Because e.coef then has no dimensions, I get the error shown above from t(apply(...)).
e.coef looks like this:
> e.coef
        X2hr         X6hr        X12hr        X24hr        X36hr        X48hr
 0.025701875  0.004066947  0.043836383  0.020151361  0.003512643 -0.035211133
       X72hr
-0.034503722
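How can I make sure e.coef keeps the correct dimensions (1 row and 7 columns, with the column headers taken from the header row of e and the row values determined in the IDC.glmnet function)?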
Answer 0 (score: 1)
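You correctly identified the line that causes the problem. The behaviour is described in the Value section of ?apply: when each call to FUN returns a single value (n equals 1), "apply returns a vector if MARGIN has length 1". With a single-column m, IDC.glmnet returns one coefficient per column of e, so the result is a plain vector with no dim attribute.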
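The behaviour can be reproduced with base R alone. The following is a minimal sketch on made-up data (the data frame d and the anonymous functions are illustrative and not taken from the question): when the function returns one value per column, apply drops the dimensions, and they can be put back by hand.

```r
# Illustrative data: 3 rows, 2 columns (not the question's data)
d <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# FUN returns one value per column (n = 1), so apply() returns a named vector
one <- apply(d, 2, function(col) mean(col))
is.null(dim(one))   # TRUE: the result has no dim attribute

# FUN returns two values per column (n = 2), so apply() returns a matrix
two <- apply(d, 2, function(col) range(col))
dim(two)            # 2 rows, 2 columns

# Restore the dimensions of the length-1 case by hand
fixed <- one
dim(fixed) <- c(1, ncol(d))
colnames(fixed) <- colnames(d)
dim(fixed)          # 1 row, 2 columns
```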
So make this small change to ensure the dimensions are correct:
PBM.glmnet.getCoefs <-
function(e, m, alpha=0.05, randomize=F, center=FALSE ) {
  e.coef <<- apply(e, 2, IDC.glmnet, m, mode="coefficients",
                   alpha=alpha, randomize=randomize)
  # Force ncol(m) rows and ncol(e) columns, even when apply() returned a vector
  dim(e.coef) <<- c(ncol(m), ncol(e))
  if (dim(e)[2] > 1) {
    e.coef.s <- t(apply(e.coef, 1, scale, center=center))
  } else {
    e.coef.s <- e.coef
  }
  rownames(e.coef.s) <- colnames(m)
  colnames(e.coef.s) <- colnames(e)
  e.coef.s
}
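As a variation on the same idea, the result can be reshaped locally with matrix() instead of modifying a global through <<-. This is only a sketch under stated assumptions: fake.coef is a hypothetical stand-in for IDC.glmnet that returns one number per column of m without calling glmnet, and get.coefs, e.demo and m.demo are illustrative names that do not appear in the question.

```r
# Hypothetical stand-in for IDC.glmnet: one value per column of m
fake.coef <- function(y, m) {
  apply(m, 2, function(x) sum(x * y))
}

get.coefs <- function(e, m) {
  raw <- apply(e, 2, fake.coef, m)
  # Force ncol(m) rows and ncol(e) columns, whether apply() returned
  # a vector (single-column m) or a matrix (multi-column m)
  matrix(raw, nrow = ncol(m), ncol = ncol(e),
         dimnames = list(colnames(m), colnames(e)))
}

# Made-up inputs with the same shapes as in the question (single-column m)
e.demo <- data.frame(X2hr = c(0.1, 0, 0.2), X6hr = c(1, 0.5, 0))
m.demo <- matrix(c(0, 1, 2), ncol = 1, dimnames = list(NULL, "motif1"))

res <- get.coefs(e.demo, m.demo)
dim(res)   # 1 row, 2 columns
```

Because matrix() fills column by column, the layout matches what apply already produces in the multi-column case, so the same line covers both situations.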