Question

I have a very large data frame, laid out like this in this example:

line    gene1    gene2    gene3    gene4    gene5  survival
1       4.05     7.65     0.25     0.789    10.5   0.90
2       2.51     4.36     12.5     7.56     8.99   0.50
3       3.65     2.55     48.8     5.65     5.89   0.25   
4       5.65     1.54     8.99     9.2      0.01   0.10

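The only difference is that I am dealing with over 18,000 genes in the actual data. line refers to genetic lines of fruit flies, and the numbers in each gene column are relative gene expression values. survival is the proportion surviving within each line. What I would like to do is correlate columns 2 through 5 (gene expression) with column 6 (survival). I have tried this using cor and it works fine: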

cor1<-cor(master2[c(2:5)], master2$surv, method="pearson")

However, I would like to use cor.test or corr.test (from the psych package) so that I get p-values and can apply some corrections to them.

I tried:

cor1<-cor.test(master2[c(2:5)], master2$surv, method="pearson")

and got:

Error in cor.test.default(master2[c(2:5)], master2$surv, method = "pearson") : 
'x' and 'y' must have the same length

I also tried:

cor1<-corr.test(master2[c(2:18141)], master2$surv, method="pearson")

and got:

Error in 1:ncol(y) : argument of length 0

Any help would be greatly appreciated!!!

Thanks in advance,

Phil

Answer 1

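First, in my experience doing similar things with large gene expression datasets, psych::corr.test() is far superior, especially for matrix-by-matrix or df-by-df correlations.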

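That strength of psych::corr.test() is also the reason you got that error. Both inputs must be matrices or data frames, and when you extract a column with master2$surv it is no longer a data frame! Try extracting the last column with master2[, ncol(master2), drop = FALSE] instead (without drop = FALSE, a single column selected from a plain data frame is also returned as a vector).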
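To see why the input type matters, here is a minimal sketch using a made-up data frame shaped like the one in the question. The values are random and master2 here is only a stand-in. It shows which ways of selecting the survival column keep a data frame, and the corr.test call is guarded because it assumes the psych package is installed.

```r
# Toy stand-in for the real data: 10 lines, 4 genes, 1 survival column.
set.seed(1)
master2 <- data.frame(line = 1:10,
                      gene1 = rnorm(10), gene2 = rnorm(10),
                      gene3 = rnorm(10), gene4 = rnorm(10),
                      survival = runif(10))

# `$` and `[, j]` both return a plain numeric vector for a base data.frame ...
is.data.frame(master2$survival)          # FALSE
is.data.frame(master2[, ncol(master2)])  # FALSE
ncol(master2$survival)                   # NULL, which is what breaks 1:ncol(y)

# ... while drop = FALSE keeps a one-column data frame.
surv_df <- master2[, ncol(master2), drop = FALSE]
is.data.frame(surv_df)                   # TRUE

# Assumes psych is installed. adjust = "none" asks for raw p-values so that
# the correction can be done explicitly with p.adjust() afterwards.
if (requireNamespace("psych", quietly = TRUE)) {
  res   <- psych::corr.test(master2[2:5], surv_df, method = "pearson",
                            adjust = "none", ci = FALSE)
  p_adj <- p.adjust(res$p, method = "BH")
}
```

Doing the adjustment with p.adjust() rather than through the adjust argument is a choice made here to keep the raw and corrected p-values clearly separate.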

Edit:

You could also use cor.test, but you would need to sapply over the columns of interest (2:5) for the first argument, which is much like what psych::corr.test does for you. Performance-wise it probably is not a big deal if it is 4 columns, but if it is thousands I would suggest psych::corr.test.
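For thousands of columns, another option (not part of the answer above, only a sketch) is to get all coefficients from a single cor() call and derive the p-values from the t statistic t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, which is the test cor.test performs for method = "pearson". The data and the names expr and surv are invented, and complete data with no missing values is assumed.

```r
set.seed(42)
n    <- 30
# Hypothetical expression matrix: 30 lines by 200 genes.
expr <- matrix(rnorm(n * 200), nrow = n,
               dimnames = list(NULL, paste0("gene", 1:200)))
# Hypothetical survival proportions, one per line.
surv <- runif(n)

r      <- cor(expr, surv, method = "pearson")[, 1]  # one coefficient per gene
t_stat <- r * sqrt((n - 2) / (1 - r^2))             # t statistic for each r
p      <- 2 * pt(-abs(t_stat), df = n - 2)          # two-sided p-values

p_bh <- p.adjust(p, method = "BH")                  # multiple-testing correction

# Spot check against cor.test() for the first gene.
all.equal(unname(p[1]), cor.test(expr[, 1], surv)$p.value)
```

This avoids building one test object per gene, at the cost of not getting confidence intervals.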

Answer 2

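Here is how to implement the apply approach mentioned above. First take the subset of variables (apply will coerce it to a matrix):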

mat <- df[, 2:5]
survival <- df[, 6]

Now apply cor.test() over the columns of mat:

cor <- apply(mat, 2, function(x) cor.test(survival, x))

Extract the correlation coefficients with:

unlist(sapply(cor, "[", 4))

This should run comfortably for 18,000 variables on a decent machine.
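Since the question also asks for p-values and a correction, the same list of test results can be mined for them. A minimal sketch with invented data follows. The df here is random, and Benjamini-Hochberg is just one of the methods p.adjust offers.

```r
set.seed(7)
# Random stand-in with the column layout assumed in this answer:
# column 1 = line, columns 2:5 = genes, column 6 = survival.
df <- data.frame(line = 1:12,
                 matrix(rnorm(12 * 4), ncol = 4),
                 survival = runif(12))
names(df)[2:5] <- paste0("gene", 1:4)

mat      <- df[, 2:5]
survival <- df[, 6]

# One htest object per gene column, returned as a named list.
tests <- apply(mat, 2, function(x) cor.test(survival, x))

r     <- sapply(tests, function(h) unname(h$estimate))  # correlation coefficients
p     <- sapply(tests, function(h) h$p.value)           # raw p-values
p_adj <- p.adjust(p, method = "BH")                     # adjusted p-values

data.frame(r = r, p = p, p_adj = p_adj)
```

Accessing the elements by name (estimate, p.value) rather than by position makes it clearer which part of each test result is being pulled out.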

R cor.test, corr.test, or corr?
