Question

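I have two datasets: "datExprSTLMS" with dimensions 53 * 17237 and "datExprSTF" with dimensions 99 * 22144. Some of the columns (gene_names) are shared by the two datasets. Using match() on the column names of the two datasets, I found 15711 (TRUE) gene_names that the two have in common. Now I want to subset "datExprSTLMS" so that its dimensions become 53 * 15711. To do this, I wrote the following code: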

 dim(datExprSTF)
 #[1]    99 22144

 dim(datExprSTLMS)
 #[1]    53 17237

 TCGA2STF <- match(colnames(datExprSTLMS), colnames(datExprSTF))
 table(is.finite(TCGA2STF))
 #FALSE  TRUE 
 #1526  15711 

 # delete the NAs (mismatched gene_names, 1526 in my case)
 TCGA2STF_final <- Filter(function(x)!all(is.na(x)), TCGA2STF)

 datExprSTLMS_final <- as.data.frame(datExprSTLMS[,TCGA2STF_final])

But after running the last line of my code, I get the following error:

 Error in datExprSTLMS[, TCGA2STF_final] : subscript out of bounds
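
The error is consistent with how match() works: match(x, table) returns positions in its second argument, so TCGA2STF holds column numbers of datExprSTF (which has up to 22144 columns), while datExprSTLMS has only 17237 columns. The minimal sketch below reproduces this. It uses small hypothetical matrices stlms and stf in place of the real data, and the gene names are invented.

```r
# Hypothetical stand-ins: stlms plays datExprSTLMS (3 genes),
# stf plays datExprSTF (5 genes); gene names are made up
stlms <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("g1", "g2", "g9")))
stf   <- matrix(1:10, nrow = 2, dimnames = list(NULL, c("g0", "g3", "g4", "g1", "g2")))

# Positions of stlms's column names *within stf* (NA when not found)
idx <- match(colnames(stlms), colnames(stf))
print(idx)  # gives 4 5 NA

# Drop the NAs, as the Filter() call in the question does
idx_final <- idx[!is.na(idx)]

# These are column numbers of stf, so they can exceed ncol(stlms) == 3
print(max(idx_final) > ncol(stlms))

# Indexing stlms with them fails the same way as in the question
res <- tryCatch(stlms[, idx_final], error = function(e) conditionMessage(e))
print(res)
```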

I am writing the code in R. I need some guidance.

Answer 1

We can use intersect to find the columns common to both datasets and then use them to subset datExprSTLMS:

datExprSTLMS[, intersect(colnames(datExprSTLMS), colnames(datExprSTF))]
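
Below is a minimal sketch of this approach on small hypothetical matrices (stlms and stf stand in for datExprSTLMS and datExprSTF; the gene names are invented). It also shows a match()-based alternative closer to the question's code: use !is.na() on the match() result as a logical index into the first matrix, instead of using the positions it returns, which belong to the second matrix. The drop = FALSE argument is an added precaution so that a single shared column still yields a matrix.

```r
# Hypothetical stand-ins for datExprSTLMS (3 genes) and datExprSTF (5 genes)
stlms <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("g1", "g2", "g9")))
stf   <- matrix(1:10, nrow = 2, dimnames = list(NULL, c("g0", "g3", "g4", "g1", "g2")))

# The answer's approach: subset by the shared column names
common <- intersect(colnames(stlms), colnames(stf))
by_intersect <- stlms[, common, drop = FALSE]

# match()-based alternative: the NA pattern says which columns of stlms
# were found; the positions themselves refer to columns of stf
idx <- match(colnames(stlms), colnames(stf))
by_match <- stlms[, !is.na(idx), drop = FALSE]

print(dim(by_intersect))
print(identical(by_intersect, by_match))

# The non-NA positions are valid for stf, e.g. to take the same genes from it
stf_common <- stf[, idx[!is.na(idx)], drop = FALSE]
print(colnames(stf_common))
```

Applied to the question's objects, the second form would be datExprSTLMS[, !is.na(TCGA2STF)], which keeps only the columns whose names were found and should therefore give 53 * 15711.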

How do I find the common columns between 2 datasets in R?
