I have two data.frames: editCounts and nonEditCounts. They have the same dimensions and the same row and column names, but the actual data differ. Here is the head of each:
> head(editCounts)
Samp0 Samp1 Samp2
chr10_101992307 0 4 3
chr10_101992684 4 0 1
chr10_127480585 0 3 0
chr10_16479385 3 3 3
chr10_73979859 0 3 2
chr10_73979940 0 3 8
> head(nonEditCounts)
Samp0 Samp1 Samp2
chr10_101992307 0 4 3
chr10_101992684 15 0 4
chr10_127480585 0 6 0
chr10_16479385 7 7 4
chr10_73979859 0 13 7
chr10_73979940 0 21 10
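The end goal is to run pairwise Fisher's exact tests (using fisher.test()) between every pair of columns, row by row, across the two data.frames. As output, I would like a table containing the resulting p-value of each pairwise comparison for each row name, for example: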
Samp0_vs_Samp1 Samp0_vs_Samp2 Samp1_vs_Samp2
chr10_101992307 pval pval pval
chr10_101992684 pval pval pval
chr10_127480585 pval pval pval
chr10_16479385 pval pval pval
chr10_73979859 pval pval pval
... ... ... ...
So, taking Samp0 and Samp1 as an example, the first Fisher test would use a matrix like this:
> tempMat=matrix(c(editCounts$ERR188028_GBR[1], nonEditCounts$ERR188028_GBR[1],
+ editCounts$ERR188035_GBR[1], nonEditCounts$ERR188035_GBR[1]), 2, 2)
> tempMat
[,1] [,2]
[1,] 0 4
[2,] 0 4
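These values correspond to the first row (chr10_101992307). In this case, the Fisher test gives a p-value of 1.
I know I can use combn() to enumerate every pairwise combination of columns, but I'm not sure how to loop over each column pair, build a contingency table from the 4 values, and run the Fisher test. The code I have written so far is shown below; however, it throws an error when it tries to create tempMat.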
editCounts <- read.table("editCountMatrix.txt", sep="\t", header=TRUE, row.names=1)
nonEditCounts <- read.table("nonEditCountMatrix.txt", sep="\t", header=TRUE, row.names=1)
pairwiseComb <- combn(names(editCounts),2)
for (j in seq(1,length(pairwiseComb),2)){
  tempCol1 = pairwiseComb[[j]]
  tempCol2 = pairwiseComb[[j+1]]
  cat("Processing: ",tempCol1," vs. ",tempCol2, "\n", sep="") # Prints correctly
  for (i in 1:nrow(editCounts)){
    tempMat=matrix(c(editCounts$tempCol1[i], nonEditCounts$tempCol1[i],
                     editCounts$tempCol2[i], nonEditCounts$tempCol2[i]), 2, 2)
    tempFisher=fisher.test(tempMat, alternative="two.sided")
    pval=tempFisher$p.value
    pvalAdj=p.adjust(pval,method="fdr")
  }
}
The resulting error is shown below:
Error in matrix(c(editCounts$tempCol1[i], nonEditCounts$tempCol1[i], editCounts$tempCol2[i], :
'data' must be of a vector type, was 'NULL'
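Any help is greatly appreciated.
Thanks!
Answer 0 (score: 0)
Here is a suggested solution. I have corrected a few small indexing problems in your code, and I suggest using a pre-allocated matrix to store the Fisher exact test results.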
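As background on the error itself, here is a minimal sketch using a made-up toy data frame `df` (not part of the original data). The `$` operator does not evaluate a variable on its right-hand side, so `editCounts$tempCol1` looks for a column literally named `tempCol1` and returns NULL. Indexing with `[[ ]]` or `[ , ]` uses the string stored in the variable instead.

```r
# Toy data frame (hypothetical); the column name Samp0 mirrors the sample data.
df <- data.frame(Samp0 = c(0, 4, 0))
tempCol1 <- "Samp0"

# `$` treats its right-hand side as a literal name, so this looks for a
# column actually called "tempCol1" and finds none.
is.null(df$tempCol1)      # TRUE

# `[[` evaluates the variable, so the Samp0 column is found.
df[[tempCol1]][2]         # 4

# c() of NULLs is NULL, and matrix() refuses NULL input, which is the
# situation the question's loop ends up in. Capture the message here.
msg <- tryCatch(matrix(c(df$tempCol1[1], df$tempCol1[1]), 1, 2),
                error = function(e) conditionMessage(e))
```

The solution below sidesteps this by indexing with row and column names inside `[ , ]`.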
# Create data.frames using your sample data.
editCounts <- read.table(header=TRUE,
text=" Samp0 Samp1 Samp2
chr10_101992307 0 4 3
chr10_101992684 4 0 1
chr10_127480585 0 3 0
chr10_16479385 3 3 3
chr10_73979859 0 3 2
chr10_73979940 0 3 8")
nonEditCounts <- read.table(header=TRUE,
text=" Samp0 Samp1 Samp2
chr10_101992307 0 4 3
chr10_101992684 15 0 4
chr10_127480585 0 6 0
chr10_16479385 7 7 4
chr10_73979859 0 13 7
chr10_73979940 0 21 10")
pairwiseComb <- combn(names(editCounts), 2)
# Create a matrix to hold results.
results <- matrix(NA, ncol=ncol(pairwiseComb), nrow=nrow(editCounts))
# Create row and column names to use for indexing/assignment of results.
rownames(results) <- rownames(editCounts)
colnames(results) <- apply(pairwiseComb, 2,
function(x) {paste(x[1], "_vs_", x[2], sep="")})
# Loop over number of column pairs.
for (j in seq_len(ncol(pairwiseComb))) {
  tempCol1 <- pairwiseComb[1, j]
  tempCol2 <- pairwiseComb[2, j]
  resultsCol <- paste(tempCol1, "_vs_", tempCol2, sep="")
  cols <- c(tempCol1, tempCol2)
  # Loop over rownames.
  for (row in rownames(results)) {
    # Grab values by row and column name; rbind() stacks them into a
    # two-row table (edited counts on top, non-edited counts below).
    tempMat <- rbind(editCounts[row, cols],
                     nonEditCounts[row, cols])
    tempFisher <- fisher.test(tempMat, alternative="two.sided")
    # Use row and column name indexing to assign the p-value to results.
    results[row, resultsCol] <- tempFisher$p.value
  }
}
# Compute adjusted p-values using all of the computed p-values, outside of loop.
padj <- results # First make copy of results matrix.
padj[] <- p.adjust(results, method="fdr") # Trick to retain shape and attributes.
results
# Samp0_vs_Samp1 Samp0_vs_Samp2 Samp1_vs_Samp2
# chr10_101992307 1 1.0000000 1.00000000
# chr10_101992684 1 1.0000000 1.00000000
# chr10_127480585 1 1.0000000 1.00000000
# chr10_16479385 1 0.6436652 0.64366516
# chr10_73979859 1 1.0000000 1.00000000
# chr10_73979940 1 1.0000000 0.03290832
padj
# Samp0_vs_Samp1 Samp0_vs_Samp2 Samp1_vs_Samp2
# chr10_101992307 1 1 1.0000000
# chr10_101992684 1 1 1.0000000
# chr10_127480585 1 1 1.0000000
# chr10_16479385 1 1 1.0000000
# chr10_73979859 1 1 1.0000000
# chr10_73979940 1 1 0.5923497
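If you would rather avoid the explicit nested loops, the same computation can probably be written with `combn()`, `sapply()` and `mapply()`. This is only a sketch under the assumption that both data.frames share identical row and column order; the names `sites`, `pairs`, `pvals` and `padjAlt` are mine and do not appear in the code above. The data are retyped from the sample shown in the question.

```r
# Sample data retyped from the question (row names stored in `sites`).
sites <- c("chr10_101992307", "chr10_101992684", "chr10_127480585",
           "chr10_16479385", "chr10_73979859", "chr10_73979940")
editCounts <- data.frame(Samp0 = c(0, 4, 0, 3, 0, 0),
                         Samp1 = c(4, 0, 3, 3, 3, 3),
                         Samp2 = c(3, 1, 0, 3, 2, 8),
                         row.names = sites)
nonEditCounts <- data.frame(Samp0 = c(0, 15, 0, 7, 0, 0),
                            Samp1 = c(4, 0, 6, 7, 13, 21),
                            Samp2 = c(3, 4, 0, 4, 7, 10),
                            row.names = sites)

# One character vector of length 2 per column pair.
pairs <- combn(names(editCounts), 2, simplify = FALSE)

# For each pair, walk the four count vectors in parallel and test each row.
pvals <- sapply(pairs, function(p) {
  mapply(function(e1, e2, n1, n2) {
    # Columns are the two samples; rows are edited / non-edited counts.
    fisher.test(matrix(c(e1, n1, e2, n2), 2, 2))$p.value
  },
  editCounts[[p[1]]], editCounts[[p[2]]],
  nonEditCounts[[p[1]]], nonEditCounts[[p[2]]])
})
dimnames(pvals) <- list(rownames(editCounts),
                        sapply(pairs, paste, collapse = "_vs_"))

# FDR adjustment over all tests at once, keeping the matrix shape.
padjAlt <- pvals
padjAlt[] <- p.adjust(pvals, method = "fdr")
```

Here `pvals` and `padjAlt` should match `results` and `padj` above. Whether to adjust across all tests at once, as done here, or separately per comparison is a design choice that depends on your analysis.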