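I have a large data set. It consists of 371 genotypes (column names start with gwas) and 105,000 markers. In R, I need to build a genotype-by-genotype matrix by applying a specific mathematical equation across all 105,000 markers. The data format is as follows: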
markers gwas_100 gwas_101 gwas_102 gwas_103
S1_147748 NA NA NA NA
S1_239131 0.67385 0.67385 0.67385 0.67385
S1_644966 0.61051 0.61051 0.61051 0.61051
S1_1625764 NA 0.71429 NA 0.71429
S1_1761929 0.69137 0.69137 0.69137 0.69137
S1_1778021 0.72372 0.72372 0.72372 0.72372
S1_1778059 0.72507 0.72507 0.72507 0.72507
S1_1778136 0.68733 0.68733 0.68733 0.68733
S1_1778289 0.69946 0.69946 0.69946 0.69946
S1_1780669 0.73046 0.73046 0.73046 0.73046
S1_1786636 0.71563 0.71563 0.71563 0.71563
S1_1786639 0.71833 0.71833 0.71833 0.71833
S1_1786640 0.71294 0.71294 0.71294 0.71294
S1_1786678 0.71429 0.71429 0.71429 0.71429
S1_1963487 0.72776 0.72776 0.72776 0.72776
S1_2036329 0.74259 0.74259 0.74259 0.74259
S1_2036386 0.74394 0.74394 0.74394 0.74394
S1_2037735 0.7628 0.7628 0.7628 0.7628
S1_2037760 0.7628 0.7628 0.7628 0.7628
S1_2037773 0.7628 0.7628 0.7628 0.7628
S1_2042132 0.58491 NA NA NA
The mathematical equation:
(gwas_100 & gwas_101) = sum(gwas_100) - sum(gwas_101), where
sum(gwas_100) = 0.67385 + 0.61051 + 0.69137 + ... + 0.58491
sum(gwas_101) = 0.67385 + 0.61051 + ... + 0.7628, therefore
(gwas_100 & gwas_101) = 13.49056 - 13.61994 = -0.12938
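The arithmetic above can be checked directly in R. This is a minimal sketch, assuming the two columns are typed in by hand from the sample rows and that NA markers are simply skipped (na.rm = TRUE), which is what the sums in the question imply. The names sum_100, sum_101 and pair_diff are illustrative only.

```r
# Values copied from the 21 sample rows above, NA entries included.
gwas_100 <- c(NA, 0.67385, 0.61051, NA, 0.69137, 0.72372, 0.72507,
              0.68733, 0.69946, 0.73046, 0.71563, 0.71833, 0.71294,
              0.71429, 0.72776, 0.74259, 0.74394, 0.7628, 0.7628,
              0.7628, 0.58491)
gwas_101 <- c(NA, 0.67385, 0.61051, 0.71429, 0.69137, 0.72372, 0.72507,
              0.68733, 0.69946, 0.73046, 0.71563, 0.71833, 0.71294,
              0.71429, 0.72776, 0.74259, 0.74394, 0.7628, 0.7628,
              0.7628, NA)

# na.rm = TRUE skips missing markers; without it each sum would be NA.
sum_100 <- sum(gwas_100, na.rm = TRUE)
sum_101 <- sum(gwas_101, na.rm = TRUE)

# Difference of the two column sums for this one pair of genotypes.
pair_diff <- sum_100 - sum_101
print(round(c(sum_100, sum_101, pair_diff), 5))
```

On the sample rows this gives 13.49056 and 13.61994 for the two sums, so the difference is -0.12938.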
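I then need this value for every pair of genotypes, arranged as a matrix that covers all possible combinations of the 371 genotypes. For example: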
          gwas_100  gwas_101  gwas_102  gwas_103
gwas_100            -0.12     0.14      0.05
gwas_101                      0.06      0.1
gwas_102                                0.07
gwas_103
Thanks in advance.
Answer 0 (score: 1)
You can first use colSums to sum each column while ignoring NA values, then use outer to subtract the sums pairwise:
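# Sum each genotype column (dropping the markers column), ignoring NA values
sums <- colSums(data[-1], na.rm=TRUE)
# Pairwise differences: entry [i, j] is sums[i] - sums[j]
outer(sums,sums,`-`)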
gwas_100 gwas_101 gwas_102 gwas_103
gwas_100 0.00000 -0.12938 0.58491 -0.12938
gwas_101 0.12938 0.00000 0.71429 0.00000
gwas_102 -0.58491 -0.71429 0.00000 -0.71429
gwas_103 0.12938 0.00000 0.71429 0.00000
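The result is antisymmetric (entry [j, i] is the negative of entry [i, j]) with zeros on the diagonal, so the upper triangle already holds every unordered pair once, as in the layout the question asks for. The following is a minimal self-contained sketch, assuming a small data frame built by hand from four of the sample rows. The names diffs and upper are illustrative, and blanking the lower triangle with NA is one possible presentation choice, not something the original answer prescribes.

```r
# Four of the sample rows: the ones containing NA plus one complete row.
data <- data.frame(
  markers  = c("S1_147748", "S1_239131", "S1_1625764", "S1_2042132"),
  gwas_100 = c(NA, 0.67385, NA,      0.58491),
  gwas_101 = c(NA, 0.67385, 0.71429, NA),
  gwas_102 = c(NA, 0.67385, NA,      NA),
  gwas_103 = c(NA, 0.67385, 0.71429, NA)
)

# Column sums over the genotype columns only, skipping NA markers.
sums <- colSums(data[-1], na.rm = TRUE)

# Full pairwise matrix: diffs[i, j] = sums[i] - sums[j].
diffs <- outer(sums, sums, `-`)

# Keep only the upper triangle; diagonal and lower triangle become NA.
upper <- diffs
upper[!upper.tri(upper)] <- NA
print(round(upper, 5))
```

Rows that are identical across all genotypes cancel in every difference, so this subset reproduces the same pairwise values as the full sample. With 371 genotypes the matrix is only 371 x 371, so the cost is dominated by the single pass colSums makes over the 105,000 markers.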