Calculating the Tanimoto coefficient

Time: 2013-12-10 04:25:04

Tags: mysql r

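I want to calculate the Tanimoto coefficient (the size of the intersection of two sets divided by the size of their union) for pairs of diseases. Sample data for a single disease pair is shown below, where disease 1 is NK cell defects and disease 2 is adenylosuccinate lyase deficiency.

Set 1 is disease 1 (NK cell defects), which has all the genes from the Gene1 column.

Set 2 is disease 2 (adenylosuccinate lyase deficiency), which has all the genes from the Gene2 column.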
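For reference, the quantity being asked for can be written as follows, taking $A$ to be the set of distinct genes in the Gene1 column and $B$ the set of distinct genes in the Gene2 column (the symbols $A$, $B$ and $T$ are notation introduced here, not part of the data):

$$
T(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}
$$

The value lies between 0 (no shared genes) and 1 (identical gene sets). Repeated rows do not change it, because only distinct genes are counted.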

| Gene1  | Gene2   | Disease1        | Disease2                          |
|--------|---------|-----------------|-----------------------------------|
| IMPDH1 | XDH     | NK cell defects | Adenylosuccinate lyase deficiency |
| PPP3R2 | ADA     | NK cell defects | Adenylosuccinate lyase deficiency |
| PPP3R2 | NPR1    | NK cell defects | Adenylosuccinate lyase deficiency |
| PPP3R2 | IMPDH1  | NK cell defects | Adenylosuccinate lyase deficiency |
| PPP3R2 | IMPDH2  | NK cell defects | Adenylosuccinate lyase deficiency |
| PPP3R2 | PPP3R2  | NK cell defects | Adenylosuccinate lyase deficiency |
| PPP3R2 | RRM1    | NK cell defects | Adenylosuccinate lyase deficiency |
| NPR1   | POLA1   | NK cell defects | Adenylosuccinate lyase deficiency |
| PPP3R2 | ITGAL   | NK cell defects | Adenylosuccinate lyase deficiency |
| ITGAL  | NPR1    | NK cell defects | Adenylosuccinate lyase deficiency |
| CASP3  | NPR1    | NK cell defects | Adenylosuccinate lyase deficiency |
| PTK2B  | NPR1    | NK cell defects | Adenylosuccinate lyase deficiency |
| TNF    | GUCY1A2 | NK cell defects | Adenylosuccinate lyase deficiency |
| PTK2B  | GUCY1A2 | NK cell defects | Adenylosuccinate lyase deficiency |

Any suggestions on how to do this in MySQL or R would be appreciated.

Thanks,

罗汉

2 answers:

Answer 0: (score: 0)

Learn to search:

 install.packages("sos")
 library("sos")
 findFn("Tanimoto")

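getGeneSim {GOSim} R Documentation

Compute functional similarity for genes

Description

Calculate the pairwise functional similarities for a list of genes using different strategies.

Usage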

getGeneSim(genelist1, genelist2=NULL, similarity="funSimMax", similarityTerm="relevance", 
           normalization="Tanimoto", method="sqrt", avg=(similarity=="OA"), verbose=FALSE)
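Note that `getGeneSim` computes a GO-based functional similarity and uses "Tanimoto" only as a normalization option, which may not be the plain set overlap described in the question. If set overlap is what is wanted, a minimal base R sketch could look like the following. The helper name `tanimoto` is hypothetical and is not part of GOSim or any other package.

```r
# Hypothetical helper (not from GOSim): set-based Tanimoto coefficient,
# i.e. the number of shared elements divided by the number of distinct elements.
tanimoto <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  n_union <- length(union(a, b))
  # Two empty sets have no defined overlap; return NA instead of dividing by zero
  if (n_union == 0) return(NA_real_)
  length(intersect(a, b)) / n_union
}

# Toy input: 2 shared elements (B, C) out of 4 distinct elements (A, B, C, D)
tanimoto(c("A", "B", "C"), c("B", "C", "D"))  # 0.5
```

Returning `NA` for two empty sets is a design choice made here; returning 0 or 1 may suit other uses better.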

Answer 1: (score: 0)

Random input data:

library(data.table)

DT = data.table(
  G1=1:5,
  G2=3:7,
  D1="A",
  D2="B"
  )

DT[,
   list(
     intersectG = length(intersect(G1,G2)),
     unionG = length(union(G1,G2)),
     # Tanimoto = size of the intersection divided by size of the union
     Tanimoto = length(intersect(G1,G2))/length(union(G1,G2))
     ),
   by = c('D1','D2')]

Output:

   D1 D2 intersectG unionG  Tanimoto
1:  A  B          3      7 0.4285714
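The same grouping idea can be tried on the sample rows from the question. This is only a sketch: it assumes the table has been loaded into a `data.table` with the column names shown in the question, and the variable names `DT`, `res`, `shared` and `total` are made up for illustration.

```r
library(data.table)

# Gene pairs copied row by row from the sample table in the question
DT <- data.table(
  Gene1 = c("IMPDH1", "PPP3R2", "PPP3R2", "PPP3R2", "PPP3R2", "PPP3R2", "PPP3R2",
            "NPR1", "PPP3R2", "ITGAL", "CASP3", "PTK2B", "TNF", "PTK2B"),
  Gene2 = c("XDH", "ADA", "NPR1", "IMPDH1", "IMPDH2", "PPP3R2", "RRM1",
            "POLA1", "ITGAL", "NPR1", "NPR1", "NPR1", "GUCY1A2", "GUCY1A2"),
  Disease1 = "NK cell defects",
  Disease2 = "Adenylosuccinate lyase deficiency"
)

# One result row per disease pair; intersect() and union() drop duplicate genes
res <- DT[, {
  shared <- length(intersect(Gene1, Gene2))
  total  <- length(union(Gene1, Gene2))
  list(intersectG = shared, unionG = total, Tanimoto = shared / total)
}, by = c("Disease1", "Disease2")]

print(res)
```

For this pair the two sets share 4 genes (IMPDH1, PPP3R2, NPR1, ITGAL) out of 13 distinct genes, so the coefficient is 4/13, roughly 0.308. With more disease pairs in the same table, the `by` clause yields one row per pair.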