Question

I am new to programming and have just started learning R, so please excuse my ignorance. I have data in the following format:

Disease      Gene symbol
Disease A    FOXJ1
Disease B    MYB
Disease B    GATA4
Disease C    MYB
Disease D    GATA4

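There are about 250 such entries. I would like to see the data in the following format:

Disease 1    Shared gene symbols    Disease 2
Disease A    MYB, FOXJ1             Disease B
Disease C    MYB                    Disease B
Disease B    GATA4                  Disease D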

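My approach: I split the process into 3 steps:

Step 1: Build the pairwise combinations of diseases.

Step 2: Find the gene symbols associated with each disease and assign them to vectors.

Step 3: Use the intersect function (%n%) on these vectors to find the shared gene symbols.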
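
The three steps above can be sketched in base R as follows. This is only a minimal sketch under assumptions: the data frame name `df`, the column names `Disease` and `Gene`, and the helper names `dis_pairs`, `genes_by_disease`, `shared`, and `result` are illustrative and do not come from the original data set.

```r
# Sample rows from the question (object and column names are assumed)
df <- data.frame(
  Disease = c("A", "B", "B", "C", "D"),
  Gene    = c("FOXJ1", "MYB", "GATA4", "MYB", "GATA4"),
  stringsAsFactors = FALSE
)

# Step 1: all pairwise combinations of diseases (one pair per column)
dis_pairs <- combn(unique(df$Disease), 2)

# Step 2: one character vector of gene symbols per disease
genes_by_disease <- split(df$Gene, df$Disease)

# Step 3: intersect the two gene vectors of each pair and collapse to a string
shared <- apply(dis_pairs, 2, function(p) {
  paste(intersect(genes_by_disease[[p[1]]], genes_by_disease[[p[2]]]),
        collapse = ", ")
})

result <- data.frame(Disease1 = dis_pairs[1, ], Shared = shared,
                     Disease2 = dis_pairs[2, ], stringsAsFactors = FALSE)

# Keep only the pairs that share at least one gene symbol
result <- result[result$Shared != "", ]
print(result)
```

On these five sample rows, only two pairs remain: B and C share MYB, and B and D share GATA4.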

I am sure there must be a much simpler way than this.

Any help would be greatly appreciated! Thank you very much!

Regards, S

Answer 1

A solution using the combinat package:

library(combinat)

#random data (no seed is set, so each run draws different genes)
DF <- data.frame(Disease = LETTERS[1:10], Gene = sample(letters[1:4], 10, TRUE))

#> DF
#   Disease Gene
#1        A    a
#2        B    a
#3        C    c
#4        D    b
#5        E    d
#6        F    b
#7        G    c
#8        H    d
#9        I    b
#10       J    d

#all possible pairs of diseases, one pair per column (see `?combn`;
#base R's utils::combn works here as well)
dis_combns <- combn(DF$Disease, 2)

#find the genes shared by each pair of diseases
#(intersect() keeps genes present for both diseases; union() would keep genes of either)
commons <- apply(dis_combns, 2, 
       function(x) intersect(DF$Gene[DF$Disease == x[1]], DF$Gene[DF$Disease == x[2]])) 
#collapse each set of shared genes into one string ("" when nothing is shared)
commons <- unlist(lapply(commons, paste, collapse = " and "))

#result
resultDF <- data.frame(Disease1 = dis_combns[1,], 
                     Common_genes = commons, Disease2 = dis_combns[2,])

#keep only the pairs that share at least one gene
resultDF <- resultDF[resultDF$Common_genes != "", ]

#> resultDF   (for the DF printed above)
#   Disease1 Common_genes Disease2
#1         A            a        B
#21        C            c        G
#26        D            b        F
#29        D            b        I
#33        E            d        H
#35        E            d        J
#38        F            b        I
#44        H            d        J
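
As an alternative to enumerating every pair of diseases, the shared genes can also be found by joining the table to itself on the gene column. The following base R sketch is not part of the answer above. It reuses the question's sample rows, assumes the column names `Disease` and `Gene`, and the object names `df`, `joined`, and `shared` are illustrative. It also assumes each disease-gene row appears only once.

```r
# Sample rows from the question (object and column names are assumed)
df <- data.frame(
  Disease = c("A", "B", "B", "C", "D"),
  Gene    = c("FOXJ1", "MYB", "GATA4", "MYB", "GATA4"),
  stringsAsFactors = FALSE
)

# Self-join on Gene: each row pairs two diseases associated with that gene
joined <- merge(df, df, by = "Gene", suffixes = c("1", "2"))

# Keep each unordered pair once and drop a disease paired with itself
joined <- joined[joined$Disease1 < joined$Disease2, ]

# Collapse the genes shared by each pair into one comma-separated string
shared <- aggregate(Gene ~ Disease1 + Disease2, data = joined,
                    FUN = paste, collapse = ", ")
print(shared)
```

Because the join only produces rows for genes that actually occur in both diseases, pairs with nothing in common never appear and no filtering of empty results is needed.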

Finding common values in pairs of data
