Question

I have an OMIM gene list (about 15,000 genes) with the corresponding diseases, like this:

SLC6A8,CRTR,CCDS1   Cerebral creatine deficiency syndrome 1, 300352 (3)
BCAP31,BAP31,DXS1357E,DDCH  Deafness, dystonia, and cerebral hypomyelination
ABCD1,ALD,AMN   Adrenoleukodystrophy, 300100 (3), X-linked recessive
PLXNB3,PLXN6    NA

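For some diseases there is more than one gene name associated with the disease. I want to reorganize this so that each row has only one gene name and its associated disease: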

SLC6A8 Cerebral creatine deficiency syndrome 1, 300352 (3)
CRTR Cerebral creatine deficiency syndrome 1, 300352 (3)
CCDS1 Cerebral creatine deficiency syndrome 1, 300352 (3)

Can this be done in R?


Answer 1

I'm not entirely sure what kind of data structure you have. Here is a quick solution that will hopefully get you what you are looking for:

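library(plyr); library(magrittr)  # ldply() comes from plyr, %>% from magrittr
# For row x: split the comma-separated genes in column a and pair each one with the disease in column b
splitFn <- function(x) expand.grid(df[x,"a"] %>% as.character %>% strsplit(., ",") %>% unlist, df[x, "b"])
ldply(1:nrow(df), splitFn)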

       Var1                                                Var2
1    SLC6A8  Cerebral creatine deficiency syndrome 1, 300352(3)
2      CRTR  Cerebral creatine deficiency syndrome 1, 300352(3)
3     CCDS1  Cerebral creatine deficiency syndrome 1, 300352(3)
4    BCAP31    Deafness, dystonia, and cerebral hypomyelination
5     BAP31    Deafness, dystonia, and cerebral hypomyelination
6  DXS1357E    Deafness, dystonia, and cerebral hypomyelination
7      DDCH    Deafness, dystonia, and cerebral hypomyelination
8     ABCD1 Adrenoleukodystrophy, 300100(3), X-linked recessive
9       ALD Adrenoleukodystrophy, 300100(3), X-linked recessive
10      AMN Adrenoleukodystrophy, 300100(3), X-linked recessive
11   PLXNB3                                                <NA>
12    PLXN6                                                <NA>

The data.frame I used:

df <- structure(list(a = structure(c(4L, 2L, 1L, 3L), .Label = c("ABCD1,ALD,AMN", 
"BCAP31,BAP31,DXS1357E,DDCH", "PLXNB3,PLXN6", "SLC6A8,CRTR,CCDS1"
), class = "factor"), b = structure(c(1L, 3L, 2L, NA), .Label = c(" Cerebral creatine deficiency syndrome 1, 300352(3)", 
"Adrenoleukodystrophy, 300100(3), X-linked recessive", "Deafness, dystonia, and cerebral hypomyelination"
), class = "factor")), .Names = c("a", "b"), row.names = c(NA, 
-4L), class = "data.frame")
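
For comparison, the same reshaping can also be sketched in base R, without plyr or magrittr. This is only a minimal sketch and not the answerer's method: the data frame `omim` and its column names `genes` and `disease` are made up for illustration, and it assumes the input has one comma-separated gene column and one disease column, as in the question.

```r
# Hypothetical input mirroring the question's layout; the object name
# and column names are assumptions, not taken from the original data.
omim <- data.frame(
  genes   = c("SLC6A8,CRTR,CCDS1", "PLXNB3,PLXN6"),
  disease = c("Cerebral creatine deficiency syndrome 1, 300352 (3)", NA),
  stringsAsFactors = FALSE
)

# Split each gene string on commas; the result is a list holding one
# character vector of gene names per input row.
gene_list <- strsplit(omim$genes, ",", fixed = TRUE)

# Flatten the gene names and repeat each disease once per gene in its row,
# so every output row carries exactly one gene and its disease.
long <- data.frame(
  gene    = unlist(gene_list),
  disease = rep(omim$disease, times = lengths(gene_list)),
  stringsAsFactors = FALSE
)

print(long)
```

Because `strsplit()` and `rep()` are vectorised, this avoids building one small data frame per row, which may matter for a list of roughly 15,000 genes.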
