I have a list of OMIM genes (about 15,000 genes), each with its associated disease, like this:
```
SLC6A8,CRTR,CCDS1 Cerebral creatine deficiency syndrome 1, 300352 (3)
BCAP31,BAP31,DXS1357E,DDCH Deafness, dystonia, and cerebral hypomyelination
ABCD1,ALD,AMN Adrenoleukodystrophy, 300100 (3), X-linked recessive
PLXNB3,PLXN6 NA
```
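If the list is a plain text file in the layout above, it first has to be turned into a two-column table. The sketch below assumes that the gene list never contains whitespace, so everything before the first run of whitespace is the gene list and the rest of the line is the disease. The vector `raw` stands in for something like `readLines()` on your file (the file itself is not shown here), and the columns are named `a` and `b` only to match the answer below.

```r
# Example input: stand-in for readLines("<your file>"), using the lines above
raw <- c(
  "SLC6A8,CRTR,CCDS1 Cerebral creatine deficiency syndrome 1, 300352 (3)",
  "BCAP31,BAP31,DXS1357E,DDCH Deafness, dystonia, and cerebral hypomyelination",
  "ABCD1,ALD,AMN Adrenoleukodystrophy, 300100 (3), X-linked recessive",
  "PLXNB3,PLXN6 NA"
)

# Assumption: the gene list has no spaces, so split at the first whitespace run
pattern <- "^(\\S+)\\s+(.*)$"
genes   <- sub(pattern, "\\1", raw)   # comma-separated gene symbols
disease <- sub(pattern, "\\2", raw)   # remainder of the line
disease[disease == "NA"] <- NA        # literal "NA" becomes a real missing value

df <- data.frame(a = genes, b = disease, stringsAsFactors = FALSE)
df
```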
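For some diseases there is more than one gene name associated with the disease. I want to reorganize the data so that each row contains exactly one gene name and its associated disease: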
```
SLC6A8 Cerebral creatine deficiency syndrome 1, 300352 (3)
CRTR Cerebral creatine deficiency syndrome 1, 300352 (3)
CCDS1 Cerebral creatine deficiency syndrome 1, 300352 (3)
```
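Can this be done in R?

Answer (score: 1)

I'm not entirely sure what data structure you have, but here is a quick solution that will hopefully do what you're looking for: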
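```r
library(plyr); library(magrittr)  # ldply() and the %>% pipe; df is the data.frame defined at the end
# For row x: split the comma-separated genes in column a and pair each one with the disease in column b
splitFn <- function(x) expand.grid(df[x, "a"] %>% as.character %>% strsplit(., ",") %>% unlist, df[x, "b"])
ldply(1:nrow(df), splitFn)  # apply splitFn to every row and stack the results
```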
```
       Var1                                                 Var2
1    SLC6A8   Cerebral creatine deficiency syndrome 1, 300352(3)
2      CRTR   Cerebral creatine deficiency syndrome 1, 300352(3)
3     CCDS1   Cerebral creatine deficiency syndrome 1, 300352(3)
4    BCAP31     Deafness, dystonia, and cerebral hypomyelination
5     BAP31     Deafness, dystonia, and cerebral hypomyelination
6  DXS1357E     Deafness, dystonia, and cerebral hypomyelination
7      DDCH     Deafness, dystonia, and cerebral hypomyelination
8     ABCD1  Adrenoleukodystrophy, 300100(3), X-linked recessive
9       ALD  Adrenoleukodystrophy, 300100(3), X-linked recessive
10      AMN  Adrenoleukodystrophy, 300100(3), X-linked recessive
11   PLXNB3                                                 <NA>
12   PLXN6                                                  <NA>
```
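The same reshaping can also be sketched in base R without plyr or magrittr. This is an alternative technique, not the approach used above: `strsplit()` turns each gene list into a vector, `lengths()` counts the genes per row, and `rep()` repeats each disease that many times so the two columns line up. The input below is a hand-typed copy of the example data with character columns; `as.character()` is kept so the same lines also work on the factor-based `df` shown below. The column names `gene` and `disease` are illustrative choices.

```r
# Example input with the same columns as the answer's df (a = genes, b = disease)
df <- data.frame(
  a = c("SLC6A8,CRTR,CCDS1", "BCAP31,BAP31,DXS1357E,DDCH",
        "ABCD1,ALD,AMN", "PLXNB3,PLXN6"),
  b = c("Cerebral creatine deficiency syndrome 1, 300352(3)",
        "Deafness, dystonia, and cerebral hypomyelination",
        "Adrenoleukodystrophy, 300100(3), X-linked recessive",
        NA),
  stringsAsFactors = FALSE
)

# One list element per input row, each a character vector of gene symbols
gene_list <- strsplit(as.character(df$a), ",", fixed = TRUE)

# Flatten the genes and repeat each disease once per gene in its row
long <- data.frame(
  gene    = unlist(gene_list),
  disease = rep(as.character(df$b), lengths(gene_list)),
  stringsAsFactors = FALSE
)
long
```

Because everything is vectorized, this avoids calling a function once per row, which may matter for a list of around 15,000 genes.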
The data.frame I used:

```r
df <- structure(list(a = structure(c(4L, 2L, 1L, 3L), .Label = c("ABCD1,ALD,AMN",
"BCAP31,BAP31,DXS1357E,DDCH", "PLXNB3,PLXN6", "SLC6A8,CRTR,CCDS1"
), class = "factor"), b = structure(c(1L, 3L, 2L, NA), .Label = c(" Cerebral creatine deficiency syndrome 1, 300352(3)",
"Adrenoleukodystrophy, 300100(3), X-linked recessive", "Deafness, dystonia, and cerebral hypomyelination"
), class = "factor")), .Names = c("a", "b"), row.names = c(NA,
-4L), class = "data.frame")
```