Arranging data in rows in R

Time: 2017-04-13 10:00:18

Tags: r

I have a list of OMIM genes (about 15,000 genes) with their corresponding diseases, as shown below:

SLC6A8,CRTR,CCDS1   Cerebral creatine deficiency syndrome 1, 300352 (3)
BCAP31,BAP31,DXS1357E,DDCH  Deafness, dystonia, and cerebral hypomyelination
ABCD1,ALD,AMN   Adrenoleukodystrophy, 300100 (3), X-linked recessive
PLXNB3,PLXN6    NA  

Some diseases have more than one associated gene name. I want to reorganize this so that each row has only one gene name and its associated disease:

SLC6A8 Cerebral creatine deficiency syndrome 1, 300352 (3)
CRTR Cerebral creatine deficiency syndrome 1, 300352 (3)
CCDS1 Cerebral creatine deficiency syndrome 1, 300352 (3)

Can this be done in R?

1 answer:

Answer 0 (score: 1)

I'm not entirely sure what data structure you have. Here is a quick solution that will hopefully help with what you're looking for:

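library(plyr); library(magrittr)  # provide ldply() and the %>% pipe
# For row x: split the comma-separated genes in column a and pair each gene with the disease in column b
splitFn <- function(x) expand.grid(df[x,"a"] %>% as.character %>% strsplit(., ",") %>% unlist, df[x, "b"])
ldply(1:nrow(df), splitFn)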

       Var1                                                Var2
1    SLC6A8  Cerebral creatine deficiency syndrome 1, 300352(3)
2      CRTR  Cerebral creatine deficiency syndrome 1, 300352(3)
3     CCDS1  Cerebral creatine deficiency syndrome 1, 300352(3)
4    BCAP31    Deafness, dystonia, and cerebral hypomyelination
5     BAP31    Deafness, dystonia, and cerebral hypomyelination
6  DXS1357E    Deafness, dystonia, and cerebral hypomyelination
7      DDCH    Deafness, dystonia, and cerebral hypomyelination
8     ABCD1 Adrenoleukodystrophy, 300100(3), X-linked recessive
9       ALD Adrenoleukodystrophy, 300100(3), X-linked recessive
10      AMN Adrenoleukodystrophy, 300100(3), X-linked recessive
11   PLXNB3                                                <NA>
12    PLXN6                                                <NA>
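
To see what splitFn does for a single row, here is a minimal base-R sketch. The gene string and disease text are copied from the first row of the question, and the variable names (genes, disease, one_row) are illustrative only. expand.grid() builds every combination of its arguments, so a vector of three genes and a single disease value give three rows.

```r
# Values copied from the first row of the question (variable names are illustrative).
genes   <- unlist(strsplit("SLC6A8,CRTR,CCDS1", ","))
disease <- "Cerebral creatine deficiency syndrome 1, 300352 (3)"

# Every gene is paired with the one disease: 3 x 1 = 3 rows.
one_row <- expand.grid(genes, disease, stringsAsFactors = FALSE)
print(one_row)
```

ldply() then applies this step to each row index and stacks the per-row results into one data.frame.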

The data.frame I used:

df <- structure(list(a = structure(c(4L, 2L, 1L, 3L), .Label = c("ABCD1,ALD,AMN", 
"BCAP31,BAP31,DXS1357E,DDCH", "PLXNB3,PLXN6", "SLC6A8,CRTR,CCDS1"
), class = "factor"), b = structure(c(1L, 3L, 2L, NA), .Label = c(" Cerebral creatine deficiency syndrome 1, 300352(3)", 
"Adrenoleukodystrophy, 300100(3), X-linked recessive", "Deafness, dystonia, and cerebral hypomyelination"
), class = "factor")), .Names = c("a", "b"), row.names = c(NA, 
-4L), class = "data.frame")
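
If you would rather avoid extra packages, the same reshaping can probably be done in base R alone. The sketch below is an alternative, not the approach above: it assumes the genes sit in a character column a and the diseases in a column b, and the names omim, genes, and long are hypothetical. With roughly 15,000 rows, a single strsplit() plus rep() should also avoid building one small data.frame per row.

```r
# Hypothetical input mirroring the question: genes comma-separated in column a,
# disease text in column b (NA where no disease is listed).
omim <- data.frame(
  a = c("SLC6A8,CRTR,CCDS1", "PLXNB3,PLXN6"),
  b = c("Cerebral creatine deficiency syndrome 1, 300352 (3)", NA),
  stringsAsFactors = FALSE
)

# Split each gene string; the result is a list with one character vector per row.
genes <- strsplit(omim$a, ",", fixed = TRUE)

# Repeat each disease once per gene in its row, then flatten the gene list.
long <- data.frame(
  gene    = unlist(genes),
  disease = rep(omim$b, times = lengths(genes)),
  stringsAsFactors = FALSE
)
print(long)
```

If the gene column is a factor, as in the data.frame above, wrap it in as.character() before calling strsplit().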