tidyr package in R: gather() fails with "Invalid column specification"

Time: 2015-12-15 18:29:36

Tags: r tidyr

I'm still learning how to use tidyr. I would like to use gather() to split the path columns into multiple rows, preserving gene_ID by repeating it on each resulting row. Example input data:

gene_ID    path1    path2    path3    path4    path5    path6    path7    path8
CAMNT_0043146643    RNA transport
CAMNT_0029561721    Ribosome
CAMNT_0024703307    Sphingolipid signaling pathway    Lysosome
CAMNT_0020981363    mRNA surveillance pathway    Hippo signaling pathway    cAMP signaling pathway    cGMP - PKG signaling pathway    Regulation of actin cytoskeleton    Meiosis - yeast    Oocyte meiosis    Focal adhesion
CAMNT_0020021387    Spliceosome    Protein processing in endoplasmic reticulum    MAPK signaling pathway    Endocytosis
CAMNT_0003293445    Spliceosome    Protein processing in endoplasmic reticulum    MAPK signaling pathway    Endocytosis

Example of the desired output data:

gene_ID             Pathway
CAMNT_0043146643    RNA transport
CAMNT_0029561721    Ribosome
CAMNT_0024703307    Lysosome
CAMNT_0024703307    Sphingolipid signaling pathway
CAMNT_0020981363    mRNA surveillance pathway
CAMNT_0020981363    Hippo signaling pathway
CAMNT_0020981363    cAMP signaling pathway
CAMNT_0020981363    cGMP - PKG signaling pathway
CAMNT_0020981363    Regulation of actin cytoskeleton
CAMNT_0020981363    Meiosis - yeast
CAMNT_0020981363    Oocyte meiosis
CAMNT_0020981363    Focal adhesion
CAMNT_0020021387    Spliceosome
CAMNT_0020021387    Protein processing in endoplasmic reticulum
CAMNT_0020021387    MAPK signaling pathway
CAMNT_0020021387    Endocytosis
CAMNT_0003293445    Spliceosome
CAMNT_0003293445    Protein processing in endoplasmic reticulum
CAMNT_0003293445    MAPK signaling pathway
CAMNT_0003293445    Endocytosis

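My attempt with gather() fails with the error message "Error: Invalid column specification". I have tried reading in the df both with and without headers, but I get the same error. I'm open to other approaches, but I have had trouble with NAs because not all gene_ID rows have the same number of columns.

Any advice on how to proceed?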

2 Answers:

Answer 0 (score: 2)

Here is a tidyr solution:

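library(tidyr)
library(dplyr)
# df is the example data frame with columns x, path1, path2 (as constructed in the other answer)
df %>%
  gather(path, Pathway, path1, path2) %>%  # stack path1 and path2 into one Pathway column
  filter(Pathway != "") %>%                # drop the empty cells
  select(-path)                            # drop the column holding the source column name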

  x Pathway
1 a   test1
2 b   test1
3 c   test2
4 d   test2
5 e   test3
6 a   testa
7 c   testg
8 d   testd

Answer 1 (score: 1)

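library(reshape2)
# Example data: empty strings mark the missing pathways
df <- data.frame(x = c("a", "b", "c", "d", "e"),
                 path1 = c("test1", "test1", "test2", "test2", "test3"),
                 path2 = c("testa", "", "testg", "testd", ""))
df[df == ""] <- NA                      # turn empty cells into NA
melt(df, id.vars = "x", na.rm = TRUE)   # long format; NA rows are dropped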
#   x variable value
# 1 a    path1 test1
# 2 b    path1 test1
# 3 c    path1 test2
# 4 d    path1 test2
# 5 e    path1 test3
# 6 a    path2 testa
# 8 c    path2 testg
# 9 d    path2 testd
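
For comparison, the same NA-based idea can be sketched with tidyr alone. This assumes tidyr 1.0.0 or later, where `pivot_longer()` is available as the successor to `gather()`; the output column names `variable` and `value` are chosen here only to mirror the `melt()` result above.

```r
library(tidyr)

# Same example data as above, kept as character columns
df <- data.frame(x = c("a", "b", "c", "d", "e"),
                 path1 = c("test1", "test1", "test2", "test2", "test3"),
                 path2 = c("testa", "", "testg", "testd", ""),
                 stringsAsFactors = FALSE)

df[df == ""] <- NA  # turn empty cells into NA

# Reshape every column except x; rows whose value is NA are dropped
long <- pivot_longer(df, cols = -x,
                     names_to = "variable", values_to = "value",
                     values_drop_na = TRUE)

print(long)
```

Unlike `melt()`, `pivot_longer()` returns the rows grouped by the original row (all of `a`, then all of `b`, and so on), so the row order differs from the output shown above.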