I'm still learning how to use tidyr. I want to use "gather()" to split the path columns into multiple rows, keeping "gene_ID" by duplicating it for each row as needed. Example input data:
gene_ID | path1 | path2 | path3 | path4 | path5 | path6 | path7 | path8
CAMNT_0043146643 | RNA transport
CAMNT_0029561721 | Ribosome
CAMNT_0024703307 | Sphingolipid signaling pathway | Lysosome
CAMNT_0020981363 | mRNA surveillance pathway | Hippo signaling pathway | cAMP signaling pathway | cGMP - PKG signaling pathway | Regulation of actin cytoskeleton | Meiosis - yeast | Oocyte meiosis | Focal adhesion
CAMNT_0020021387 | Spliceosome | Protein processing in endoplasmic reticulum | MAPK signaling pathway | Endocytosis
CAMNT_0003293445 | Spliceosome | Protein processing in endoplasmic reticulum | MAPK signaling pathway | Endocytosis
Example of the desired output data:
gene_ID Pathway
CAMNT_0043146643 RNA transport
CAMNT_0029561721 Ribosome
CAMNT_0024703307 Lysosome
CAMNT_0024703307 Sphingolipid signaling pathway
CAMNT_0020981363 mRNA surveillance pathway
CAMNT_0020981363 Hippo signaling pathway
CAMNT_0020981363 cAMP signaling pathway
CAMNT_0020981363 cGMP - PKG signaling pathway
CAMNT_0020981363 Regulation of actin cytoskeleton
CAMNT_0020981363 Meiosis - yeast
CAMNT_0020981363 Oocyte meiosis
CAMNT_0020981363 Focal adhesion
CAMNT_0020021387 Spliceosome
CAMNT_0020021387 Protein processing in endoplasmic reticulum
CAMNT_0020021387 MAPK signaling pathway
CAMNT_0020021387 Endocytosis
CAMNT_0003293445 Spliceosome
CAMNT_0003293445 Protein processing in endoplasmic reticulum
CAMNT_0003293445 MAPK signaling pathway
CAMNT_0003293445 Endocytosis
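With my current attempt I get an error message: "Error: Invalid column specification". I have tried reading in the df both with and without headers, but the same error appears. I'm open to other approaches, but I'm running into trouble with the "NAs", because not every "gene_ID" row has the same number of columns.
Any suggestions on how to proceed?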
Answer 0 (score: 2)
Here is a tidyr solution:
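library(dplyr)
library(tidyr)
# df is the small example data frame with columns x, path1 and path2 (built in the reshape2 answer)
df %>%
  gather(path, Pathway, path1, path2) %>%  # stack path1 and path2 into key/value rows
  filter(Pathway != "") %>%                # drop the empty cells
  select(-path)                            # keep only x and Pathway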
#   x Pathway
# 1 a   test1
# 2 b   test1
# 3 c   test2
# 4 d   test2
# 5 e   test3
# 6 a   testa
# 7 c   testg
# 8 d   testd
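The same pattern can be applied to the question's gene table. The following is only a sketch under stated assumptions: the data frame `genes` is a hypothetical three-row, three-path subset typed in by hand, and missing pathways are assumed to arrive as either empty strings or NA depending on how the file was read. Using `-gene_ID` in `gather()` stacks every column except the ID, so the path columns do not have to be listed one by one.

```r
library(dplyr)
library(tidyr)

# Hypothetical subset of the question's wide table (assumption: blanks mark missing pathways)
genes <- data.frame(
  gene_ID = c("CAMNT_0043146643", "CAMNT_0024703307", "CAMNT_0020021387"),
  path1   = c("RNA transport", "Sphingolipid signaling pathway", "Spliceosome"),
  path2   = c("", "Lysosome", "Protein processing in endoplasmic reticulum"),
  path3   = c("", "", "MAPK signaling pathway"),
  stringsAsFactors = FALSE
)

long <- genes %>%
  gather(path, Pathway, -gene_ID) %>%          # stack every column except gene_ID
  filter(!is.na(Pathway), Pathway != "") %>%   # drop NA cells and blank cells alike
  select(-path) %>%                            # the pathN label is no longer needed
  arrange(gene_ID)                             # group the rows of each gene together

print(long)
```

Each gene_ID is repeated once per non-empty pathway, which is the shape asked for in the question.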
Answer 1 (score: 1)
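# Example data: empty strings mark missing pathways
df <- data.frame(x = c("a", "b", "c", "d", "e"),
                 path1 = c("test1", "test1", "test2", "test2", "test3"),
                 path2 = c("testa", "", "testg", "testd", ""),
                 stringsAsFactors = FALSE)
library(reshape2)
df[df == ""] <- NA                      # turn blanks into NA
melt(df, id.vars = "x", na.rm = TRUE)   # long format, rows with NA dropped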
#   x variable value
# 1 a    path1 test1
# 2 b    path1 test1
# 3 c    path1 test2
# 4 d    path1 test2
# 5 e    path1 test3
# 6 a    path2 testa
# 8 c    path2 testg
# 9 d    path2 testd
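For completeness: in tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`, whose `values_drop_na` argument removes the NA rows in the same step. The sketch below assumes such a tidyr version is installed and reuses the toy data frame from this thread. It is an alternative way to write the same reshaping, not something either answer proposed.

```r
library(tidyr)

# Same toy data as in the answers; empty strings mark missing pathways
df <- data.frame(x = c("a", "b", "c", "d", "e"),
                 path1 = c("test1", "test1", "test2", "test2", "test3"),
                 path2 = c("testa", "", "testg", "testd", ""),
                 stringsAsFactors = FALSE)

df[df == ""] <- NA   # blanks become NA so they can be dropped while reshaping

long <- pivot_longer(df,
                     cols = c(path1, path2),   # columns to stack
                     names_to = "path",        # holds the old column names
                     values_to = "Pathway",    # holds the cell values
                     values_drop_na = TRUE)    # skip the NA cells

long <- long[, c("x", "Pathway")]   # keep only the ID and the pathway
print(long)
```

Unlike `gather()`, `pivot_longer()` returns the rows grouped by the original row (all of "a" first, then "b", and so on) rather than column by column.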