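I have this dataset and I want to reshape it so that ID.name defines the rows, Canonical_Hugo_Symbol supplies the column names, and Canonical_Protein_Change supplies the cell values. It would be great if the empty cells contained 0 instead of NA.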
mydata.df <- data.frame(ID.name = c("1000", "1000", "1000", "1001","1001","1001","1002","1002" ), Canonical_Protein_Change = c("p.Y1467H", "p.R1466W", "p.*427Q", "p.V320fs","p.S5383fs","p.D519V","p.S51A", "p.K183_splice" ), Canonical_Hugo_Symbol = c("gene1", "gene3", "gene1", "gene1","gene3","gene4","gene1", "gene2" ))
I melted it (with reshape2):
library(reshape2)
ff.melt <- melt(mydata.df, id.vars = c("ID.name", "Canonical_Hugo_Symbol"))
ff.melt
  ID.name Canonical_Hugo_Symbol                 variable         value
1    1000                 gene1 Canonical_Protein_Change      p.Y1467H
2    1000                 gene3 Canonical_Protein_Change      p.R1466W
3    1000                 gene1 Canonical_Protein_Change       p.*427Q
4    1001                 gene1 Canonical_Protein_Change      p.V320fs
5    1001                 gene3 Canonical_Protein_Change     p.S5383fs
6    1001                 gene4 Canonical_Protein_Change       p.D519V
7    1002                 gene1 Canonical_Protein_Change        p.S51A
8    1002                 gene2 Canonical_Protein_Change p.K183_splice
Then I recast it:
ff.cast <- dcast(ff.melt, ID.name ~ Canonical_Hugo_Symbol + value)
and got this data frame:
ff.cast
  ID.name gene1_p.*427Q gene1_p.S51A gene1_p.V320fs gene1_p.Y1467H gene2_p.K183_splice gene3_p.R1466W gene3_p.S5383fs
1    1000       p.*427Q         <NA>           <NA>       p.Y1467H                <NA>       p.R1466W            <NA>
2    1001          <NA>         <NA>       p.V320fs           <NA>                <NA>           <NA>       p.S5383fs
3    1002          <NA>       p.S51A           <NA>           <NA>       p.K183_splice           <NA>            <NA>
  gene4_p.D519V
1          <NA>
2       p.D519V
3          <NA>
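This is close to what I want, but now each gene has many differently named columns. For example, I would like gene1_p.*427Q, gene1_p.S51A, gene1_p.V320fs and gene1_p.Y1467H to all end up in a single gene1 column.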
I also tried:
dcast(mydata.df, ID.name ~ Canonical_Hugo_Symbol, value_var = "Canonical_Protein_Change" )
but I got this error message:
Error in .fun(.value[0], ...) : 2 arguments passed to 'length' which requires 1
I would like to get this table, or something similar. Thanks!
  ID.name    gene1  gene2     gene3   gene4
1    1000 Cp.*427Q      0  p.R1466W       0
2    1001 p.V320fs      0 p.S5383fs p.D519V
3    1002   p.S51A p.K183         0       0
I got closer with this, but the column names were wrong:
mydata.reshape <- reshape(mydata.df, direction = 'wide', idvar = 'ID.name', timevar = 'Canonical_Hugo_Symbol')
I fixed the names:
colnames(mydata.reshape) <- sub("Canonical_Protein_Change.(.*?)","\\1", colnames(mydata.reshape))
but the NAs are still there.
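One possible way to finish the reshape() route is to overwrite the remaining NA cells afterwards. This is only a sketch under two assumptions: the columns are character vectors (hence stringsAsFactors = FALSE), and it is acceptable that reshape() keeps only the first value for each ID/gene pair and warns about the others.

```r
mydata.df <- data.frame(
  ID.name = c("1000", "1000", "1000", "1001", "1001", "1001", "1002", "1002"),
  Canonical_Protein_Change = c("p.Y1467H", "p.R1466W", "p.*427Q", "p.V320fs",
                               "p.S5383fs", "p.D519V", "p.S51A", "p.K183_splice"),
  Canonical_Hugo_Symbol = c("gene1", "gene3", "gene1", "gene1",
                            "gene3", "gene4", "gene1", "gene2"),
  stringsAsFactors = FALSE
)

# reshape() warns that multiple rows match gene1 for ID 1000 and takes the first
mydata.reshape <- suppressWarnings(
  reshape(mydata.df, direction = "wide",
          idvar = "ID.name", timevar = "Canonical_Hugo_Symbol")
)

# strip the "Canonical_Protein_Change." prefix from the column names
colnames(mydata.reshape) <- sub("^Canonical_Protein_Change\\.", "",
                                colnames(mydata.reshape))

# replace every remaining NA with the string "0"
mydata.reshape[is.na(mydata.reshape)] <- "0"
```

The columns come out in order of first appearance (gene1, gene3, gene4, gene2), so they may need reordering if a sorted layout is wanted.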
Answer 0 (score: 2)
You could try this:
# concatenate values in cells with more than one value
dcast(mydata.df, ID.name ~ Canonical_Hugo_Symbol, value.var = "Canonical_Protein_Change",
      fun.aggregate = function(x) paste(x, collapse = "; "), fill = "0")
#   ID.name             gene1         gene2     gene3   gene4
# 1    1000 p.Y1467H; p.*427Q             0  p.R1466W       0
# 2    1001          p.V320fs             0 p.S5383fs p.D519V
# 3    1002            p.S51A p.K183_splice         0       0
# ...or pick the first value in cells with more than one value
dcast(mydata.df, ID.name ~ Canonical_Hugo_Symbol, value.var = "Canonical_Protein_Change",
      fun.aggregate = function(x) head(x, 1), fill = "0")
#   ID.name    gene1         gene2     gene3   gene4
# 1    1000 p.Y1467H             0  p.R1466W       0
# 2    1001 p.V320fs             0 p.S5383fs p.D519V
# 3    1002   p.S51A p.K183_splice         0       0
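For comparison, the same wide table can probably be built in base R without reshape2. The sketch below is not part of the answer above: it swaps dcast for tapply, and the names wide and result are introduced here purely for illustration. It assumes the data frame from the question, created with character columns.

```r
mydata.df <- data.frame(
  ID.name = c("1000", "1000", "1000", "1001", "1001", "1001", "1002", "1002"),
  Canonical_Protein_Change = c("p.Y1467H", "p.R1466W", "p.*427Q", "p.V320fs",
                               "p.S5383fs", "p.D519V", "p.S51A", "p.K183_splice"),
  Canonical_Hugo_Symbol = c("gene1", "gene3", "gene1", "gene1",
                            "gene3", "gene4", "gene1", "gene2"),
  stringsAsFactors = FALSE
)

# one cell per ID/gene pair; multiple changes are pasted together with "; "
wide <- with(mydata.df,
             tapply(Canonical_Protein_Change,
                    list(ID.name, Canonical_Hugo_Symbol),
                    paste, collapse = "; "))

# ID/gene pairs with no rows come back as NA; use "0" instead
wide[is.na(wide)] <- "0"

# turn the character matrix into a data frame with ID.name as a regular column
result <- data.frame(ID.name = rownames(wide), wide,
                     row.names = NULL, stringsAsFactors = FALSE)
```

As with the first dcast call, gene1 for ID 1000 holds both changes joined by "; ". Replacing paste with function(x) x[1] would keep only the first one.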