I have a dataset similar to this:
Id   Disease    Gene    Mutation  Expression
101  Disease_X  Gene_A  R273G     Normal
101  Disease_X  GENE_B  G12D      Normal
102  Disease_Y  GENE_C  L858R     High
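I would like to reshape it so that every Id and Gene pair is represented for both Mutation and Expression, even where no value exists.
For example, each Id would have 6 rows (3 genes for Mutation, 3 for Expression). Where the original table has no Mutation or Expression value, the output would supply a standard placeholder for that row (e.g. "No Mutation Data"). The output table would look like this: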
Id   Disease    Type        Gene    Value
101  Disease_X  Mutation    Gene A  R273G
101  Disease_X  Mutation    GENE B  G12D
101  Disease_X  Mutation    GENE C  No Mutation Data
101  Disease_X  Expression  Gene A  Normal
101  Disease_X  Expression  GENE B  Normal
101  Disease_X  Expression  GENE C  No Expression Data
102  Disease_Y  Mutation    Gene A  No Mutation Data
102  Disease_Y  Mutation    GENE B  No Mutation Data
102  Disease_Y  Mutation    GENE C  L858R
102  Disease_Y  Expression  Gene A  No Expression Value
102  Disease_Y  Expression  GENE B  No Expression Value
102  Disease_Y  Expression  GENE C  High
I know there is a simple way to do this (using merge or melt?), but I have not come up with anything straightforward.
Answer 0 (score: 3)
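You need to take a few extra steps to get what you are looking for.
In the following, I first make all combinations of "Id", "Type" (the "variable" column that melt produces), and "Gene", merge them with the "long" form of the dataset, and then fix the "Disease" column.
I have left NA as NA, because that seems to make more sense to me if you need to continue working with the data.
This assumes you are starting with a dataset named "mydf".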
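If you want something to run the code against, here is a minimal sketch that rebuilds the sample table from the question as "mydf". The column types (integer Id, character for everything else) are my assumption; the question does not state them.

```r
# Hypothetical reconstruction of the question's sample data.
# Column types are assumed: integer Id, character for the rest.
mydf <- data.frame(
  Id         = c(101L, 101L, 102L),
  Disease    = c("Disease_X", "Disease_X", "Disease_Y"),
  Gene       = c("Gene_A", "GENE_B", "GENE_C"),
  Mutation   = c("R273G", "G12D", "L858R"),
  Expression = c("Normal", "Normal", "High"),
  stringsAsFactors = FALSE  # keep text columns as character, not factor
)

str(mydf)  # inspect the structure before reshaping
```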
library(data.table)
library(reshape2)
DT <- as.data.table(mydf) ## Convert to data.table
DTL <- melt(DT, id.vars = c("Id", "Disease", "Gene")) ## Make it long
groups <- c("Id", "Gene", "variable") ## Save some typing
toMerge <- do.call(CJ, lapply(DTL[, groups, ## Generate the combos
with = FALSE], unique))
merged <- merge(DTL, toMerge, by = groups, all = TRUE) ## merge
merged[, Disease := unique(na.omit(Disease)), by = Id][] ## Fill in Disease
#      Id   Gene   variable   Disease  value
#  1: 101 GENE_B   Mutation Disease_X   G12D
#  2: 101 GENE_B Expression Disease_X Normal
#  3: 101 GENE_C   Mutation Disease_X     NA
#  4: 101 GENE_C Expression Disease_X     NA
#  5: 101 Gene_A   Mutation Disease_X  R273G
#  6: 101 Gene_A Expression Disease_X Normal
#  7: 102 GENE_B   Mutation Disease_Y     NA
#  8: 102 GENE_B Expression Disease_Y     NA
#  9: 102 GENE_C   Mutation Disease_Y  L858R
# 10: 102 GENE_C Expression Disease_Y   High
# 11: 102 Gene_A   Mutation Disease_Y     NA
# 12: 102 Gene_A Expression Disease_Y     NA
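
If you do want the placeholder text from the question instead of NA, one possible follow-up is sketched below. To keep it self-contained it uses a small stand-in for "merged" holding four rows copied from the output above. The label wording ("No Mutation Data" / "No Expression Data") and the final column names are assumptions based on the question's desired table.

```r
library(data.table)

# Stand-in for a few rows of `merged` (values copied from the output above).
merged <- data.table(
  Id       = c(101L, 101L, 101L, 101L),
  Gene     = c("GENE_B", "GENE_B", "GENE_C", "GENE_C"),
  variable = factor(c("Mutation", "Expression", "Mutation", "Expression"),
                    levels = c("Mutation", "Expression")),
  Disease  = "Disease_X",
  value    = c("G12D", "Normal", NA, NA)
)

# Replace each missing value with a label built from its type.
# The label wording is an assumption taken from the question's example.
merged[is.na(value), value := paste("No", variable, "Data")]

# Rename and reorder the columns to match the layout asked for in the question.
setnames(merged, c("variable", "value"), c("Type", "Value"))
setcolorder(merged, c("Id", "Disease", "Type", "Gene", "Value"))

merged[order(Id, Type)]  # Mutation rows first, then Expression
```

Doing the fill as the last step keeps the NA values available for any intermediate work, which is the reason for leaving them in place above.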