I have the following data frame:
genus_sub <- structure(list(GutREF001.1_MDA_1 = c(0, 1, 0, 0, 0, 0, 0, 0,
0, 0), GutREF001.1_MDA_2 = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0), GutREF001.1_MDA_3 = c(0,
1, 0, 0, 0, 0, 0, 0, 0, 0), GutREF001.2_MDA_1 = c(0, 1, 0, 0,
0, 0, 0, 0, 0, 0), GutREF001.2_MDA_2 = c(0, 1, 0, 0, 0, 0, 0,
0, 0, 0), GutREF001.2_MDA_3 = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0),
ID = c("Enterococcaceae (B; Firm)", "Oscillospiraceae (B; Firm)",
"Enterobacteriaceae (B; Prot)", "Helicobacteraceae (B; Prot)",
"Peptoniphilaceae (B; Firm)", "Flavobacteriaceae (B; Bact)",
"Methanobacteriaceae (A; Eury)", "Coriobacteriaceae (B; Acti)",
"Micrococcaceae (B; Acti)", "Lactobacillaceae (B; Firm)")), .Names = c("GutREF001.1_MDA_1",
"GutREF001.1_MDA_2", "GutREF001.1_MDA_3", "GutREF001.2_MDA_1",
"GutREF001.2_MDA_2", "GutREF001.2_MDA_3", "ID"), row.names = c("Enterococcaceae (B; Firm)",
"Oscillospiraceae (B; Firm)", "Enterobacteriaceae (B; Prot)",
"Helicobacteraceae (B; Prot)", "Peptoniphilaceae (B; Firm)",
"Flavobacteriaceae (B; Bact)", "Methanobacteriaceae (A; Eury)",
"Coriobacteriaceae (B; Acti)", "Micrococcaceae (B; Acti)", "Lactobacillaceae (B; Firm)"
), class = "data.frame")
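The same column names appear in triplicate, distinguished by the suffixes MDA_1, MDA_2 and MDA_3 (technical replicates of the same sample). The analysis has to be run across the three replicates of one sample at a time.
I would like to compute:
i. Consensus - for each row, identify the IDs that are present (value == 1) in at least 50% of the replicates, which in this case means at least two out of three.
ii. Sample_consensus_detected - of the consensus set identified above, the number of IDs that are present in an individual replicate of the triplicate.
iii. Sample_consensus_not_detected - of the consensus set identified above, the number of IDs that are absent from an individual replicate of the triplicate.
iv. Replicate_not_in_consensus - the number of IDs that are present in an individual replicate but are not in the consensus.
v. summary_metric_1 = ii / (ii + iii)
vi. summary_metric_2 = iv / (ii + iv)
I wrote the following code as a start, summing the values within each triplicate group: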
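library(dplyr)
library(tidyr)
# Keep the taxon labels as row names, then drop the ID column
# (genus_sub$ID is used here because genus_table is not defined in this snippet)
row.names(genus_sub) <- genus_sub$ID
genus_sub$ID <- NULL
genus_sub %>%
  gather(key, value) %>%
  # Split e.g. "GutREF001.1_MDA_1" into sample_id "GutREF001.1" and rep "1";
  # the "." is added to the character class so the full sample name is captured
  extract(key, c("sample_id", "rep"), "([[:alnum:].]+)_MDA_([[:alnum:]]+)") %>%
  group_by(sample_id) %>%
  summarize(sample_sum = sum(value))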
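Answer 0 (score: 0)
You can compute the consensus by melting the data like this (note that this requires the data as it was before the ID column was removed):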
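library(reshape2)
library(dplyr)
# Long format: one row per (ID, replicate column)
melted <- melt(genus_sub, id = "ID")
# Strip the trailing "_1", "_2" or "_3" replicate counter from the column name
melted$variable <- substr(melted$variable, 1, nchar(as.character(melted$variable)) - 2)
melted %>%
  group_by(ID, variable) %>%
  summarize(value = sum(value)) %>%
  # Back to wide format: one column per triplicate
  dcast(ID ~ variable, fun.aggregate = sum, value.var = "value")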
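The substr call strips the replicate counter from the column names (which are now the values of variable in the melted table), so that you can group by variable. It assumes a single-digit counter; if you have more than 9 replicates per sample, you can replace it with a more elaborate gsub.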
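As a minimal sketch of that more general replacement, assuming the replicate counter is always a trailing underscore followed by digits, an anchored pattern handles counters of any width. sub() is used here rather than gsub() because only one match per name is expected. The third column name is hypothetical and is included only to show a two-digit counter.

```r
# Two column names from genus_sub plus one hypothetical two-digit replicate
cols <- c("GutREF001.1_MDA_1", "GutREF001.1_MDA_3", "GutREF001.2_MDA_12")

# Drop the trailing "_<digits>" replicate counter, whatever its width
sample_id <- sub("_[0-9]+$", "", cols)
sample_id
```

With this in place, the substr line could become melted$variable <- sub("_[0-9]+$", "", melted$variable).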
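For each ID, every column of the output gives the number of replicates in which value == 1 (so to get a binary consensus, you would convert 2 or 3 to 1, and anything else to 0).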
                              ID GutREF001.1_MDA GutREF001.2_MDA
1    Coriobacteriaceae (B; Acti)               0               0
2   Enterobacteriaceae (B; Prot)               0               0
3      Enterococcaceae (B; Firm)               0               0
4    Flavobacteriaceae (B; Bact)               0               0
5    Helicobacteraceae (B; Prot)               0               0
6     Lactobacillaceae (B; Firm)               0               0
7  Methanobacteriaceae (A; Eury)               0               0
8       Micrococcaceae (B; Acti)               0               0
9     Oscillospiraceae (B; Firm)               3               3
10    Peptoniphilaceae (B; Firm)               0               0
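To go from counts like these to the metrics listed in the question, one possible base R sketch for a single triplicate follows. It assumes that items ii to iv are counted per replicate (one value per column) and that the consensus threshold is 2 of 3. The matrix reps and its taxon names are made up for illustration and are not taken from genus_sub.

```r
# Hypothetical presence/absence matrix for ONE triplicate
# (rows = taxa, columns = replicates); values are invented for illustration
reps <- matrix(
  c(1, 1, 1,   # taxonA: present in all three replicates
    1, 1, 0,   # taxonB: present in two of three
    1, 0, 0,   # taxonC: present in one of three
    0, 0, 0),  # taxonD: absent everywhere
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("taxon", LETTERS[1:4]), paste0("MDA_", 1:3))
)

# (i) consensus: present in at least 2 of the 3 replicates
consensus <- rowSums(reps == 1) >= 2

# (ii)-(iv): one count per replicate
detected      <- colSums(reps[consensus, , drop = FALSE] == 1)   # (ii)
not_detected  <- colSums(reps[consensus, , drop = FALSE] == 0)   # (iii)
not_consensus <- colSums(reps[!consensus, , drop = FALSE] == 1)  # (iv)

# (v) and (vi) as defined in the question
metric_1 <- detected / (detected + not_detected)
metric_2 <- not_consensus / (detected + not_consensus)

data.frame(detected, not_detected, not_consensus, metric_1, metric_2)
```

If a replicate has no detected IDs at all, the denominators are zero and the metrics come out as NaN, so that case may need separate handling. To cover several samples, the same steps could be applied to each group of three columns in turn.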