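I have a set of 200 mouse IDs. Each mouse has a list of gene expression values, but the same gene appears multiple times for each mouse. I would like each gene to be listed only once per mouse, with its value equal to the sum of all of that gene's values for that mouse.
For example, this data: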
mouse_number value gene
1 64 2.00000 Lypla1
2 65 1.00000 Lypla1
3 64 7.00000 Lypla1
4 65 3.00000 Lypla1
7 64 4.00000 Pck1
8 65 2.00000 Pck1
9 64 1.00000 Pck1
10 65 5.00000 Pck1
should become:
mouse_number value gene
1 64 9.00000 Lypla1
2 65 4.00000 Lypla1
3 64 5.00000 Pck1
4 65 7.00000 Pck1
Please help, thanks!
Answer 0 (score: 0)
You can use aggregate:
# Sample data from the question
df <- data.frame(
  mouse_number = c(64, 65, 64, 65, 64, 65, 64, 65),
  value = c(2.0, 1.0, 7.0, 3.0, 4.0, 2.0, 1.0, 5.0),
  gene = c("Lypla1", "Lypla1", "Lypla1", "Lypla1", "Pck1", "Pck1", "Pck1", "Pck1"))
# Sum value within each mouse_number/gene combination
df.collapsed <- aggregate(value ~ mouse_number + gene, FUN = sum, data = df)
df.collapsed
# mouse_number gene value
#1 64 Lypla1 9
#2 65 Lypla1 4
#3 64 Pck1 5
#4 65 Pck1 7
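The aggregate output puts gene before value, while the question lists the columns as mouse_number, value, gene. A minimal sketch of reordering the columns and cross-checking the sums with tapply, assuming the same sample data as above (the names collapsed and totals are only illustrative):

```r
# Sample data from the question
df <- data.frame(
  mouse_number = c(64, 65, 64, 65, 64, 65, 64, 65),
  value = c(2, 1, 7, 3, 4, 2, 1, 5),
  gene = c("Lypla1", "Lypla1", "Lypla1", "Lypla1", "Pck1", "Pck1", "Pck1", "Pck1")
)

# Sum value within each mouse_number/gene combination
collapsed <- aggregate(value ~ mouse_number + gene, data = df, FUN = sum)

# Reorder the columns to match the layout requested in the question
collapsed <- collapsed[, c("mouse_number", "value", "gene")]

# Cross-check: tapply returns a mouse-by-gene matrix of the same sums,
# with the mouse IDs and gene names as row and column names
totals <- tapply(df$value, list(df$mouse_number, df$gene), sum)
```

Note that the formula interface of aggregate drops rows containing NA by default (na.action = na.omit), so if the real data has missing values, those rows will not contribute to the sums.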