I have a data table in R that looks like this:
Gene Population Color Coverage
A_1 PopA Blue 0.016
A_1 PopA Green 0.022
A_1 PopB Blue 0.1322
A_1 PopB Green 0.552
A_2 PopA Blue 0.13
A_2 PopA Green 0.14
A_2 PopB Blue 1
A_2 PopB Green 0.9
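I would like to compute the difference between the colors (Blue minus Green), but only within the same gene and population. Ultimately, I want to output a table that looks like this: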
Gene Population Coverage
A_1 PopA -0.006
A_1 PopB -0.4198
A_2 PopA -0.01
A_2 PopB 0.1
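I have been using the summarySE() function from Rmisc to get the means shown above, but it is not clear to me how to compute the difference between matching values.
Thanks!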
Answer 0 (score: 1)
One option with dplyr is:
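library(dplyr)
# For each Gene/Population pair, subtract the Green coverage from the Blue coverage.
# This assumes every group has exactly one Blue row and one Green row.
my.df %>%
  group_by(Gene, Population) %>%
  summarize(Coverage = Coverage[Color == "Blue"] - Coverage[Color == "Green"])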
# A tibble: 4 x 3
# Groups: Gene [?]
# Gene Population Coverage
# <fct> <fct> <dbl>
# 1 A_1 PopA -0.00600
# 2 A_1 PopB -0.420
# 3 A_2 PopA -0.01
# 4 A_2 PopB 0.100
Data
my.df <-
structure(list(Gene = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("A_1", "A_2"), class = "factor"),
Population = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("PopA", "PopB"), class = "factor"),
Color = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("Blue", "Green"), class = "factor"),
Coverage = c(0.016, 0.022, 0.1322, 0.552, 0.13, 0.14, 1, 0.9)), class = "data.frame", row.names = c(NA, -8L))
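For comparison, the same Blue minus Green difference can be sketched in base R without dplyr. This is only an illustrative alternative, not part of the original answer. The helper names `blue`, `green`, `paired`, and `result` are made up for this example, and the data frame is rebuilt with `data.frame()` so the snippet runs on its own.

```r
# Rebuild the example data (same values as the dput() output above).
my.df <- data.frame(
  Gene       = rep(c("A_1", "A_2"), each = 4),
  Population = rep(rep(c("PopA", "PopB"), each = 2), times = 2),
  Color      = rep(c("Blue", "Green"), times = 4),
  Coverage   = c(0.016, 0.022, 0.1322, 0.552, 0.13, 0.14, 1, 0.9)
)

# Split the rows by color, keeping only the key columns and the value.
blue  <- my.df[my.df$Color == "Blue",  c("Gene", "Population", "Coverage")]
green <- my.df[my.df$Color == "Green", c("Gene", "Population", "Coverage")]

# Pair Blue and Green rows that share the same Gene and Population.
# merge() performs an inner join by default and sorts by the key columns.
paired <- merge(blue, green, by = c("Gene", "Population"),
                suffixes = c(".blue", ".green"))

# Blue minus Green for each Gene/Population pair.
result <- data.frame(
  Gene       = paired$Gene,
  Population = paired$Population,
  Coverage   = paired$Coverage.blue - paired$Coverage.green
)
print(result)
```

Because `merge()` is an inner join by default, a Gene/Population pair that lacks one of the two colors is dropped from `result` rather than raising an error, which may or may not be what you want.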