I have a data frame, p4p5, that contains the following columns:
colnames(p4p5)  # "SampleID", "expr", "Gene", "Period", "Consequence", "isPTV"
I used the aggregate function to find the median expression of each gene:
p4p5_med <- aggregate(expr ~ Gene, p4p5, median)
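However, this produces a data frame that contains only the "expr" and "Gene" columns. How can I keep all of the original columns when applying the aggregate function?
Update:
Input (p4p5):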
SampleID expr Gene Period Consequence isPTV
HSB430 -1.23 ENSG000098 4 upstream_gene_variant 0
HSB321 -0.02 ENSG000098 5 stop_gained 1
HSB296 3.12 ENSG000027 4 upstream_gene_variant 0
HSB201 1.22 ENSG000027 4 intron_variant 0
HSB220 0.13 ENSG000013 6 intron_variant 0
Expected output:
SampleID expr Gene Period Consequence isPTV Median
HSB430 -1.23 ENSG000098 4 upstream_gene_variant 0 -0.625
HSB321 -0.02 ENSG000098 5 stop_gained 1 -0.625
HSB296 3.12 ENSG000027 4 upstream_gene_variant 0 2.17
HSB201 1.22 ENSG000027 4 intron_variant 0 2.17
HSB220 0.13 ENSG000013 6 intron_variant 0 0.13
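Answer 0 (score: 1)
I would use dplyr for this. group_by() followed by mutate() computes the median within each gene and attaches it to every row, so no columns are dropped: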
library(dplyr)
p4p5 %>%
group_by(Gene) %>%
mutate(Median = median(expr, na.rm = TRUE)) %>%
ungroup()
SampleID expr Gene Period Consequence isPTV Median
<chr> <dbl> <chr> <int> <chr> <int> <dbl>
1 HSB430 -1.23 ENSG000098 4 upstream_gene_variant 0 -0.625
2 HSB321 -0.02 ENSG000098 5 stop_gained 1 -0.625
3 HSB296 3.12 ENSG000027 4 upstream_gene_variant 0 2.17
4 HSB201 1.22 ENSG000027 4 intron_variant 0 2.17
5 HSB220 0.13 ENSG000013 6 intron_variant 0 0.13
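If adding a dependency is not an option, the same result can probably be reached in base R. The following is a minimal sketch, not part of the original answer: it rebuilds the example input by hand (the column types are an assumption based on the printed output) and uses ave(), which applies a function within each group and returns a vector as long as the input. The names with_median and merged are hypothetical, introduced only for illustration.

```r
# Rebuild the example input from the question (column types are assumed).
p4p5 <- data.frame(
  SampleID    = c("HSB430", "HSB321", "HSB296", "HSB201", "HSB220"),
  expr        = c(-1.23, -0.02, 3.12, 1.22, 0.13),
  Gene        = c("ENSG000098", "ENSG000098", "ENSG000027",
                  "ENSG000027", "ENSG000013"),
  Period      = c(4L, 5L, 4L, 4L, 6L),
  Consequence = c("upstream_gene_variant", "stop_gained",
                  "upstream_gene_variant", "intron_variant",
                  "intron_variant"),
  isPTV       = c(0L, 1L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Option 1: ave() computes the per-Gene median and repeats it on every row
# of the group, so row order and all original columns are preserved.
with_median <- p4p5
with_median$Median <- ave(
  p4p5$expr, p4p5$Gene,
  FUN = function(x) median(x, na.rm = TRUE)
)

# Option 2: keep the aggregate() call from the question and join its
# result back onto the original data frame by Gene.
p4p5_med <- aggregate(expr ~ Gene, p4p5, median)
names(p4p5_med)[names(p4p5_med) == "expr"] <- "Median"
merged <- merge(p4p5, p4p5_med, by = "Gene")

print(with_median)
```

The ave() variant keeps the rows in their original order. The merge() variant sorts the rows by Gene and moves Gene to the first column by default, so the ave() form is likely the closer match to the expected output shown in the question.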