Density plots for multiple groups in ggplot

Time: 2018-06-05 12:46:14

Tags: r ggplot2 plotly density-plot

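To learn how to make density plots, I looked at "How to overlay density plots in R?" and "Overlapped density plots in ggplot2". I can create a density plot with the code from the second link, but I would like to know how to make such a plot in ggplot or plotly. I have gone through all the examples and still cannot work out my problem. I have a toy data frame of gene expression (the leukemia data, see its data description) whose columns refer to two groups of individuals.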

leukemia_big <- read.csv("http://web.stanford.edu/~hastie/CASI_files/DATA/leukemia_big.csv")

df <- data.frame(class= ifelse(grepl("^ALL", colnames(leukemia_big),
                 fixed = FALSE), "ALL", "AML"), row.names = colnames(leukemia_big))

plot(density(as.matrix(leukemia_big[,df$class=="ALL"])), 
     lwd=2, col="red")
lines(density(as.matrix(leukemia_big[,df$class=="AML"])), 
      lwd=2, col="darkgreen")

1 answer:

Answer 0 (score: 4)

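ggplot needs tidy data, also known as a long-format data frame. The example below does the reshaping. Be careful, though: in the dataset provided, the values are distributed almost identically for both patient types, so when you plot the ALL and AML patients the curves overlap and you cannot see a difference.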

library(tidyverse)

leukemia_big %>%
  as_tibble() %>% # Optional: makes df a tibble, which makes debugging easier (replaces the deprecated as_data_frame())
  gather(key = patient, value = value, 1:72) %>% # Transforms the wide df into a tidy (long) df
  mutate(type = gsub('[.].*$', '', patient)) %>% # Creates a variable with the patient type, e.g. "ALL.1" becomes "ALL"
  ggplot(aes(x = value, fill = type)) + geom_density(alpha = 0.5)

results with original data
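
To see the reshaping step in isolation, here is a minimal sketch on a tiny made-up table. The object names `toy_wide` and `toy_long` and all the numbers are hypothetical; only the column naming (`ALL`, `ALL.1`, `AML`, ...) mimics `leukemia_big`. It uses `tidyr::pivot_longer()`, the newer counterpart of `gather()`, so it is a variation on the code above rather than the same call.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide table: 3 genes (rows) x 4 patients (columns).
# The values are invented for illustration only.
toy_wide <- data.frame(
  ALL   = c(0.1, 0.5, -0.3),
  ALL.1 = c(0.2, 0.4, -0.1),
  AML   = c(1.1, 1.6,  0.8),
  AML.1 = c(1.0, 1.4,  0.9)
)

toy_long <- toy_wide %>%
  # One row per (patient, value) pair instead of one column per patient
  pivot_longer(cols = everything(), names_to = "patient", values_to = "value") %>%
  # Strip the ".N" suffix so that "ALL.1" becomes "ALL"
  mutate(type = sub("[.].*$", "", patient))

toy_long
```

The long table has one `value` column and one `type` column, which is the shape `aes(x = value, fill = type)` expects.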

In the second example, I add 1 unit to the value variable for all AML-type patients, to show the overlap problem visually.

leukemia_big %>%
  as_tibble() %>% # Optional: makes df a tibble, which makes debugging easier (replaces the deprecated as_data_frame())
  gather(key = patient, value = value, 1:72) %>% # Transforms the wide df into a tidy (long) df
  mutate(type = gsub('[.].*$', '', patient)) %>% # Creates a variable with the patient type, e.g. "ALL.1" becomes "ALL"
  mutate(value2 = if_else(condition = type == "ALL", true = value, false = value + 1)) %>% # Shifts the AML values by 1 so the two curves no longer coincide
  ggplot(aes(x = value2, fill = type)) + geom_density(alpha = 0.5)

results with modified data for AML type patients
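
Since the question also mentions plotly, one possible route is to pass the finished ggplot object to `plotly::ggplotly()`. The sketch below assumes the plotly package is installed and uses simulated stand-in data (`sim_long`, a hypothetical name) rather than the leukemia data, so that it runs without a download. The same call should apply to the plots above if they are first stored in a variable.

```r
library(ggplot2)
library(plotly)

set.seed(1)
# Simulated long-format data, NOT the leukemia data:
# one row per value, with the group label in `type`
sim_long <- data.frame(
  value = c(rnorm(500, mean = 0), rnorm(500, mean = 1)),
  type  = rep(c("ALL", "AML"), each = 500)
)

# Same layering as in the answer: one semi-transparent density per group
p <- ggplot(sim_long, aes(x = value, fill = type)) +
  geom_density(alpha = 0.5)

# ggplotly() converts the ggplot object into an interactive plotly widget
p_interactive <- ggplotly(p)
p_interactive
```

Printing `p_interactive` in an interactive session opens the widget in the viewer or browser; the static `p` remains available for regular ggplot output.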