Violin plot from two data.frames

Asked: 2015-02-17 18:50:24

Tags: r plot ggplot2 dataframe

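I would like to make a violin plot with ggplot2. The y-axis should be the number of occurrences. This can be represented either by N in df2, or by the number of rows for each group within each ID in df1. The x-axis should be Freq from df1. The densities should correspond to the range of occurrences (df2$N) for each group (A, B, C).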

#dummy data

df1 <- read.table(text=" ID Freq group
ind00001    1   A
ind00001    3   A
ind00001   12   B
ind00001   19   B
ind00001   33   C
ind00001    2   A
ind00003    1   A
ind00003   32   C
ind00003   20   B
ind00003   12   B
ind00003    4   B
ind00003    3   A
ind00003    4   B
ind00006    2   A
ind00006   11   B
ind00006    1   A
ind00006   34   C
ind00006    1   A
ind00006    5   B
ind00013    1   A
ind00013    5   B
ind00013    6   B
ind00013   11   B
ind00013    6   B
ind00013   10   B
ind00013    1   A
ind00015    2   A
ind00015   10   B
ind00015   33   C
ind00015    5   B
ind00022    1   A
ind00022    8   B
ind00022   26   B
ind00022    4   B
ind00048    2   A
ind00048    9   B
ind00048   30   B
ind00048    6   B
ind00068    2   A
ind00068   10   B
ind00084    1   A
ind00084    1   A
ind00084    4   B
ind00084    1   A
ind00084    2   A
ind00089    3   A
ind00089   30   B
ind00104    2   A
ind00104    2   A
ind00104    1   A
ind00104    6   B
ind00104    4   B
ind00104    4   B
ind00106    2   A
ind00106    1   A
ind00106   10   B
ind00106    3   A
ind00106    2   A
ind00118    2   A
ind00118    2   A
ind00118    6   B
ind00118   19   B
ind00118    3   A
ind00118    2   A
ind00123    3   A
ind00123    2   A
ind00123    1   A
ind00123    3   A
ind00123    4   B
ind00123   31   C
ind00130    1   A
ind00130    2   A
ind00130    1   A
ind00130   19   B
ind00130    3   A
ind00130    2   A
ind00138    3   A
ind00138    7   B
ind00138    1   A
ind00138    3   A
ind00138    5   B
ind00138   10   B
ind00138   25   B
ind00148    2   A
ind00148    3   A
ind00148    3   A
ind00148    3   A
ind00148   19   B
ind00149    3   A
ind00149    1   A
ind00149    5   B
ind00156    1   A
ind00156    2   A
ind00156    9   B
ind00156    2   A
ind00169    3   A
ind00169    3   A
ind00169    2   A
ind00169    4   B
ind00169    3   A", header=T)

df2 <- read.table(text="N group ID
3  A ind00001
2  B ind00001
1  C ind00001
1  A ind00002
2  B ind00002
1  C ind00002
2  A ind00003
4  B ind00003
1  C ind00003
3  B ind00004
1  C ind00004
1  B ind00005
1  C ind00005
3  A ind00006
2  B ind00006
1  C ind00006
1  A ind00007
1  B ind00007
1  C ind00007
2  A ind00008
3  B ind00008
1  C ind00008
1  A ind00009
3  B ind00009
1  A ind00010
2  B ind00010
1  C ind00010
1  A ind00011
1  B ind00011
1  C ind00011
1  A ind00012
4  B ind00012
1  C ind00012
2  A ind00013
5  B ind00013
1  A ind00014
2  B ind00014
1  C ind00014
1  A ind00015
2  B ind00015
1  C ind00015
3  B ind00016
1  C ind00016
3  B ind00017
1  C ind00017
2  A ind00018
2  B ind00018
2  B ind00019
1  C ind00019
2  A ind00020
1  B ind00020
1  A ind00021
2  B ind00021
1  C ind00021
1  A ind00022
3  B ind00022
2  A ind00023
3  B ind00023
1  C ind00023
2  B ind00024
1  C ind00024
6  B ind00025
1  C ind00025
1  A ind00026
2  B ind00026
1  C ind00026
1  A ind00027
1  B ind00027
1  C ind00027
1  A ind00028
2  B ind00028
1  C ind00028
1  A ind00029
1  B ind00029
1  C ind00029
1  A ind00030
3  B ind00030
1  C ind00030
6  B ind00031
1  C ind00031
2  A ind00032
1  B ind00032
1  A ind00033
4  B ind00033
3  B ind00034
1  C ind00034
2  A ind00035
1  B ind00035
1  A ind00036
1  B ind00036
1  A ind00037
3  B ind00037
1  C ind00037
1  A ind00038
4  B ind00038
1  A ind00039
3  B ind00039
1  A ind00040
2  B ind00040
2  B ind00041", header=T)
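The question says the occurrence count can be read from df2$N or derived from df1 by counting rows per ID and group. A minimal sketch of the second route, assuming a toy data frame `toy` (a hypothetical name) that copies the ind00001 and ind00003 rows from df1 above:

```r
# Toy subset shaped like df1 (rows copied from the dummy data above)
toy <- data.frame(
  ID    = c(rep("ind00001", 6), rep("ind00003", 7)),
  Freq  = c(1, 3, 12, 19, 33, 2, 1, 32, 20, 12, 4, 3, 4),
  group = c("A", "A", "B", "B", "C", "A", "A", "C", "B", "B", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# Count rows per ID and group; the count plays the role of df2$N
counts <- aggregate(Freq ~ ID + group, data = toy, FUN = length)
names(counts)[names(counts) == "Freq"] <- "N"
counts <- counts[order(counts$ID, counts$group), ]
print(counts)
```

For these two IDs the derived counts agree with the corresponding rows of df2 (for example 3 for ind00001/A and 4 for ind00003/B).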

I tried the following to plot the data, but it (obviously) produces an incorrect plot.

require(ggplot2)
require(qpcR)
ggplot(data.frame(qpcR:::cbind.na(x=df1$Freq, y=df2$N, group=df1$group)), aes(x=x, y=y, group=group, fill=group)) + geom_violin() + theme_bw()

[Figure: violin plot produced by the code above]

A correct plot should show densities per group (A, B, C) that correspond to their number of occurrences (df2$N).

E.g., group C (light blue, or 3 in the figure) should never exceed the value 1 on the y-axis, as the subsets below show.

Any pointers would be highly appreciated, thanks!

# C only has df2$N == 1
subset(df2, group %in% "C") 

N group ID
1     C ind00001
1     C ind00002
1     C ind00003
1     C ind00004
1     C ind00005
1     C ind00006
1     C ind00007
1     C ind00008
1     C ind00010
1     C ind00011
1     C ind00012
1     C ind00014
1     C ind00015
1     C ind00016
1     C ind00017
1     C ind00019
1     C ind00021
1     C ind00023
1     C ind00024
1     C ind00025
1     C ind00026
1     C ind00027
1     C ind00028
1     C ind00029
1     C ind00030
1     C ind00031
1     C ind00034
1     C ind00037

# B has df2$N ranging from 1 to 6
subset(df2, group %in% "B")

N group      ID
2     B ind00001
2     B ind00002
4     B ind00003
3     B ind00004
1     B ind00005
2     B ind00006
1     B ind00007
3     B ind00008
3     B ind00009
2     B ind00010
1     B ind00011
4     B ind00012
5     B ind00013
2     B ind00014
2     B ind00015
3     B ind00016
3     B ind00017
2     B ind00018
2     B ind00019
1     B ind00020
2     B ind00021
3     B ind00022
3     B ind00023
2     B ind00024
6     B ind00025
2     B ind00026
1     B ind00027
2     B ind00028
1     B ind00029
3     B ind00030
6     B ind00031
1     B ind00032
4     B ind00033
3     B ind00034
1     B ind00035
1     B ind00036
3     B ind00037
4     B ind00038
3     B ind00039
2     B ind00040
2     B ind00041

# A only has df2$N ranging from 1 to 3
subset(df2, group %in% "A")

N group      ID
3     A ind00001
1     A ind00002
2     A ind00003
3     A ind00006
1     A ind00007
2     A ind00008
1     A ind00009
1     A ind00010
1     A ind00011
1     A ind00012
2     A ind00013
1     A ind00014
1     A ind00015
2     A ind00018
2     A ind00020
1     A ind00021
1     A ind00022
2     A ind00023
1     A ind00026
1     A ind00027
1     A ind00028
1     A ind00029
1     A ind00030
2     A ind00032
1     A ind00033
2     A ind00035
1     A ind00036
1     A ind00037
1     A ind00038
1     A ind00039
1     A ind00040

2 answers:

Answer 0 (score: 1)

# Inner join: keep only the ID/group pairs present in both data frames
plotData <- merge(df1, df2, by = c("ID", "group"), all = FALSE)

All members of group C have the same N, which causes geom_violin to fail; a boxplot is an option:

ggplot(plotData, aes(x=as.factor(group), y=N, group=group, fill=group)) + geom_boxplot() + theme_bw()

Otherwise, drop group C:

ggplot(plotData[plotData$group!="C",], aes(x=as.factor(group), y=N, group=group, fill=group)) + geom_violin() + theme_bw()
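Rather than hard-coding "C", the groups whose N never varies can be detected first. A minimal sketch, assuming a toy stand-in for plotData (the values in `toy` are illustrative, not the merged result) and hypothetical helper names `n_distinct`, `degenerate` and `violin_data`:

```r
# Toy stand-in for plotData; values are illustrative only
toy <- data.frame(
  group = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  N     = c(1, 2, 3, 2, 4, 6, 1, 1, 1)
)

# Number of distinct N values per group
n_distinct <- tapply(toy$N, toy$group, function(v) length(unique(v)))

# Groups with a single distinct value have no spread to estimate a density from
degenerate <- names(n_distinct)[n_distinct < 2]
print(degenerate)

# Rows that are safe to pass to geom_violin()
violin_data <- toy[!toy$group %in% degenerate, ]
```

The remaining rows could then go to geom_violin(), with the degenerate groups drawn by another geom such as geom_boxplot().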

Answer 1 (score: 0)

Thank you very much for the reply. If I combine the two suggestions from @CMichael's answer, I get closer to the output I had in mind.

p1 <- ggplot(subset(plotData, group %in% c("A","B")), aes(x=as.factor(group), y=N, group=group, fill=group)) + geom_violin() + theme_bw()

p1 + geom_boxplot(data=subset(plotData, group %in% c("C")), aes(x=as.factor(group), y=N, group=group, fill=group)) 

[Figure: violins for groups A and B combined with a boxplot for group C]

Update

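This is closer to the initial idea. However, the violins are not centered on where most of each group's data lies, but on the midpoint of each group's x interval.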

ggplot(plotData, aes(x=Freq, y=N)) + theme_bw() + scale_x_continuous(breaks=1:36) + geom_jitter(aes(colour=group)) +geom_violin(data=subset(plotData, group %in% c("A","B")), alpha = .0, trim=F, aes(group=group)) + scale_y_continuous(breaks=1:7)
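One possible way to move each violin is to give it a single x position per group instead of letting it span the group's Freq range. The sketch below is an assumption-laden illustration, not a definitive fix: it uses the mean Freq per group as the "center", works on a toy stand-in for plotData with made-up values, and picks `width = 3` arbitrarily.

```r
library(ggplot2)

# Toy stand-in for plotData; values are illustrative only
toy <- data.frame(
  group = c("A", "A", "A", "A", "B", "B", "B", "B"),
  Freq  = c(1, 2, 2, 3, 4, 6, 10, 30),
  N     = c(1, 2, 3, 1, 2, 4, 3, 6)
)

# Assumed center for each violin: the mean Freq of its group
toy$centre <- ave(toy$Freq, toy$group, FUN = mean)

p <- ggplot(toy, aes(y = N)) +
  # Points stay at their real Freq values
  geom_jitter(aes(x = Freq, colour = group)) +
  # Each violin sits at its group's mean Freq; width is chosen by hand
  geom_violin(aes(x = centre, group = group),
              alpha = 0, trim = FALSE, width = 3) +
  theme_bw()
print(p)
```

Whether the mean, the median, or some other summary is the right center depends on what the plot is meant to convey; with skewed Freq values like those in group B, the median may sit closer to the bulk of the points.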

[Figure: jittered points colored by group, with violin outlines for groups A and B]