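I would like to make a violin plot using ggplot2. The y-axis should be the number of occurrences. This can be represented by N in df2, or by the number of unique ID mappings per group in df1. The x-axis should be Freq from df1. The density should correspond to the range of occurrences (df2$N) for each group (A, B, C).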
#dummy data
df1 <- read.table(text=" ID Freq group
ind00001 1 A
ind00001 3 A
ind00001 12 B
ind00001 19 B
ind00001 33 C
ind00001 2 A
ind00003 1 A
ind00003 32 C
ind00003 20 B
ind00003 12 B
ind00003 4 B
ind00003 3 A
ind00003 4 B
ind00006 2 A
ind00006 11 B
ind00006 1 A
ind00006 34 C
ind00006 1 A
ind00006 5 B
ind00013 1 A
ind00013 5 B
ind00013 6 B
ind00013 11 B
ind00013 6 B
ind00013 10 B
ind00013 1 A
ind00015 2 A
ind00015 10 B
ind00015 33 C
ind00015 5 B
ind00022 1 A
ind00022 8 B
ind00022 26 B
ind00022 4 B
ind00048 2 A
ind00048 9 B
ind00048 30 B
ind00048 6 B
ind00068 2 A
ind00068 10 B
ind00084 1 A
ind00084 1 A
ind00084 4 B
ind00084 1 A
ind00084 2 A
ind00089 3 A
ind00089 30 B
ind00104 2 A
ind00104 2 A
ind00104 1 A
ind00104 6 B
ind00104 4 B
ind00104 4 B
ind00106 2 A
ind00106 1 A
ind00106 10 B
ind00106 3 A
ind00106 2 A
ind00118 2 A
ind00118 2 A
ind00118 6 B
ind00118 19 B
ind00118 3 A
ind00118 2 A
ind00123 3 A
ind00123 2 A
ind00123 1 A
ind00123 3 A
ind00123 4 B
ind00123 31 C
ind00130 1 A
ind00130 2 A
ind00130 1 A
ind00130 19 B
ind00130 3 A
ind00130 2 A
ind00138 3 A
ind00138 7 B
ind00138 1 A
ind00138 3 A
ind00138 5 B
ind00138 10 B
ind00138 25 B
ind00148 2 A
ind00148 3 A
ind00148 3 A
ind00148 3 A
ind00148 19 B
ind00149 3 A
ind00149 1 A
ind00149 5 B
ind00156 1 A
ind00156 2 A
ind00156 9 B
ind00156 2 A
ind00169 3 A
ind00169 3 A
ind00169 2 A
ind00169 4 B
ind00169 3 A", header=T)
df2 <- read.table(text="N group ID
3 A ind00001
2 B ind00001
1 C ind00001
1 A ind00002
2 B ind00002
1 C ind00002
2 A ind00003
4 B ind00003
1 C ind00003
3 B ind00004
1 C ind00004
1 B ind00005
1 C ind00005
3 A ind00006
2 B ind00006
1 C ind00006
1 A ind00007
1 B ind00007
1 C ind00007
2 A ind00008
3 B ind00008
1 C ind00008
1 A ind00009
3 B ind00009
1 A ind00010
2 B ind00010
1 C ind00010
1 A ind00011
1 B ind00011
1 C ind00011
1 A ind00012
4 B ind00012
1 C ind00012
2 A ind00013
5 B ind00013
1 A ind00014
2 B ind00014
1 C ind00014
1 A ind00015
2 B ind00015
1 C ind00015
3 B ind00016
1 C ind00016
3 B ind00017
1 C ind00017
2 A ind00018
2 B ind00018
2 B ind00019
1 C ind00019
2 A ind00020
1 B ind00020
1 A ind00021
2 B ind00021
1 C ind00021
1 A ind00022
3 B ind00022
2 A ind00023
3 B ind00023
1 C ind00023
2 B ind00024
1 C ind00024
6 B ind00025
1 C ind00025
1 A ind00026
2 B ind00026
1 C ind00026
1 A ind00027
1 B ind00027
1 C ind00027
1 A ind00028
2 B ind00028
1 C ind00028
1 A ind00029
1 B ind00029
1 C ind00029
1 A ind00030
3 B ind00030
1 C ind00030
6 B ind00031
1 C ind00031
2 A ind00032
1 B ind00032
1 A ind00033
4 B ind00033
3 B ind00034
1 C ind00034
2 A ind00035
1 B ind00035
1 A ind00036
1 B ind00036
1 A ind00037
3 B ind00037
1 C ind00037
1 A ind00038
4 B ind00038
1 A ind00039
3 B ind00039
1 A ind00040
2 B ind00040
2 B ind00041", header=T)
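The statement that N can also be derived from df1 can be checked with a small sketch. It assumes that "number of unique ID mappings per group" means the number of df1 rows for each ID and group; the name toy_df1 is hypothetical and holds only the first six rows of df1 (ID ind00001).

```r
# First six rows of df1 from the question (ID ind00001 only)
toy_df1 <- data.frame(
  ID    = rep("ind00001", 6),
  Freq  = c(1, 3, 12, 19, 33, 2),
  group = c("A", "A", "B", "B", "C", "A")
)

# Count rows per ID and group; length() of Freq is the number of occurrences
counts <- aggregate(Freq ~ ID + group, data = toy_df1, FUN = length)

# Rename the counted column so it matches the layout of df2
names(counts)[names(counts) == "Freq"] <- "N"

print(counts)
```

For this ID the counts per group agree with the first three rows of df2, which suggests the two representations are interchangeable for IDs present in both tables.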
I tried the following for plotting, but it (obviously) produces an incorrect plot.
require(ggplot2)
require(qpcR)
ggplot(data.frame(qpcR:::cbind.na(x=df1$Freq, y=df2$N, group=df1$group)), aes(x=x, y=y, group=group, fill=group)) + geom_violin() + theme_bw()
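The correct plot should show densities by group (A, B, C) that correspond to their number of occurrences (df2$N).
E.g., group C (light blue, or 3 in the plot) should not exceed the value 1 on the y-axis, as shown below.
Any pointers would be highly appreciated, thanks!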
# C only has df2$N == 1
subset(df2, group %in% "C")
N group ID
1 C ind00001
1 C ind00002
1 C ind00003
1 C ind00004
1 C ind00005
1 C ind00006
1 C ind00007
1 C ind00008
1 C ind00010
1 C ind00011
1 C ind00012
1 C ind00014
1 C ind00015
1 C ind00016
1 C ind00017
1 C ind00019
1 C ind00021
1 C ind00023
1 C ind00024
1 C ind00025
1 C ind00026
1 C ind00027
1 C ind00028
1 C ind00029
1 C ind00030
1 C ind00031
1 C ind00034
1 C ind00037
# B has df2$N ranging from 1 to 6
subset(df2, group %in% "B")
N group ID
2 B ind00001
2 B ind00002
4 B ind00003
3 B ind00004
1 B ind00005
2 B ind00006
1 B ind00007
3 B ind00008
3 B ind00009
2 B ind00010
1 B ind00011
4 B ind00012
5 B ind00013
2 B ind00014
2 B ind00015
3 B ind00016
3 B ind00017
2 B ind00018
2 B ind00019
1 B ind00020
2 B ind00021
3 B ind00022
3 B ind00023
2 B ind00024
6 B ind00025
2 B ind00026
1 B ind00027
2 B ind00028
1 B ind00029
3 B ind00030
6 B ind00031
1 B ind00032
4 B ind00033
3 B ind00034
1 B ind00035
1 B ind00036
3 B ind00037
4 B ind00038
3 B ind00039
2 B ind00040
2 B ind00041
# A has df2$N ranging from 1 to 3
subset(df2, group %in% "A")
N group ID
3 A ind00001
1 A ind00002
2 A ind00003
3 A ind00006
1 A ind00007
2 A ind00008
1 A ind00009
1 A ind00010
1 A ind00011
1 A ind00012
2 A ind00013
1 A ind00014
1 A ind00015
2 A ind00018
2 A ind00020
1 A ind00021
1 A ind00022
2 A ind00023
1 A ind00026
1 A ind00027
1 A ind00028
1 A ind00029
1 A ind00030
2 A ind00032
1 A ind00033
2 A ind00035
1 A ind00036
1 A ind00037
1 A ind00038
1 A ind00039
1 A ind00040
Answer 0 (score: 1)
plotData <- merge(df1,df2,by=c("ID","group"),all=F)
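With all = FALSE, merge() performs an inner join, so any ID and group pair that appears in only one of the two data frames is dropped, and N is repeated on every matching df1 row. A minimal sketch of that behaviour, using a few rows from the question under the hypothetical names left and right:

```r
# A few rows in the layout of df1; ind00048 occurs in df1 but not in df2
left <- data.frame(
  ID    = c("ind00001", "ind00001", "ind00048"),
  Freq  = c(1, 12, 2),
  group = c("A", "B", "A")
)

# Matching rows in the layout of df2
right <- data.frame(
  N     = c(3, 2),
  group = c("A", "B"),
  ID    = c("ind00001", "ind00001")
)

# all = FALSE keeps only ID/group pairs that are present in both frames
joined <- merge(left, right, by = c("ID", "group"), all = FALSE)

print(joined)
```

The row for ind00048 does not appear in the result, so the plots built from plotData only cover IDs that both tables share.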
All members of group = C have the same N, which causes geom_violin to fail; a boxplot is an option:
ggplot(plotData, aes(x=as.factor(group), y=N, group=group, fill=group)) + geom_boxplot() + theme_bw()
Otherwise, remove group = C:
ggplot(plotData[plotData$group!="C",], aes(x=as.factor(group), y=N, group=group, fill=group)) + geom_violin() + theme_bw()
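Rather than hard-coding "C", the groups without any spread in N could be detected before plotting. The following is only a sketch under assumptions: toy is a hypothetical stand-in for plotData with made-up values, and a group is treated as unsuitable for a violin when it has a single distinct N.

```r
# Hypothetical stand-in for plotData (made-up values, same column names)
toy <- data.frame(
  group = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  N     = c(1, 2, 3, 2, 4, 6, 1, 1, 1)
)

# Number of distinct N values in each group
n_distinct <- vapply(
  split(toy$N, toy$group),
  function(v) length(unique(v)),
  integer(1)
)

# Groups with some spread can get a violin; constant groups cannot
violin_groups   <- names(n_distinct)[n_distinct > 1]
constant_groups <- names(n_distinct)[n_distinct == 1]

print(violin_groups)
print(constant_groups)
```

The resulting vectors could then be used in place of the literal group names when subsetting, for example plotData[plotData$group %in% violin_groups, ].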
Answer 1 (score: 0)
Thanks a lot for the replies. If I combine the two suggestions from @CMichael, I get closer to the output I had in mind.
p1 <- ggplot(subset(plotData, group %in% c("A","B")), aes(x=as.factor(group), y=N, group=group, fill=group)) + geom_violin() + theme_bw()
p1 + geom_boxplot(data=subset(plotData, group %in% c("C")), aes(x=as.factor(group), y=N, group=group, fill=group))
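Update
This is closer to the initial idea, although the violins are centered on the middle of their x-interval rather than on the mass of the density.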
ggplot(plotData, aes(x=Freq, y=N)) + theme_bw() + scale_x_continuous(breaks=1:36) + geom_jitter(aes(colour=group)) +geom_violin(data=subset(plotData, group %in% c("A","B")), alpha = .0, trim=F, aes(group=group)) + scale_y_continuous(breaks=1:7)