Highlighting a point in a boxplot

Posted: 2014-01-10 19:40:27

Tags: r ggplot2 visualization

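I'm using ggplot2 to draw three different sets as three boxplots in one plot. In each set there is one point I'd like to highlight, to show how it compares with the other points: does it fall inside the box, or outside it?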

Here is my data:

               CDH    1KG  NHLBI
    CDH    301    688   1762
    RS0    204    560  21742
    RS1    158   1169   1406
    RS2    182   1945   1467
    RS3    256   2371   1631
    RS4    198    580   1765
    RS5    193    524   1429
    RS6    139   2551   1469
    RS7    188    702   1584
    RS8    142   4311   1461
    RS9    223    916   1591
    RS10   250    794   1406
    RS11   185    539   1270
    RS12   228    641   1786
    RS13   152    557   1677
    RS14   225   1970   1619
    RS15   196    458   1543
    RS16   203   2891   1528
    RS17   221   1542   1780
    RS18   258   1173   1850
    RS19   202    718   1651
    RS20   191   6314   1564
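
To make the question reproducible, the table can be read straight into a data frame. This is a sketch that assumes the data is pasted as whitespace-separated text rather than read from thedata.csv; the name df is an assumption, chosen because the answer's code expects a data frame called df. Because the header line has one fewer field than the data rows, read.table() uses the first column as row names, and check.names = FALSE keeps the column name 1KG instead of turning it into X1KG.

```r
# Sketch: the question's table as a data frame named df (assumed name).
# The header has one fewer field than the data rows, so read.table() uses
# the first column (CDH, RS0, ..., RS20) as row names.
df <- read.table(header = TRUE, check.names = FALSE, text = "
       CDH    1KG  NHLBI
CDH    301    688   1762
RS0    204    560  21742
RS1    158   1169   1406
RS2    182   1945   1467
RS3    256   2371   1631
RS4    198    580   1765
RS5    193    524   1429
RS6    139   2551   1469
RS7    188    702   1584
RS8    142   4311   1461
RS9    223    916   1591
RS10   250    794   1406
RS11   185    539   1270
RS12   228    641   1786
RS13   152    557   1677
RS14   225   1970   1619
RS15   196    458   1543
RS16   203   2891   1528
RS17   221   1542   1780
RS18   258   1173   1850
RS19   202    718   1651
RS20   191   6314   1564
")
```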


    library(ggplot2)
    rm(list = ls())
    orig_table <- read.table("thedata.csv", header = TRUE, sep = ",")
    # Drop the first column (row labels) and the first row (the CDH row, which
    # holds the points I want to highlight), keeping the 21 rows RS0..RS20
    tt <- orig_table[-1, -1]
    n <- nrow(tt)
    # Stack the three columns into one long data frame: value + set id
    mydata <- data.frame(num = c(tt[, 1], tt[, 2], tt[, 3]),
                         gro = factor(rep(1:3, each = n), levels = 1:3))
    # The points I want to highlight (the values of the first row)
    data2 <- data.frame(num = c(301, 688, 1762), gro = factor(1:3, levels = 1:3))
    # Boxplots of the 21 other points in each set
    qplot(gro, num, data = mydata, geom = "boxplot") + scale_y_log10()
    # ...and here I want to highlight the three values in data2

Thanks for your help!

1 answer

Answer (score: 3)

First, ggplot is much easier to work with when your data is in long format. The melt function from the reshape2 package helps with that.
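
A small sketch of what melt does here, on a made-up three-row frame (the values and column names A and B are illustrative only). With no id.vars given, reshape2 treats logical, factor and character columns as id variables, so the highlight flag is carried along while the numeric columns are stacked into variable/value pairs.

```r
library(reshape2)

# Hypothetical wide data: two numeric sets plus a logical flag
wide <- data.frame(A = c(301, 204, 158),
                   B = c(688, 560, 1169),
                   highlight = c(TRUE, FALSE, FALSE))

# highlight is logical, so melt() keeps it as an id column and stacks A and B
long <- melt(wide)
print(long)
```

Each original row now appears once per set, which is the shape ggplot wants: one column for the x grouping (variable) and one for the y values (value).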


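All I did was flag the first row with TRUE, melt the data into the long format ggplot expects, and then draw the rows where highlight == TRUE as separate points on top of the boxplots, which are computed from the remaining rows only.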


EDIT: here is the code, including how I prepared the data:

    library(reshape2)
    library(ggplot2)
    # df is assumed to hold the table from the question (columns CDH, 1KG, NHLBI)
    df$highlight <- c(TRUE, rep(FALSE, nrow(df) - 1L))  # tag the first row as interesting
    df.2 <- melt(df)  # convert df to long format; highlight becomes an id column
    # Boxplots from the non-highlighted rows only
    ggplot(subset(df.2, !highlight), aes(x = variable, y = value)) +
      geom_boxplot() + scale_y_log10() +
      geom_point(                               # add the highlight points
        data = subset(df.2, highlight),
        aes(x = variable, y = value),
        color = "red", size = 5
      )
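
The question also asks whether each highlighted point falls inside the box. Here is a hedged sketch of a numeric check, not part of the original answer: for each set it computes the quartiles and whisker ends of the 21 background values and reports where the CDH value falls. It works on log10 values because scale_y_log10() transforms the data before geom_boxplot() computes its statistics, and it uses quantile()'s default type 7, which as far as I know is what geom_boxplot() uses for the hinges. The helper name box_position is hypothetical.

```r
# Background values (rows RS0..RS20 of the question's table)
sets <- list(
  CDH   = c(204, 158, 182, 256, 198, 193, 139, 188, 142, 223, 250,
            185, 228, 152, 225, 196, 203, 221, 258, 202, 191),
  `1KG` = c(560, 1169, 1945, 2371, 580, 524, 2551, 702, 4311, 916, 794,
            539, 641, 557, 1970, 458, 2891, 1542, 1173, 718, 6314),
  NHLBI = c(21742, 1406, 1467, 1631, 1765, 1429, 1469, 1584, 1461, 1591,
            1406, 1270, 1786, 1677, 1619, 1543, 1528, 1780, 1850, 1651, 1564)
)
# The values to highlight (the CDH row)
highlight <- c(CDH = 301, `1KG` = 688, NHLBI = 1762)

# Hypothetical helper: where does x fall relative to the box and whiskers
# drawn for y? Works on log10 values to mirror scale_y_log10().
box_position <- function(x, y, coef = 1.5) {
  lx <- log10(x)
  ly <- log10(y)
  q <- quantile(ly, c(0.25, 0.75), names = FALSE)  # lower and upper hinges
  iqr <- q[2] - q[1]
  # Whiskers end at the most extreme data points inside the coef * IQR fences
  within_fences <- ly[ly >= q[1] - coef * iqr & ly <= q[2] + coef * iqr]
  whisk <- range(within_fences)
  if (lx >= q[1] && lx <= q[2]) "inside box"
  else if (lx >= whisk[1] && lx <= whisk[2]) "outside box, within whiskers"
  else "beyond whiskers"
}

result <- mapply(box_position, highlight, sets[names(highlight)])
print(result)
# CDH: beyond whiskers; 1KG: inside box; NHLBI: outside box, within whiskers
```

On a linear axis the whisker fences would be computed from untransformed values, so the "within whiskers" verdict can change; the inside/outside-box verdict does not here, because with 21 points the type-7 quartiles are exact order statistics.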