ggplot2: shaded 95% CI with categories on the x-axis

Asked: 2016-04-07 12:35:16

Tags: r ggplot2 categorical-data confidence-interval

I have the following plot:

[Plot 1]

It was created with the following code:

ggplot(df, aes(Island, AR, group = Locus, colour = factor(Type))) +
  geom_line(aes(colour = factor(Type), alpha = factor(Type), size = factor(Type))) +
  scale_alpha_manual(values = c("MS" = 0.2, "MT" = 0.2, "TLR" = 1), guide = "none") +
  scale_size_manual(values = c("MS" = 0.5, "MT" = 0.5, "TLR" = 0.3), guide = "none") +
  xlab("Island") +
  ylab("Allelic Richness") +
  scale_x_discrete(labels = c("Santiago", "Fogo", "Sao Nicolau"), limits = c("ST", "FG", "SN")) +
  geom_point(aes(shape = factor(Shapetype))) +
  scale_shape_manual(values = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                     breaks = c("TLR1LA", "TLR1LB", "TLR2A", "TLR2B", "TLR3", "TLR4", "TLR5", "TLR21", "MS", "MT")) +
  scale_colour_manual(values = c("Red", "Blue", "Black"),
                      breaks = c("TLR1LA", "TLR1LB", "TLR2A", "TLR2B", "TLR3", "TLR4", "TLR5", "TLR21", "MS", "MT")) +
  theme_bw() +
  labs(shape = "Functional", colour = "Neutral")

From this data:

Locus;Island;AR;Type;Shapetype
MS1;ST;4,6315;MS;NA
MS1;FG;3,9689;MS;NA
MS1;SN;3;MS;NA
MS2;ST;2;MS;NA
MS2;FG;2;MS;NA
MS2;SN;2;MS;NA
MS3;ST;7,5199;MS;NA
MS3;FG;5,5868;MS;NA
MS3;SN;3;MS;NA
MS4;ST;2,9947;MS;NA
MS4;FG;3;MS;NA
MS4;SN;2;MS;NA
MS5;ST;9,0726;MS;NA
MS5;FG;5,6759;MS;NA
MS5;SN;2,963;MS;NA
MS6;ST;6,5779;MS;NA
MS6;FG;5,6842;MS;NA
MS6;SN;2;MS;NA
MS7;ST;2;MS;NA
MS7;FG;1;MS;NA
MS7;SN;1;MS;NA
MS8;ST;3,97;MS;NA
MS8;FG;2,9032;MS;NA
MS8;SN;1;MS;NA
MS9;ST;2;MS;NA
MS9;FG;1,9977;MS;NA
MS9;SN;2;MS;NA
MS10;ST;3,9733;MS;NA
MS10;FG;3,9971;MS;NA
MS10;SN;2;MS;NA
MS11;ST;7,4172;MS;NA
MS11;FG;5,6471;MS;NA
MS11;SN;3;MS;NA
MS12;ST;2;MS;NA
MS12;FG;2;MS;NA
MS12;SN;2;MS;NA
MS13;ST;5,6135;MS;NA
MS13;FG;3;MS;NA
MS13;SN;2;MS;NA
MT;ST;12;MT;NA
MT;FG;3;MT;NA
MT;SN;2;MT;NA
TLR1LA;ST;3,68;TLR;TLR1LA
TLR1LA;FG;4,4;TLR;TLR1LA
TLR1LA;SN;1;TLR;TLR1LA
TLR1LB;ST;3,99;TLR;TLR1LB
TLR1LB;FG;5;TLR;TLR1LB
TLR1LB;SN;1;TLR;TLR1LB
TLR2A;ST;4,9;TLR;TLR2A
TLR2A;FG;5;TLR;TLR2A
TLR2A;SN;2;TLR;TLR2A
TLR2B;ST;5,64;TLR;TLR2B
TLR2B;FG;4;TLR;TLR2B
TLR2B;SN;3;TLR;TLR2B
TLR3;ST;1;TLR;TLR3
TLR3;FG;3;TLR;TLR3
TLR3;SN;3;TLR;TLR3
TLR4;ST;1;TLR;TLR4
TLR4;FG;2,89;TLR;TLR4
TLR4;SN;2;TLR;TLR4
TLR5;ST;2,9;TLR;TLR5
TLR5;FG;2;TLR;TLR5
TLR5;SN;2;TLR;TLR5
TLR21;ST;2,91;TLR;TLR21
TLR21;FG;1;TLR;TLR21
TLR21;SN;1;TLR;TLR21
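
The table above is semicolon-separated and uses decimal commas, so a plain `read.csv()` would likely read `AR` as text. A minimal sketch of loading it, assuming the data sits in a string (only a few rows are repeated here so the snippet runs on its own; in practice a file path would replace the `text` argument):

```r
# A few rows of the table above, embedded as text for a self-contained example.
raw <- "Locus;Island;AR;Type;Shapetype
MS1;ST;4,6315;MS;NA
MS1;FG;3,9689;MS;NA
MS1;SN;3;MS;NA
MT;ST;12;MT;NA
TLR1LA;ST;3,68;TLR;TLR1LA"

# read.csv2() defaults to sep = ";" and dec = ",", which matches this layout.
df <- read.csv2(text = raw, stringsAsFactors = FALSE)

# AR should now be numeric; the literal NA entries in Shapetype become missing values.
str(df)
```

Checking `str(df)` before plotting is a cheap way to confirm that `AR` did not end up as character or factor.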

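The plot is perfect, except that I would like to show the MS data (and only the MS data) as a mean ± 95% confidence interval, with the CI drawn as a shaded band across the categories on the x-axis.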

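However, I think the problem I am running into is that the x-axis is categorical, so I cannot draw a shaded area. The categories on the x-axis also represent a numerical difference, meaning they do have a direction, so for my data it makes sense to plot them the way I describe. I just cannot find a way to do it.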
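
One possible way around the categorical-axis problem is to keep the axis discrete and put all categories into a single group with `group = 1`, so that `geom_ribbon()` connects them. The sketch below is an illustration under assumptions, not a tested solution for the full data set: it uses only three MS loci copied from the table (decimal commas written as points), and the names `ms` and `ci` are made up for the example.

```r
library(ggplot2)

# Three MS loci copied from the table above (decimal commas written as points).
ms <- data.frame(
  Locus  = rep(c("MS1", "MS2", "MS3"), each = 3),
  Island = rep(c("ST", "FG", "SN"), times = 3),
  AR     = c(4.6315, 3.9689, 3, 2, 2, 2, 7.5199, 5.5868, 3)
)
# Fix the category order so the band runs ST -> FG -> SN.
ms$Island <- factor(ms$Island, levels = c("ST", "FG", "SN"))

# Mean and t-based 95% CI per island, computed in base R.
ci <- do.call(rbind, lapply(split(ms, ms$Island), function(d) {
  n    <- nrow(d)
  half <- qt(0.975, df = n - 1) * sd(d$AR) / sqrt(n)
  data.frame(Island = d$Island[1], mean = mean(d$AR),
             lower = mean(d$AR) - half, upper = mean(d$AR) + half)
}))

# group = 1 places all three categories in one group, so the ribbon spans them.
p <- ggplot(ci, aes(Island, mean, group = 1)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), fill = "red", alpha = 0.1) +
  geom_line(colour = "red") +
  geom_point(colour = "red") +
  theme_bw()
p
```

Without a shared group, each category on a discrete axis forms its own group of one point, which is why no band appears between them.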

If I change the categories (ST, FG, SN) to numbers (1, 2, 3):

Locus;Island;AR;Type;Shapetype
MS1;1;4,6315;MS;NA
MS1;2;3,9689;MS;NA
MS1;3;3;MS;NA
MS2;1;2;MS;NA
MS2;2;2;MS;NA
MS2;3;2;MS;NA
MS3;1;7,5199;MS;NA
MS3;2;5,5868;MS;NA
MS3;3;3;MS;NA
MS4;1;2,9947;MS;NA
MS4;2;3;MS;NA
MS4;3;2;MS;NA
MS5;1;9,0726;MS;NA
MS5;2;5,6759;MS;NA
MS5;3;2,963;MS;NA
MS6;1;6,5779;MS;NA
MS6;2;5,6842;MS;NA
MS6;3;2;MS;NA
MS7;1;2;MS;NA
MS7;2;1;MS;NA
MS7;3;1;MS;NA
MS8;1;3,97;MS;NA
MS8;2;2,9032;MS;NA
MS8;3;1;MS;NA
MS9;1;2;MS;NA
MS9;2;1,9977;MS;NA
MS9;3;2;MS;NA
MS10;1;3,9733;MS;NA
MS10;2;3,9971;MS;NA
MS10;3;2;MS;NA
MS11;1;7,4172;MS;NA
MS11;2;5,6471;MS;NA
MS11;3;3;MS;NA
MS12;1;2;MS;NA
MS12;2;2;MS;NA
MS12;3;2;MS;NA
MS13;1;5,6135;MS;NA
MS13;2;3;MS;NA
MS13;3;2;MS;NA
MT;1;12;MT;NA
MT;2;3;MT;NA
MT;3;2;MT;NA
TLR1LA;1;3,68;TLR;TLR1LA
TLR1LA;2;4,4;TLR;TLR1LA
TLR1LA;3;1;TLR;TLR1LA
TLR1LB;1;3,99;TLR;TLR1LB
TLR1LB;2;5;TLR;TLR1LB
TLR1LB;3;1;TLR;TLR1LB
TLR2A;1;4,9;TLR;TLR2A
TLR2A;2;5;TLR;TLR2A
TLR2A;3;2;TLR;TLR2A
TLR2B;1;5,64;TLR;TLR2B
TLR2B;2;4;TLR;TLR2B
TLR2B;3;3;TLR;TLR2B
TLR3;1;1;TLR;TLR3
TLR3;2;3;TLR;TLR3
TLR3;3;3;TLR;TLR3
TLR4;1;1;TLR;TLR4
TLR4;2;2,89;TLR;TLR4
TLR4;3;2;TLR;TLR4
TLR5;1;2,9;TLR;TLR5
TLR5;2;2;TLR;TLR5
TLR5;3;2;TLR;TLR5
TLR21;1;2,91;TLR;TLR21
TLR21;2;1;TLR;TLR21
TLR21;3;1;TLR;TLR21
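
The recoding shown above does not have to be done by editing the file. As a small sketch (the vector `island` is a made-up stand-in for the `Island` column), an ordered factor converted to integer gives the same 1, 2, 3 coding:

```r
# Stand-in for the Island column, using the codes from the original table.
island <- c("ST", "FG", "SN", "ST", "SN")

# The order of `levels` defines which code maps to 1, 2 and 3.
island_num <- as.integer(factor(island, levels = c("ST", "FG", "SN")))

island_num
# 1 2 3 1 3
```

With a numeric column like this, `scale_x_continuous(breaks = 1:3, labels = c("Santiago", "Fogo", "Sao Nicolau"))` would be one way to put the island names back on the axis.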

I am able to get something quite close:

[Plot 2]

Using this code:

ggplot(df, aes(x = Island, y = AR)) +
  # mean_cl_normal requires the Hmisc package to be installed
  stat_summary(geom = "ribbon", fun.data = mean_cl_normal,
               fun.args = list(conf.int = 0.95), fill = "red", alpha = .1) +
  # fun.y was renamed to fun in ggplot2 3.3.0
  stat_summary(geom = "point", fun.y = mean, color = "red") +
  theme_bw() +
  scale_x_discrete(labels = c("Santiago", "Fogo", "Sao Nicolau"), limits = c("1", "2", "3"))

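But I cannot figure out how to actually combine this with the original plot: only the MS data should be drawn like this, and everything else should look exactly as it does in the original plot.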
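
A possible way to combine the two, sketched under assumptions rather than verified against the full data: give each layer its own `data` subset, so the summary layers see only the MS rows and the line and point layers see everything else. The data frame below is a small stand-in built from rows of the table, and `mean_ci` is a hypothetical helper written for this example (it avoids the Hmisc dependency of `mean_cl_normal`). The scales and legends of the original plot are left out for brevity.

```r
library(ggplot2)

# Small stand-in for df: rows copied from the table (decimal commas as points).
df <- data.frame(
  Locus  = rep(c("MS1", "MS2", "MS3", "MT", "TLR3"), each = 3),
  Island = rep(c("ST", "FG", "SN"), times = 5),
  AR     = c(4.6315, 3.9689, 3,  2, 2, 2,  7.5199, 5.5868, 3,  12, 3, 2,  1, 3, 3),
  Type   = rep(c("MS", "MS", "MS", "MT", "TLR"), each = 3)
)
df$Island <- factor(df$Island, levels = c("ST", "FG", "SN"))

# Hypothetical helper: mean and t-based CI in the y/ymin/ymax form
# that stat_summary(fun.data = ...) expects.
mean_ci <- function(x, conf = 0.95) {
  n    <- length(x)
  half <- qt(conf / 2 + 0.5, df = n - 1) * sd(x) / sqrt(n)
  data.frame(y = mean(x), ymin = mean(x) - half, ymax = mean(x) + half)
}

ms     <- subset(df, Type == "MS")
others <- subset(df, Type != "MS")

p <- ggplot(mapping = aes(Island, AR)) +
  # MS only: mean across loci with a shaded CI; group = 1 joins the categories.
  stat_summary(data = ms, aes(group = 1), fun.data = mean_ci,
               geom = "ribbon", fill = "red", alpha = 0.1) +
  stat_summary(data = ms, aes(group = 1), fun.data = mean_ci,
               geom = "line", colour = "red") +
  # Everything else: one line per locus, as in the original plot.
  geom_line(data = others, aes(group = Locus, colour = Type)) +
  geom_point(data = others, aes(colour = Type)) +
  theme_bw()
p
```

Passing `fun.data` rather than `fun.y` or `fun` keeps the sketch independent of the argument rename in ggplot2 3.3.0.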

Summary function (summarySE):

## Summarizes data.
## Gives count, mean, standard deviation, standard error of the mean, and confidence interval (default 95%).
##   data: a data frame.
##   measurevar: the name of a column that contains the variable to be summarized
##   groupvars: a vector containing names of columns that contain grouping variables
##   na.rm: a boolean that indicates whether to ignore NA's
##   conf.interval: the percent range of the confidence interval (default is 95%)
summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE,
                  conf.interval=.95, .drop=TRUE) {
    library(plyr)

    # New version of length which can handle NA's: if na.rm==T, don't count them
    length2 <- function (x, na.rm=FALSE) {
        if (na.rm) sum(!is.na(x))
        else       length(x)
    }

    # This does the summary. For each group's data frame, return a vector with
    # N, mean, and sd
    datac <- ddply(data, groupvars, .drop=.drop,
      .fun = function(xx, col) {
        c(N    = length2(xx[[col]], na.rm=na.rm),
          mean = mean   (xx[[col]], na.rm=na.rm),
          sd   = sd     (xx[[col]], na.rm=na.rm)
        )
      },
      measurevar
    )

    # Rename the "mean" column    
    datac <- rename(datac, c("mean" = measurevar))

    datac$se <- datac$sd / sqrt(datac$N)  # Calculate standard error of the mean

    # Confidence interval multiplier for standard error
    # Calculate t-statistic for confidence interval: 
    # e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
    ciMult <- qt(conf.interval/2 + .5, datac$N-1)
    datac$ci <- datac$se * ciMult

    return(datac)
}
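
To make the interval arithmetic in `summarySE` concrete, here is a short base-R sketch of the same calculation for a single group. The three values are the AR entries for MS1, MS2 and MS3 at ST from the table; the variable names are chosen for the example only.

```r
# AR at island ST for loci MS1, MS2, MS3 (decimal commas written as points).
x <- c(4.6315, 2, 7.5199)

n  <- length(x)
m  <- mean(x)
se <- sd(x) / sqrt(n)                    # standard error of the mean
ci <- se * qt(0.95 / 2 + 0.5, df = n - 1)  # half-width, same multiplier as ciMult

c(N = n, mean = m, lower = m - ci, upper = m + ci)

# Cross-check: a one-sample t-test reports the same 95% interval.
t.test(x)$conf.int
```

With `summarySE` defined as above and plyr installed, a call such as `summarySE(subset(df, Type == "MS"), measurevar = "AR", groupvars = "Island")` should give one such row per island, with the mean stored in a column named `AR` and the half-width in `ci`.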

0 answers:

No answers yet.