Q-Q plot with ggplot2::stat_qq, colours, multiple groups with Q-Q lines

Asked: 2017-05-02 15:56:21

Tags: r plot ggplot2

I need to do something similar to what is shown in this excellent question:

Q-Q plot with ggplot2::stat_qq, colours, single group

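Unfortunately, one small difference is blocking me. Unlike the original question, I do want the quantiles computed separately for each group, but I also want to add a Q-Q line for each group. Following the OP's code, I can create the Q-Q plot by group: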

library(dplyr)
library(ggplot2)
library(broom) ## for augment()

set.seed(1001)
N <- 1000
G <- 10
dd <- data_frame(x = runif(N),
                 group = factor(sample(LETTERS[1:G], size=N, replace=TRUE)),
                 y = rnorm(N) + 2*x + as.numeric(group))
m1 <- lm(y~x, data=dd)
dda <- cbind(augment(m1), group=dd$group)
sample_var <- "y"
group_var  <- "group"
p <- ggplot(dda)+stat_qq(aes_string(sample=sample_var, colour=group_var))
p

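How can I add a Q-Q line for each group? Note: ideally, I would like to specify the sample column and the group column at runtime. That is why I am using aes_string.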

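Edit: to clarify my question, I am adding code that computes the Q-Q line when there is only one group. I need to generalize this code to multiple groups.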

library(dplyr)
library(ggplot2)
library(broom) ## for augment()

# this section of the code is the same as before, EXCEPT G = 1, because for 
# now the code only works for 1 group
set.seed(1001)
N <- 1000
G <- 1
dd <- data_frame(x = runif(N),
                 group = factor(sample(LETTERS[1:G], size=N, replace=TRUE)),
                 y = rnorm(N) + 2*x + as.numeric(group))
m1 <- lm(y~x, data=dd)
dda <- cbind(augment(m1), group=dd$group)
sample_var <- "y"
group_var  <- "group"

# code to compute the slope and the intercept of the qq-line: basically,
# I would need to compute the slopes and the intercepts of the qq-lines
# for each group
vec <- dda[, sample_var]
y <- quantile(vec[!is.na(vec)], c(0.25, 0.75))
x <- qnorm(c(0.25, 0.75))
slope <- diff(y)/diff(x)
int <- y[1] - slope * x[1]

# now plot with ggplot2
p <- ggplot(dda)+stat_qq(aes_string(sample=sample_var, colour=group_var))+geom_abline(slope = slope, intercept = int)
p

2 Answers:

Answer 0 (score: 3)

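One approach is to turn the code that computes the Q-Q line into a function, then use lapply to build a separate data.frame holding the slope and intercept of the Q-Q line for each group.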

library(dplyr)
library(ggplot2)
library(broom) ## for augment()

set.seed(1001)
N <- 1000
G <- 3
dd <- data_frame(x = runif(N),
                 group = factor(sample(LETTERS[1:G], size=N, replace=TRUE)),
                 y = rnorm(N) + 2*x + as.numeric(group))
m1 <- lm(y~x, data=dd)
dda <- cbind(augment(m1), group=dd$group)
sample_var <- "y"
group_var  <- "group"

# code to compute the slope and the intercept of the qq-line 

qqlines <- function(vec, group) {
    x <- qnorm(c(0.25, 0.75))    
    y <- quantile(vec[!is.na(vec)], c(0.25, 0.75))
    slope <- diff(y)/diff(x)
    int <- y[1] - slope * x[1]
    data.frame(slope, int, group)
}


slopedf <- do.call(rbind,lapply(unique(dda$group), function(grp) qqlines(dda[dda$group == grp,sample_var], grp)))



# now plot with ggplot2
p <- ggplot(dda)+stat_qq(aes_string(sample=sample_var, colour=group_var)) + 
    geom_abline(data = slopedf, aes(slope = slope, intercept = int, colour = group))
p
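
The same per-group slopes and intercepts can also be computed with a grouped summarise instead of lapply and do.call. The following is only a sketch under some assumptions: it needs dplyr 1.0.0 or later (for across) and ggplot2 3.0.0 or later (for the .data pronoun inside aes), and it uses a small toy data frame in place of dda so that it runs on its own. The column names y and group are illustrative.

```r
library(dplyr)
library(ggplot2)

# Toy data standing in for `dda` (assumption: a numeric sample column
# and a factor group column, named here "y" and "group")
set.seed(1001)
dda <- data.frame(group = factor(sample(LETTERS[1:3], size = 300, replace = TRUE)))
dda$y <- rnorm(300) + as.numeric(dda$group)

sample_var <- "y"
group_var  <- "group"

# Theoretical quartiles of the standard normal, shared by all groups
x <- qnorm(c(0.25, 0.75))

# One row per group: slope and intercept of the line through the
# first and third quartiles, as in the single-group code
slopedf <- dda %>%
  group_by(across(all_of(group_var))) %>%
  summarise(
    slope = unname(diff(quantile(.data[[sample_var]], c(0.25, 0.75), na.rm = TRUE)) / diff(x)),
    int   = unname(quantile(.data[[sample_var]], 0.25, na.rm = TRUE)) - slope * x[1],
    .groups = "drop"
  )

# Column names are still supplied at runtime via the .data pronoun
p <- ggplot(dda) +
  stat_qq(aes(sample = .data[[sample_var]], colour = .data[[group_var]])) +
  geom_abline(data = slopedf,
              aes(slope = slope, intercept = int, colour = .data[[group_var]]))
p
```

Because the grouping column keeps its original name in slopedf, the colour mapping in geom_abline lines up with the one used by stat_qq.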

Answer 1 (score: 1)

A more concise option: in ggplot2 v3.0.0 and later, you can use stat_qq_line.

ggplot(dda, aes(sample = y, colour = group)) +
  stat_qq() +
  stat_qq_line()


Data, from Jeremy Voisey's answer:

library(dplyr) ## for data_frame()
library(ggplot2)
library(broom) ## for augment()
set.seed(1001)
N <- 1000
G <- 3
dd <- data_frame(
  x = runif(N),
  group = factor(sample(LETTERS[1:G], size = N, replace = TRUE)),
  y = rnorm(N) + 2 * x + as.numeric(group)
)
m1 <- lm(y ~ x, data = dd)
dda <- cbind(augment(m1), group = dd$group)
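
The question also asked for the sample and group columns to be chosen at runtime. One possible way to combine that with stat_qq_line, sketched here under the assumption of ggplot2 3.0.0 or later, is to use the .data pronoun inside aes in place of aes_string. The toy data frame below is illustrative and only stands in for dda.

```r
library(ggplot2)

# Toy data standing in for `dda` (illustrative column names)
set.seed(1001)
dda <- data.frame(group = factor(sample(LETTERS[1:3], size = 300, replace = TRUE)))
dda$y <- rnorm(300) + as.numeric(dda$group)

# Column names held in strings, as in the question
sample_var <- "y"
group_var  <- "group"

# Mapping colour in the top-level aes() makes both stats work per group,
# so each group gets its own points and its own Q-Q line
p <- ggplot(dda, aes(sample = .data[[sample_var]], colour = .data[[group_var]])) +
  stat_qq() +
  stat_qq_line()
p
```

With its default settings, stat_qq_line draws each line through the first and third quartiles against a normal distribution, which is the same construction as the manual slope and intercept code in the question.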