# Does writing histograms, normal curves and Q-Q plots to a .pdf behave differently between stats and ggplot2?

Asked: 2016-09-01 15:34:43

Tags: r pdf plot ggplot2 histogram

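I need to produce plots for a statistical analysis, and I'm confused by a difference in behaviour between stats and ggplot2. Can anyone help?
I'm trying to generate a pdf of histograms with a normal curve overlaid, each side by side with its Q-Q plot, with the next plots continuing on the same page. Preferably with ggplot2 (because the plots are prettier). My real dataset has a large number of variables, so I'm using a 'for' loop.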

library(ggplot2)  
library(stats)  
library(datasets) 

This ggplot code does exactly what I want it to do:

ggplot(airquality, aes(Wind)) + 
  geom_histogram(aes(y = ..density..),colour = "black", fill = "white") + 
  stat_function(fun = dnorm, args = list(mean = mean(airquality$Wind), sd = sd(airquality$Wind)), colour = "red", size = 1) + 
  xlab("Wind")
qplot(sample = airquality$Wind, stat = "qq")

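I'm fine with the binwidth warning; I want the bin width chosen automatically, and I'll build in suppression of that message later. What I'm not sure how to handle, though, is: '`stat` is deprecated'. Anyone?
If I try to put this in a 'for' loop, I can't get it to work. It keeps putting each plot on a new page, and it leaves out the normal curve: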

Variablesairquality<-c("Wind", "Temp", "Month", "Day") 
pdf(file = "Normality.pdf", 4, 5)
par(mfrow = c(2,2))
for(i in Variablesairquality){
  plot(ggplot(airquality, aes(airquality[,i])) + 
         geom_histogram(aes(y = ..density..),colour = "black", fill = "white") + 
         stat_function(fun = dnorm, args = list(mean = mean(airquality[,i]), sd = sd(airquality[,i])), colour = "red", size = 1) + 
         xlab(i)
      )
  plot(qplot(sample = airquality[,i], stat = "qq" )
  )
}
dev.off()

I don't get it, because if I try the same thing with stats, it does exactly what I want:

pdf(file = "Normality2.pdf", 4, 5)
par(mfrow = c(2,2))
for(i in Variablesairquality){
  h <- hist(airquality[,i], col = "white", cex.axis=0.50, xlab = i, cex.lab=0.75, main = paste("Distribution"), cex.main= 0.75) 
  xfit<-seq(min(airquality[,i]),max(airquality[,i]),length=length(airquality[,i])) 
  yfit<-dnorm(xfit,mean=mean(airquality[,i]),sd=sd(airquality[,i])) 
  yfit <- yfit*diff(h$mids[1:2])*length(airquality[,i]) 
  lines(xfit, yfit, col="red", lwd=1)
  qqnorm(airquality[,i], cex = 0.5, cex.axis=0.50, cex.lab=0.75, main = expression("Q-Q plot for"~paste(i)), cex.main= 0.75)
  qqline(airquality[,i], col = "red")
}
dev.off()

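(Except for the main titles, which I still need to figure out. Any hints?)
If anyone could point out the mistake in my ggplot code or otherwise explain this behaviour, I'd be very grateful. Thanks!
I'm using R v3.2.3 and RStudio v0.99.891. (And yes, I've read every similar question on here, searched the internet and read the help files; that didn't get me where I need to go.)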

1 answer:

Answer 0: (score: 1)

On '`stat` is deprecated': see Deprecated features in the ggplot2 2.0.0 release notes. Use this instead:

ggplot(airquality, aes(sample = Wind)) +
  stat_qq()
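
The base-graphics version in the question also draws a reference line with qqline(). As a hedged sketch of a ggplot2 counterpart: this assumes ggplot2 3.0.0 or later, where stat_qq_line() is available. It is not part of the 2.x releases the question was written against, and the red colour is only chosen to match the question's qqline() call.

```r
library(ggplot2)

# Q-Q plot of Wind with a reference line through the first and third
# quartiles, analogous to qqnorm() followed by qqline() in base graphics.
# Assumes ggplot2 >= 3.0.0 for stat_qq_line().
p <- ggplot(airquality, aes(sample = Wind)) +
  stat_qq() +
  stat_qq_line(colour = "red")
print(p)
```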

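If you'd rather not use gridExtra::grid.arrange, here is a way to do it with facets. First wrangle the data into a new data frame holding the x and y values we need, plus variables for the plot type and the geom: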

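# Q-Q plot coordinates (theoretical vs. sample quantiles), computed without drawing
d <- as.data.frame(qqnorm(airquality$Wind, plot.it = FALSE))
d$plot <- "QQ plot"
d$geom <- "point"
# Raw values for the histogram; y is computed later via ..density..
d <- rbind(d, data.frame(x = airquality$Wind, y = NA,
                         plot = "Histogram", geom = "bar"))
# Normal density curve evaluated on a 100-point grid over the data range
d <- rbind(d, with(airquality, data.frame(
                x = seq(min(Wind), max(Wind), length.out = 100),
                y = dnorm(seq(min(Wind), max(Wind), length.out = 100),
                          mean = mean(Wind), sd = sd(Wind)),
                plot = "Histogram", geom = "line")))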
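
Since the same wrangling is needed for every variable, it may help to wrap it in a small helper. The sketch below is only an illustration: build_plot_data is a hypothetical name, not part of ggplot2 or of the code above, and it assumes the vector has no missing values.

```r
# Hypothetical helper (the name is illustrative): builds the long data frame
# used for the faceted histogram / Q-Q plot of one numeric vector.
# Assumes x contains no missing values.
build_plot_data <- function(x) {
  grid <- seq(min(x), max(x), length.out = 100)
  qq <- as.data.frame(qqnorm(x, plot.it = FALSE))
  rbind(
    data.frame(x = qq$x, y = qq$y, plot = "QQ plot", geom = "point"),
    data.frame(x = x, y = NA, plot = "Histogram", geom = "bar"),
    data.frame(x = grid, y = dnorm(grid, mean = mean(x), sd = sd(x)),
               plot = "Histogram", geom = "line")
  )
}

d <- build_plot_data(airquality$Wind)
table(d$plot, d$geom)  # row counts per plot type and geom
```

The resulting d has the same columns (x, y, plot, geom) as the data frame built step by step above, so it can be passed to the same plotting code.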

Then call ggplot, subsetting the data for each geom:

ggplot(d, aes(x = x, y = y)) + facet_wrap(~plot, scales = "free") +
  geom_histogram(data = subset(d, plot == "Histogram" & geom == "bar"),
                 aes(y = ..density..), 
                 colour = "black", fill = "white") +
  geom_line(data = subset(d, plot == "Histogram" & geom == "line"),
            colour = "red", size = 1) +
  geom_point(data = subset(d, plot == "QQ plot")) +
  labs(x = "Wind")

To produce multiple plots, wrap the code above in a for loop, making sure to wrap the ggplot call in print:

pdf("path/to/pdf/out.pdf")
Variablesairquality <- c("Wind", "Temp", "Month", "Day") 
for (i in rev(Variablesairquality)) {
  x <- airquality[[i]]
  # Q-Q plot coordinates, computed without drawing
  d <- as.data.frame(qqnorm(x, plot.it = FALSE))
  d$plot <- "QQ plot"
  d$geom <- "point"
  # Raw values for the histogram
  d <- rbind(d, data.frame(x = x, y = NA, plot = "Histogram", geom = "bar"))
  # Normal density curve on a 100-point grid over the data range
  d <- rbind(d, data.frame(x = seq(min(x), max(x), length.out = 100),
                           y = dnorm(seq(min(x), max(x), length.out = 100),
                                     mean = mean(x), sd = sd(x)),
                           plot = "Histogram", geom = "line"))

  print(
    ggplot(d, aes(x = x, y = y)) + facet_wrap(~plot, scales = "free") +
      geom_histogram(data = subset(d, plot == "Histogram" & geom == "bar"),
                     aes(y = ..density..), 
                     colour = "black", fill = "white") +
      geom_line(data = subset(d, plot == "Histogram" & geom == "line"),
                colour = "red", size = 1) +
      geom_point(data = subset(d, plot == "QQ plot")) +
      labs(x = i)
  )
} 
dev.off()
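
For comparison, here is a hedged sketch of the gridExtra::grid.arrange route mentioned above. ggplot2 draws with grid graphics, so par(mfrow = ...) has no effect on it, which is why each plot in the question landed on its own page. The sketch assumes gridExtra is installed. The output path, the 7 x 7 inch page size and bins = 30 are illustrative choices rather than anything from the question, and newer ggplot2 versions prefer after_stat(density) over ..density...

```r
library(ggplot2)
library(gridExtra)  # assumption: gridExtra is installed

vars <- c("Wind", "Temp", "Month", "Day")

# Build one histogram and one Q-Q plot per variable, in page order.
plots <- list()
for (v in vars) {
  x <- airquality[[v]]
  df <- data.frame(x = x)  # per-variable data frame instead of aes(airquality[, i])
  hist_plot <- ggplot(df, aes(x = x)) +
    geom_histogram(aes(y = ..density..), bins = 30,
                   colour = "black", fill = "white") +
    stat_function(fun = dnorm, args = list(mean = mean(x), sd = sd(x)),
                  colour = "red") +
    xlab(v)
  qq_plot <- ggplot(df, aes(sample = x)) +
    stat_qq() +
    ggtitle(paste("Q-Q plot for", v))
  plots <- c(plots, list(hist_plot, qq_plot))
}

# Hypothetical output path; four panels (2 x 2) per page.
out_file <- file.path(tempdir(), "Normality_ggplot.pdf")
pdf(out_file, width = 7, height = 7)
for (start in seq(1, length(plots), by = 4)) {
  page <- plots[start:min(start + 3, length(plots))]
  grid.arrange(grobs = page, ncol = 2)  # each call starts a new page
}
dev.off()
```

The title here is built with paste(), which evaluates the loop variable. In the base-graphics loop from the question, main = paste("Q-Q plot for", i) should likewise give per-variable titles, because expression() does not evaluate i.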