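I need to produce plots for a statistical analysis, and I am confused by the difference in behaviour between stats (base graphics) and ggplot. Can anyone help?
I am trying to generate a PDF with histograms, each including a normal curve, side by side with Q-Q plots, with the next plot continuing on the same page. I would prefer ggplot (because the plots are prettier). My real data set has a large number of variables, so I am using a 'for' loop.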
library(ggplot2)
library(stats)
library(datasets)
This ggplot code does what I want it to do.
ggplot(airquality, aes(Wind)) +
geom_histogram(aes(y = ..density..),colour = "black", fill = "white") +
stat_function(fun = dnorm, args = list(mean = mean(airquality$Wind), sd = sd(airquality$Wind)), colour = "red", size = 1) +
xlab("Wind")
qplot(sample = airquality$Wind, stat = "qq")
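I am fine with the binwidth warning: I want the bin width chosen automatically, and I will build in suppression of that message later. What I am not sure how to handle, though, is: '`stat` is deprecated'. Anyone?
If I try to put this in a 'for' loop, I cannot get it to work. It keeps putting every plot on a new page, and it leaves out the normal curve: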
Variablesairquality<-c("Wind", "Temp", "Month", "Day")
pdf(file = "Normality.pdf", 4, 5)
par(mfrow = c(2,2))
for(i in Variablesairquality){
plot(ggplot(airquality, aes(airquality[,i])) +
geom_histogram(aes(y = ..density..),colour = "black", fill = "white") +
stat_function(fun = dnorm, args = list(mean = mean(airquality[,i]), sd = sd(airquality[,i])), colour = "red", size = 1) +
xlab(i)
)
plot(qplot(sample = airquality[,i], stat = "qq" )
)
}
dev.off()
I don't get it, because if I try the same thing with base graphics, it does exactly what I want:
pdf(file = "Normality2.pdf", 4, 5)
par(mfrow = c(2,2))
for(i in Variablesairquality){
h <- hist(airquality[,i], col = "white", cex.axis=0.50, xlab = i, cex.lab=0.75, main = paste("Distribution"), cex.main= 0.75)
xfit<-seq(min(airquality[,i]),max(airquality[,i]),length=length(airquality[,i]))
yfit<-dnorm(xfit,mean=mean(airquality[,i]),sd=sd(airquality[,i]))
yfit <- yfit*diff(h$mids[1:2])*length(airquality[,i])
lines(xfit, yfit, col="red", lwd=1)
qqnorm(airquality[,i], cex = 0.5, cex.axis=0.50, cex.lab=0.75, main = expression("Q-Q plot for"~paste(i)), cex.main= 0.75)
qqline(airquality[,i], col = "red")
}
dev.off()
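(Except for the main titles, which I still need to figure out. Any hints?)
If anyone can point out the error in my ggplot code or otherwise explain this behaviour, I would be very grateful. Thanks!
I am using R version 3.2.3 and RStudio v0.99.891. (And yes, I have read every similar question on here, searched the internet and read the help files; that did not get me where I need to go.)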
Answer 0 (score: 1)
On `stat` is deprecated, see "Deprecated features" in the ggplot2 2.0.0 release notes. Use this instead:
ggplot(airquality, aes(sample = Wind)) +
stat_qq()
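As a small extension of the snippet above, here is a sketch that also draws a reference line, roughly what `qqline()` does in base graphics. It assumes a ggplot2 version that provides `stat_qq_line()` (3.0.0 or later); the object name `p_qq` and the axis labels are my own choices, not part of the original answer.

```r
library(ggplot2)

# Q-Q plot of Wind against the normal distribution, with a reference line.
# Assumption: stat_qq_line() is available (ggplot2 >= 3.0.0).
p_qq <- ggplot(airquality, aes(sample = Wind)) +
  stat_qq() +
  stat_qq_line(colour = "red") +
  labs(x = "Theoretical quantiles", y = "Sample quantiles")

print(p_qq)
```

On older ggplot2 versions without `stat_qq_line()`, the plain `stat_qq()` call above still works on its own.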
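If you would rather not use gridExtra::grid.arrange, here is an approach using facets. First wrangle the data into a new data frame that holds the x and y values, the plot type (facet), and the geom each row is meant for: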
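# Q-Q coordinates: x = theoretical quantiles, y = sample quantiles
d <- as.data.frame(qqnorm(airquality$Wind, plot.it = FALSE))
d$plot <- "QQ plot"
d$geom <- "point"
# Raw values for the histogram (y is computed by geom_histogram later)
d <- rbind(d, data.frame(x = airquality$Wind, y = NA,
plot = "Histogram", geom = "bar"))
# Normal density evaluated on a grid of 100 points, for the red curve
d <- rbind(d, with(airquality, data.frame(
x = seq(min(Wind), max(Wind), length.out = 100),
y = dnorm(seq(min(Wind), max(Wind), length.out = 100),
mean = mean(Wind), sd = sd(Wind)),
plot = "Histogram", geom = "line")))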
Then call ggplot, subsetting the data for each geom:
ggplot(d, aes(x = x, y = y)) + facet_wrap(~plot, scales = "free") +
geom_histogram(data = subset(d, plot == "Histogram" & geom == "bar"),
aes(y = ..density..),
colour = "black", fill = "white") +
geom_line(data = subset(d, plot == "Histogram" & geom == "line"),
colour = "red", size = 1) +
geom_point(data = subset(d, plot == "QQ plot")) +
labs(x = "Wind")
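Since the same wrangling is needed for every variable, it can be factored into a small function. The following is only a sketch: `normality_data` and its `n_curve` argument are hypothetical names introduced here, and the function assumes a numeric vector without missing values.

```r
# Hypothetical helper: builds the long data frame used by the faceted plot
# for a single numeric vector x (assumed to contain no missing values).
normality_data <- function(x, n_curve = 100) {
  # Q-Q coordinates: x = theoretical quantiles, y = sample quantiles
  qq <- as.data.frame(qqnorm(x, plot.it = FALSE))
  qq$plot <- "QQ plot"
  qq$geom <- "point"

  # Raw values for the histogram; y is left empty on purpose
  bars <- data.frame(x = x, y = NA, plot = "Histogram", geom = "bar")

  # Normal density on an evenly spaced grid, for the overlaid curve
  xs <- seq(min(x), max(x), length.out = n_curve)
  norm_curve <- data.frame(x = xs,
                           y = dnorm(xs, mean = mean(x), sd = sd(x)),
                           plot = "Histogram", geom = "line")

  rbind(qq, bars, norm_curve)
}

d <- normality_data(airquality$Wind)
table(d$plot, d$geom)  # quick check of how many rows feed each layer
```

The resulting `d` has the same four columns (`x`, `y`, `plot`, `geom`) that the ggplot call above expects, so it can be passed to that call unchanged.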
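To produce multiple plots, you can wrap the code above in a for loop, making sure to wrap the ggplot call in print (inside a loop, a ggplot object is not printed automatically):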
pdf("path/to/pdf/out.pdf")
Variablesairquality <- c("Wind", "Temp", "Month", "Day")
for (i in rev(Variablesairquality)) {
x <- airquality[[i]]
d <- as.data.frame(qqnorm(x, plot.it = FALSE))
d$plot <- "QQ plot"
d$geom <- "point"
d <- rbind(d, data.frame(x = x, y = NA, plot = "Histogram", geom = "bar"))
d <- rbind(d, data.frame(x = seq(min(x), max(x), length.out = 100),
y = dnorm(seq(min(x), max(x), length.out = 100),
mean = mean(x), sd = sd(x)),
plot = "Histogram", geom = "line"))
print(
ggplot(d, aes(x = x, y = y)) + facet_wrap(~plot, scales = "free") +
geom_histogram(data = subset(d, plot == "Histogram" & geom == "bar"),
aes(y = ..density..),
colour = "black", fill = "white") +
geom_line(data = subset(d, plot == "Histogram" & geom == "line"),
colour = "red", size = 1) +
geom_point(data = subset(d, plot == "QQ plot")) +
labs(x = i)
)
}
dev.off()
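For completeness, here is a sketch of the gridExtra::grid.arrange route mentioned earlier, which puts all histograms and Q-Q plots on a single page. It also addresses the paging problem in the question: par(mfrow = ...) only affects base graphics, whereas ggplot2 draws with grid, so every printed ggplot starts a new page. Assumptions: gridExtra is installed, ggplot2 is recent enough for `.data[[var]]` and `after_stat()` (3.3.0 or later), `make_pair` is a hypothetical helper name, and `bins = 30` is an arbitrary choice that silences the binwidth message.

```r
library(ggplot2)
library(gridExtra)  # assumption: installed separately

vars <- c("Wind", "Temp", "Month", "Day")

# Hypothetical helper: returns list(histogram with normal curve, Q-Q plot)
# for one column of `data`, selected by name.
make_pair <- function(data, var) {
  x <- data[[var]]
  hist_plot <- ggplot(data, aes(x = .data[[var]])) +
    geom_histogram(aes(y = after_stat(density)), bins = 30,
                   colour = "black", fill = "white") +
    stat_function(fun = dnorm, args = list(mean = mean(x), sd = sd(x)),
                  colour = "red") +
    xlab(var)
  qq_plot <- ggplot(data, aes(sample = .data[[var]])) +
    stat_qq() +
    ggtitle(paste("Q-Q plot for", var))
  list(hist_plot, qq_plot)
}

# Collect all plots in one flat list: hist, qq, hist, qq, ...
plots <- list()
for (v in vars) plots <- c(plots, make_pair(airquality, v))

# Written to a temporary directory here; change the path as needed.
out_file <- file.path(tempdir(), "Normality_grid.pdf")
pdf(out_file, width = 8, height = 10)
grid.arrange(grobs = plots, ncol = 2)  # one row per variable
dev.off()
```

With many variables a single page becomes crowded; gridExtra's `marrangeGrob()` can spread the grobs over several pages, or the loop can call `grid.arrange()` once per batch of variables.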