I have some data that are missing some observations, e.g.,
library(dplyr)
library(ggplot2)
dframe <- data.frame(height = c(1, 2, NA, 4,
                                1.2, 2.5, 3.8, 4.4,
                                3, NA, 5, 7),
                     name = rep(c("A", "B", "C"), each = 4),
                     date = rep(c(1, 2, 3, 4), 3))
So the first few rows of the data look like this:
  height name date
1    1.0    A    1
2    2.0    A    2
3     NA    A    3
4    4.0    A    4
But in my real data the rows with NA heights are not present at all, so dframe is actually:
dframe <- dframe %>%
  filter(!is.na(height))
I'd like to create a plot for the data where I show the raw data for each "name" - A, B, and C - and also have a "mean height" line. I try using:
ggplot(dframe, aes(date, height)) +
  geom_point() +
  geom_line(aes(group = name), color = "blue") +
  # fun and linewidth replace the deprecated fun.y and size (ggplot2 >= 3.4.0)
  stat_summary(fun = "mean", geom = "line", linewidth = 1) +
  theme_bw()
But, as you can see, because of the missing values, ggplot's "mean" line appears jagged and misleading.
Answer 0 (score: 2)
You can interpolate the missing values and then plot:
library(tidyverse)
# Starting data frame
dframe = dframe %>% filter(!is.na(height))
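dframe %>%
  # Add a row (with NA height) for every date/name combination that is absent
  complete(date, nesting(name)) %>%
  arrange(name, date) %>%
  group_by(name) %>%
  # Interpolate height against date within each name
  mutate(heightImp = approx(date, height, xout = date)$y,
         imputed.flag = ifelse(is.na(height), "Imputed", "Measured")) %>%
  ggplot(aes(date, heightImp)) +
  geom_line(aes(group = name), color = "blue") +
  geom_point(aes(colour = imputed.flag)) +
  # fun and linewidth replace the deprecated fun.y and size (ggplot2 >= 3.4.0)
  stat_summary(fun = "mean", geom = "line", linewidth = 1) +
  scale_colour_manual(values = c("red", "blue")) +
  labs(colour = "") +
  theme_bw()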
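To see what the interpolation step does, here is a minimal base-R sketch for the subjects in the question. The vector names (`a_date`, `a_height`, `b_height`, `c_date`, `c_height`) are introduced here for illustration only; the values are copied from the question's data.

```r
# Subject A has no observation at date 3; subject C has none at date 2
a_date <- c(1, 2, 4); a_height <- c(1, 2, 4)
c_date <- c(1, 3, 4); c_height <- c(3, 5, 7)
b_height <- c(1.2, 2.5, 3.8, 4.4)  # subject B is complete for dates 1 to 4

# approx() interpolates linearly between the observed (date, height) pairs
a_filled <- approx(a_date, a_height, xout = 1:4)$y
c_filled <- approx(c_date, c_height, xout = 1:4)$y

a_filled  # 1 2 3 4
c_filled  # 3 4 5 7

# Mean per date once every subject contributes a value at every date
mean_filled <- (a_filled + b_height + c_filled) / 3
round(mean_filled, 2)  # 1.73 2.83 3.93 5.13
```

Without interpolation, the means at dates 2 and 3 are each based on only two subjects (2.25 and 4.4), which is what makes the raw mean line jump.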
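You can also plot a regression line, which is the mean value conditional on x, subject to the constraint that the regression line is actually a straight line, rather than the piecewise-linear result you get when you connect means calculated separately at each x value: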
ggplot(dframe, aes(date, height)) +
  geom_line(aes(group = name), color = "blue") +
  geom_point() +
  geom_smooth(method = "lm", colour = "black", se = FALSE) +
  theme_bw()
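Because the plot sets no grouping at the top level, the smoother pools all subjects, so the line drawn should correspond to an ordinary `lm(height ~ date)` fit. A minimal sketch, assuming the filtered data from the question (rebuilt here so the block is self-contained; `fit` and `line_vals` are names chosen for this example):

```r
# The question's data with the NA rows already removed
dframe <- data.frame(
  height = c(1, 2, 4, 1.2, 2.5, 3.8, 4.4, 3, 5, 7),
  name   = c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C"),
  date   = c(1, 2, 4, 1, 2, 3, 4, 1, 3, 4)
)

# Pooled straight-line fit of height on date
fit <- lm(height ~ date, data = dframe)
coef(fit)

# Fitted conditional mean at each date; consecutive values differ by the slope
line_vals <- predict(fit, newdata = data.frame(date = 1:4))
diff(line_vals)
```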
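You can also use more sophisticated regression functions. The code below shows a third-order polynomial and a B-spline with three degrees of freedom. They are identical in this case (the black curve for the third-order polynomial is hidden behind the red curve for the B-spline) because of the small number of time points, but in general they will differ. The point is that you can use linear regression to fit a variety of functions, depending on what you think is appropriate for your data and subject matter. (In this case, another factor is that you have repeated measurements for each subject, so an appropriate model would account for that by using a hierarchical model; see the lme4 or nlme packages.)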
ggplot(dframe, aes(date, height)) +
  geom_line(aes(group = name), color = "blue") +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, 3), colour = "black", se = FALSE) +
  geom_smooth(method = "lm", formula = y ~ splines::bs(x, df = 3), colour = "red", se = FALSE) +
  theme_bw()
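The claim that the two curves coincide can be checked outside ggplot2 by fitting both models with `lm()` directly. This is a sketch, assuming the filtered data from the question (rebuilt here so the block runs on its own); `fit_poly`, `fit_spline`, and `date_means` are names chosen for this example.

```r
# The question's data with the NA rows already removed
dframe <- data.frame(
  height = c(1, 2, 4, 1.2, 2.5, 3.8, 4.4, 3, 5, 7),
  name   = c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C"),
  date   = c(1, 2, 4, 1, 2, 3, 4, 1, 3, 4)
)

fit_poly   <- lm(height ~ poly(date, 3), data = dframe)
fit_spline <- lm(height ~ splines::bs(date, df = 3), data = dframe)

# Both bases span the cubic polynomials, so the fitted values agree
all.equal(fitted(fit_poly), fitted(fit_spline))  # TRUE

# With only four distinct dates, a cubic can pass through the mean at each date
date_means <- tapply(dframe$height, dframe$date, mean)
pred_poly  <- predict(fit_poly, newdata = data.frame(date = 1:4))
all.equal(unname(pred_poly), as.vector(date_means))  # TRUE
```

With more distinct time points, or with interior knots in the spline, the two fits would generally no longer match.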