I have a data frame with a run of consecutive missing values from January 11 to January 14, 2016:
library(lubridate)
library(ggplot2)
set.seed(123)
timestamp1 <- seq(as.POSIXct("2016-01-01"),as.POSIXct("2016-01-10 23:59:59"), by = "hour")
timestamp2 <- seq(as.POSIXct("2016-01-15"),as.POSIXct("2016-01-20 23:59:59"), by = "hour")
data_obj <- data.frame(value = c (rnorm(length(timestamp1),150,5),rnorm(length(timestamp2),110,3)),timestamp = c(timestamp1,timestamp2))
data_obj$day <- lubridate::date(data_obj$timestamp)
data_obj$hour <- lubridate::hour(data_obj$timestamp)
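When I plot a heatmap with
ggplot(data_obj, aes(day, hour, fill = value)) + geom_tile()
I get the heatmap shown below; the rectangle marked in red corresponds to the missing values.
How can I hide this blank region completely and produce a continuous heatmap?
Note that I don't want to change the format of the x-axis dates, and I don't want to show the missing values in a different color.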
Answer 0: (score: 2)
An alternative to @Jacob's answer that preserves the date label format and order:
library(lubridate)
library(ggplot2)
set.seed(123)
timestamp1 <- seq(as.POSIXct("2016-01-01"),as.POSIXct("2016-01-10 23:59:59"), by = "hour")
timestamp2 <- seq(as.POSIXct("2016-01-15"),as.POSIXct("2016-01-20 23:59:59"), by = "hour")
data_obj <- data.frame(value = c(rnorm(length(timestamp1),150,5),
                                 rnorm(length(timestamp2),110,3)),
                       timestamp = c(timestamp1,timestamp2))
data_obj$day <- lubridate::date(data_obj$timestamp)
data_obj$hour <- lubridate::hour(data_obj$timestamp)
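# preserve the date order manually in a factor
data_obj$day_f <- format(data_obj$day, "%b %d")
# sort by date, then keep each label once so the levels are chronological
# (nested calls avoid needing %>%, which was never loaded)
day_f_order <- dplyr::distinct(dplyr::arrange(data_obj, day), day_f)
data_obj$day_f <- factor(data_obj$day_f, levels=day_f_order$day_f)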
ggplot(data_obj, aes(day_f, hour, fill=value)) +
  geom_tile() +
  scale_x_discrete(expand=c(0,0), breaks=c("Jan 04", "Jan 18")) +
  scale_y_continuous(expand=c(0,0)) +
  viridis::scale_fill_viridis(name=NULL) +
  coord_equal() +
  labs(x=NULL, y=NULL) +
  theme(panel.background=element_blank()) +
  theme(panel.grid=element_blank()) +
  theme(axis.ticks=element_blank()) +
  theme(legend.position="bottom")
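Note: unless you add an explicit, very prominent statement explaining that data is missing, you will still be misrepresenting the data to your audience.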
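One possible way to make the omission explicit is a plot caption. The sketch below is only an illustration, not part of the original answer: the `demo` data frame, its values, and the caption wording are all made up for the example, and it assumes a ggplot2 version that supports `labs(caption = ...)`.

```r
library(ggplot2)

# Hypothetical stand-in data: one day on either side of the gap (values are invented)
demo <- data.frame(
  day_f = factor(rep(c("Jan 10", "Jan 15"), each = 3),
                 levels = c("Jan 10", "Jan 15")),
  hour  = rep(0:2, times = 2),
  value = c(150, 151, 149, 110, 111, 109)
)

# The caption states plainly that days were dropped from the axis
p <- ggplot(demo, aes(day_f, hour, fill = value)) +
  geom_tile() +
  labs(x = NULL, y = NULL,
       caption = "Jan 11-14 omitted: no data recorded")

print(p)
```

The same `labs(caption = ...)` call could be appended to the full plot above; an annotation drawn at the break itself would be another option.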
Answer 1: (score: 1)
If you convert day to a factor, the gap is ignored:
ggplot(data_obj, aes(factor(day),hour,fill=value)) + geom_tile()
Depending on your real data, you may or may not be happy with how the x-axis looks.
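To see why the factor version closes the gap, it may help to look at the factor levels: a factor only has levels for days that actually occur, so a discrete axis has no slots for January 11 to 14. The sketch below rebuilds just the day column from the question in base R; the explicit `tz = "UTC"` is an assumption added here so the result does not depend on the local time zone.

```r
# Hourly timestamps as in the question, with Jan 11-14 absent
ts <- c(
  seq(as.POSIXct("2016-01-01", tz = "UTC"),
      as.POSIXct("2016-01-10 23:59:59", tz = "UTC"), by = "hour"),
  seq(as.POSIXct("2016-01-15", tz = "UTC"),
      as.POSIXct("2016-01-20 23:59:59", tz = "UTC"), by = "hour")
)
day <- as.Date(ts, tz = "UTC")

# A factor keeps one level per observed day only
day_levels <- levels(factor(day))

# A continuous date axis spans every calendar day in the range
span <- seq(min(day), max(day), by = "day")

# Days on the continuous axis that have no factor level
missing_days <- setdiff(as.character(span), day_levels)

length(day_levels)  # number of observed days
length(span)        # number of calendar days in the range
missing_days        # the days a discrete axis silently drops
```

Because the levels are plain character labels, the axis shows full dates such as "2016-01-15" rather than the formatted breaks of a date scale, which is the trade-off mentioned above.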