Hide/remove missing values in a heatmap using ggplot2

Time: 2016-11-29 11:21:04

Tags: r ggplot2 heatmap

I have a data frame with a run of consecutive missing values from January 11 to January 14, 2016:

    library(lubridate)
    library(ggplot2)
    set.seed(123)
    timestamp1 <- seq(as.POSIXct("2016-01-01"), as.POSIXct("2016-01-10 23:59:59"), by = "hour")
    timestamp2 <- seq(as.POSIXct("2016-01-15"), as.POSIXct("2016-01-20 23:59:59"), by = "hour")
    data_obj <- data.frame(value = c(rnorm(length(timestamp1), 150, 5), rnorm(length(timestamp2), 110, 3)),
                           timestamp = c(timestamp1, timestamp2))
    data_obj$day <- lubridate::date(data_obj$timestamp)
    data_obj$hour <- lubridate::hour(data_obj$timestamp)

When I plot the heatmap with

    ggplot(data_obj, aes(day, hour, fill = value)) + geom_tile()

the resulting heatmap has an empty rectangular region (marked in red in my screenshot) that corresponds to the missing values.


How can I hide this blank region completely and get a continuous heatmap?

Note that I don't want to change the format of the x-axis dates, and I don't want to show the missing values in a different color.

2 Answers:

Answer 0 (score: 2)

A different approach from @Jacob's answer, one that preserves the date label format and order:

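library(lubridate)
library(ggplot2)  # ggplot(), geom_tile(), scales, themes
library(dplyr)    # provides the %>% pipe used below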

set.seed(123)

timestamp1 <- seq(as.POSIXct("2016-01-01"),as.POSIXct("2016-01-10 23:59:59"), by = "hour")
timestamp2 <- seq(as.POSIXct("2016-01-15"),as.POSIXct("2016-01-20 23:59:59"), by = "hour")

data_obj <- data.frame(value = c (rnorm(length(timestamp1),150,5),
                                  rnorm(length(timestamp2),110,3)),
                       timestamp = c(timestamp1,timestamp2))  
data_obj$day <- lubridate::date(data_obj$timestamp)
data_obj$hour <- lubridate::hour(data_obj$timestamp)

# preserve the date order manually in a factor

data_obj$day_f <- format(data_obj$day, "%b %d")

dplyr::arrange(data_obj, day) %>% 
  dplyr::distinct(day_f) -> day_f_order

data_obj$day_f <- factor(data_obj$day_f, levels=day_f_order$day_f)

ggplot(data_obj, aes(day_f, hour, fill=value)) + 
  geom_tile() +
  scale_x_discrete(expand=c(0,0), breaks=c("Jan 04", "Jan 18")) +
  scale_y_continuous(expand=c(0,0)) +
  viridis::scale_fill_viridis(name=NULL) +
  coord_equal() +
  labs(x=NULL, y=NULL) +
  theme(panel.background=element_blank()) +
  theme(panel.grid=element_blank()) +
  theme(axis.ticks=element_blank()) +
  theme(legend.position="bottom")
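
The manual level ordering above matters because `factor()` on character labels sorts the levels alphabetically, which stops matching chronological order as soon as the data cross a month boundary. A minimal base R sketch of the difference, using hypothetical dates that are not part of the question's data and `month.abb` to keep the labels independent of locale:

```r
# Hypothetical dates spanning a month boundary (not from the question)
d <- as.Date(c("2016-01-30", "2016-01-31", "2016-02-01", "2016-02-02"))

# Labels in the same "Mon dd" shape as format(d, "%b %d") in an English locale
lab <- paste(month.abb[as.integer(format(d, "%m"))], format(d, "%d"))

# Default factor(): levels are sorted alphabetically, so February comes first
naive <- factor(lab)

# Explicit levels taken in date order keep the axis chronological
chrono <- factor(lab, levels = unique(lab[order(d)]))

levels(naive)   # "Feb 01" "Feb 02" "Jan 30" "Jan 31"
levels(chrono)  # "Jan 30" "Jan 31" "Feb 01" "Feb 02"
```

For the question's data, which stay within January, both orderings happen to agree, but setting the levels explicitly is the safer habit.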


Note: without a clear, highly visible note explaining that data are missing, you will still be misrepresenting the data to your audience.
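
One possible way to act on that caveat is to mark the gap in the plot itself. The sketch below is only an illustration under assumptions: `demo` is hypothetical stand-in data with the same missing days, and the caption wording and the white separator line are choices of this sketch, not part of the answer above.

```r
library(ggplot2)

# Hypothetical stand-in data with the same gap (Jan 11 to Jan 14 absent)
days <- c(seq(as.Date("2016-01-01"), as.Date("2016-01-10"), by = "day"),
          seq(as.Date("2016-01-15"), as.Date("2016-01-20"), by = "day"))
demo <- expand.grid(day = days, hour = 0:23)
set.seed(123)
demo$value <- rnorm(nrow(demo), 130, 10)

# Day labels as a factor; levels are chronological because `days` is sorted
demo$day_f <- factor(format(demo$day, "%b %d"), levels = format(days, "%b %d"))

p <- ggplot(demo, aes(day_f, hour, fill = value)) +
  geom_tile() +
  # Separator between the 10th and 11th discrete slots, where the gap sits
  geom_vline(xintercept = 10.5, colour = "white") +
  labs(x = NULL, y = NULL,
       caption = "No data for Jan 11 to Jan 14; those days are omitted from the axis")

print(p)
```

Whether a caption alone is visible enough depends on where the figure is shown; a text annotation inside the panel may be needed for slides or dashboards.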

Answer 1 (score: 1)

If you convert day to a factor, the gap is dropped:

ggplot(data_obj, aes(factor(day),hour,fill=value)) + geom_tile()

Depending on your real data, you may or may not be happy with how the x-axis looks.
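
To see why the factor conversion closes the gap: a discrete axis gets one position per factor level, and only days that actually occur in the data become levels, whereas a continuous date axis covers every calendar day in the range. A small base R sketch that rebuilds just the day column from the question (the UTC time zone is an assumption made here for reproducibility):

```r
# Rebuild only the day column from the question (UTC assumed)
ts1 <- seq(as.POSIXct("2016-01-01", tz = "UTC"),
           as.POSIXct("2016-01-10 23:59:59", tz = "UTC"), by = "hour")
ts2 <- seq(as.POSIXct("2016-01-15", tz = "UTC"),
           as.POSIXct("2016-01-20 23:59:59", tz = "UTC"), by = "hour")
day <- as.Date(c(ts1, ts2))

# A continuous date axis spans every calendar day between the extremes
n_continuous <- as.integer(diff(range(day))) + 1L

# A discrete axis has one slot per level, i.e. per day present in the data
n_discrete <- nlevels(factor(day))

n_continuous  # 20
n_discrete    # 16
```

The four missing days (Jan 11 to Jan 14) account for the difference. Because `factor()` applied to a `Date` vector orders its levels by date, the remaining days stay in chronological order.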