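Reshaping data for a heat map of time-period coverage

Time: 2019-06-14 22:29:35

Tags: r dplyr heatmap

I am trying to create a heat map from a very odd data structure.

You can generate some sample data with the following code (it is very inefficient):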

times<-sort(format(seq.POSIXt(as.POSIXct(Sys.Date()),as.POSIXct(Sys.Date()+1),by = "5 min"),"%H%M"))
set.seed(922)
sample.data<-as.data.frame(matrix(NA,nrow = 2000,ncol = 10))
names(sample.data)<-c("INDEX","DAY1","START1","END1","DAY2","START2","END2","DAY3","START3","END3")
for(i in 1:nrow(sample.data)){
  sample.data[i,"INDEX"]<-sample(1:100,1,replace = T)
  sample.data[i,"DAY1"]<-sample(c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"),1,replace = F)
  sample.data[i,"START1"]<-sample(times,1,replace = T)
  sample.data[i,"END1"]<-sample(times,1,replace = T)
  sample.data[i,"DAY2"]<-sample(c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"),1,replace = F)
  sample.data[i,"START2"]<-sample(times,1,replace = T)
  sample.data[i,"END2"]<-sample(times,1,replace = T)
  sample.data[i,"DAY3"]<-sample(c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"),1,replace = F)
  sample.data[i,"START3"]<-sample(times,1,replace = T)
  sample.data[i,"END3"]<-sample(times,1,replace = T)
}

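library(dplyr)
# Keep rows where every shift ends after it starts and the three days differ
data <- sample.data %>%
  filter(START1 < END1 & START2 < END2 & START3 < END3 & DAY1 != DAY2 & DAY1 != DAY3 & DAY2 != DAY3)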
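
As an aside, the loop is not needed to get data of this shape. Here is a minimal vectorized sketch, assuming it is acceptable to draw the three distinct days directly instead of filtering for them afterwards. It will not reproduce the same random draws as the loop, and the names n, weekdays6 and days are introduced here only for illustration:

```r
library(dplyr)

set.seed(922)
n <- 2000

# 5-minute marks "0000" .. "2355", built as strings (no date-times needed)
times <- sprintf("%02d%02d",
                 rep(0:23, each = 12),
                 rep(seq(0L, 55L, by = 5L), times = 24))

weekdays6 <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

# Assumption: sample 3 distinct days per row up front (n x 3 character matrix)
days <- t(replicate(n, sample(weekdays6, 3)))

sample.data <- data.frame(
  INDEX  = sample(1:100, n, replace = TRUE),
  DAY1   = days[, 1],
  START1 = sample(times, n, replace = TRUE),
  END1   = sample(times, n, replace = TRUE),
  DAY2   = days[, 2],
  START2 = sample(times, n, replace = TRUE),
  END2   = sample(times, n, replace = TRUE),
  DAY3   = days[, 3],
  START3 = sample(times, n, replace = TRUE),
  END3   = sample(times, n, replace = TRUE),
  stringsAsFactors = FALSE
)

# The days are already distinct, so only the start-before-end check remains
data <- sample.data %>%
  filter(START1 < END1, START2 < END2, START3 < END3)
```

Because the days are distinct by construction, more rows survive the filter than with the loop version.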

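I know it is ugly and inefficient, but the real data is roughly in this structure. You can think of it as the number of employees present at, say, an airport at any given time, where each row holds one employee's shifts.

I want to create a heat map with the time of day in 5-minute chunks on the y-axis and the days of the week on the x-axis. Do I have to gather the columns and group by 5-minute time chunks? I have no idea.

If the data were structured properly, I could group by weekday and by each distinct 5-minute chunk, then summarise over every row that has someone at the airport during that chunk. I just don't know how to make dplyr say that someone is at work without spelling it out explicitly, and I don't know how to do it without a for loop. If I need to explain what I am doing better, or if you have any clever ideas on how to get the data into the right format, or on whether I am even thinking about this the right way, please let me know. I have been banging my head on the desk and need to step away from the problem for a minute, but in case it helps, here is the plotting code: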

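library(ggplot2)
# Target plot: assumes `data` has already been reshaped to DAY, TIME_CHUNK and TOTAL_EMPLOYEES
ggplot(data, aes(x = DAY, y = TIME_CHUNK)) +
  geom_tile(aes(fill = TOTAL_EMPLOYEES)) +
  geom_text(aes(label = TOTAL_EMPLOYEES), colour = "white", size = 3)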

Thanks for your time...

2 Answers:

Answer 0: (score: 0)

This is a partial solution that gets most of the way there. I'll try to finish it later if I have time.

First, I'd reshape the data using one of the techniques here: https://stackoverflow.com/a/56605646/6851825

DAY <- grep("DAY", names(data))
# List all START columns before all END columns so that the recycled DAY rows line up with them
START_END <- c(grep("START", names(data)), grep("END", names(data)))
# Stack both sets into long format: one row per shift boundary
data_long <- cbind(stack(data, select = DAY), stack(data, select = START_END))
names(data_long) <- c("WEEKDAY", "DAYNUM", "TIME", "STATUS")

Here I do some other reshaping to order the weekdays, convert TIME to decimal hours, and track the cumulative employee count.

library(tidyverse)
data_long_count <- data_long %>%
  mutate(WEEKDAY = factor(WEEKDAY, levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")),
         TIME_dec = as.numeric(TIME %>% str_sub(end = 2)) + as.numeric(TIME %>% str_sub(start = 3))/60,
         STATUS = STATUS %>% str_remove("[0-9]"),
         count_chg = if_else(STATUS == "START", 1, -1)) %>%
  arrange(WEEKDAY, TIME_dec) %>%
  mutate(employee_count = cumsum(count_chg))

[Missing step: fill in all the minutes where nothing changes. I would use the padr package for that, but it prefers to work with date or datetime. Or one could use geom_rect to sidestep it.]

In either case, this heat map is "spotty": it has stripes only where a change occurred, not for all the minutes in between.
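
One possible way to fill in the missing step, sketched under assumptions: it uses tidyr::complete rather than padr or geom_rect, it runs on a tiny made-up stand-in for data_long (the values below are hypothetical), and the names all_slots, slot_counts and net_chg are introduced only for this illustration. The idea is to sum the changes per weekday and 5-minute slot, add the slots where nothing changed with a change of zero, and take the running total within each weekday.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for data_long: one row per shift boundary
data_long <- data.frame(
  WEEKDAY = c("Monday", "Monday", "Monday", "Monday", "Tuesday", "Tuesday"),
  TIME    = c("0900", "0930", "0910", "0920", "1000", "1015"),
  STATUS  = c("START", "END", "START", "END", "START", "END"),
  stringsAsFactors = FALSE
)

# Every 5-minute slot of the day, "0000" .. "2355"
all_slots <- sprintf("%02d%02d",
                     rep(0:23, each = 12),
                     rep(seq(0L, 55L, by = 5L), times = 24))

slot_counts <- data_long %>%
  mutate(count_chg = if_else(STATUS == "START", 1, -1)) %>%
  group_by(WEEKDAY, TIME) %>%
  summarise(net_chg = sum(count_chg), .groups = "drop") %>%
  # Add the slots where nothing changed, with a net change of zero
  complete(WEEKDAY, TIME = all_slots, fill = list(net_chg = 0)) %>%
  group_by(WEEKDAY) %>%
  # Zero-padded "HHMM" strings sort chronologically
  arrange(TIME, .by_group = TRUE) %>%
  mutate(employee_count = cumsum(net_chg)) %>%
  ungroup()

# One tile per weekday and slot; label only the full hours on the y-axis
p <- ggplot(slot_counts, aes(x = WEEKDAY, y = TIME, fill = employee_count)) +
  geom_tile() +
  scale_y_discrete(breaks = all_slots[seq(1, 288, by = 12)])
p
```

With this counting, a shift from 0900 to 0930 covers the slots 0900 through 0925; the END slot itself is not counted.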

Answer 1: (score: 0)

I think this should do it:

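library(dplyr)
library(tidyr)

# Pull one (day, start, end) column triple and give it common names
clean_colnames <- function(col_inds) {
  data %>% select(INDEX, day = col_inds[1], start = col_inds[2], end = col_inds[3])
}

# Stack the three shifts, go long, pad each shift to 5-minute steps, then count per slot
bind_rows(clean_colnames(2:4), clean_colnames(5:7), clean_colnames(8:10)) %>%
  gather(key = start_end, value = time, -INDEX, -day) %>%
  mutate(time = paste0("20190101 ", time) %>% lubridate::ymd_hm()) %>%
  padr::pad(group = c("INDEX", "day")) %>%
  count(day, time) %>%
  # format() keeps midnight as "0000"; substr() on the character form of a midnight time can come back empty
  mutate(time = format(time, "%H%M"))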
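
To see what the pad and count steps do, here is a small sketch on made-up data (the shifts data frame and its values are hypothetical, and interval and by are passed explicitly here, which the answer's code leaves to padr's defaults). Each group gets one row per 5-minute step between its earliest and latest time, and counting rows per slot then gives the number of people present.

```r
library(dplyr)

# Hypothetical input: two employees, each with a start and an end time on Monday
shifts <- data.frame(
  INDEX = c(1, 1, 2, 2),
  day   = c("Monday", "Monday", "Monday", "Monday"),
  time  = as.POSIXct(c("2019-01-01 09:00", "2019-01-01 09:15",
                       "2019-01-01 09:10", "2019-01-01 09:20"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Insert the missing 5-minute rows between each group's first and last time
padded <- padr::pad(shifts, interval = "5 min", by = "time",
                    group = c("INDEX", "day"))

# Rows per (day, time) = number of employees present in that slot
counts <- padded %>%
  count(day, time) %>%
  mutate(time = format(time, "%H%M"))

counts
```

Note that this counting includes the slot of the end time itself, so an employee whose shift ends at 0915 is still counted in the 0915 slot.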