R: Convert long format to wide format, filling in missing dates

Posted: 2016-11-23 13:28:13

Tags: r date dataframe crosstab

I'm reshaping my company's hour-registration data to fit a certain format. I've already modified the input so that it looks like this:

   employee project month day hours
1         A  16-001     9   9     5
2         B  16-001     9  29     1
3         A  16-001     9   3     5
4         B  16-001     9  28     2
5         A  16-002     9   8     6
6         B  16-002     9   9     4
7         A  16-002    10  25     6
8         B  16-002    10  21     8
9         A  overig    10   6     6
10        B  overig    10  17     7
11        A  overig    10   9     1
12        B  overig    10  10     7

# reproducible data (sample() is random, so your numbers will differ):
df <- data.frame(employee = rep(c("A", "B"), 6),
                 project  = rep(c("16-001", "16-002", "overig"), each = 4),
                 month    = rep(c(9, 10), each = 6),
                 day      = sample(1:30, 12, replace = TRUE),
                 hours    = sample(1:8, 12, replace = TRUE))

# Now I need to turn this into a cross table:
res <- ftable(xtabs(hours~month+employee+project+day, aggregate(hours~month+employee+project+day, data=df, FUN=sum)))

#And put this cross table in a data.frame (for export to csv)
library(reshape2) 
df_res <- dcast(as.data.frame(res),
                as.formula(paste(paste(names(attr(res, "row.vars")), collapse = "+"),
                                 "~",
                                 paste(names(attr(res, "col.vars"))))))

df_res

   month employee project 3 6 8 9 10 17 21 25 28 29
1      9        A  16-001 5 0 0 5  0  0  0  0  0  0
2      9        A  16-002 0 0 6 0  0  0  0  0  0  0
3      9        A  overig 0 0 0 0  0  0  0  0  0  0
4      9        B  16-001 0 0 0 0  0  0  0  0  2  1
5      9        B  16-002 0 0 0 4  0  0  0  0  0  0
6      9        B  overig 0 0 0 0  0  0  0  0  0  0
7     10        A  16-001 0 0 0 0  0  0  0  0  0  0
8     10        A  16-002 0 0 0 0  0  0  0  6  0  0
9     10        A  overig 0 6 0 1  0  0  0  0  0  0
10    10        B  16-001 0 0 0 0  0  0  0  0  0  0
11    10        B  16-002 0 0 0 0  0  0  8  0  0  0
12    10        B  overig 0 0 0 0  7  7  0  0  0  0

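I'm not sure this is the best approach, but the format is fine now. However, I need every day of the month as a column, not just the days that occur in my data.frame (so 31 columns), preferably with NA for days that don't exist (such as 31 September) and 0 for the rest. Any suggestions on how to do this?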
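For reference, a minimal base-R sketch of the requested layout (no reshape2): making `day` a factor with levels 1 to 31 forces xtabs() to keep a column for every day, and days that do not exist in a month are then set to NA. It assumes all records are from 2016 (the question does not name a year) and uses the values from the printed table above instead of the random sample; `days_in()`, `wide` and the other object names are illustrative, not from the question.

```r
# Data copied from the printed table above (the question's sample() is random)
df <- data.frame(
  employee = rep(c("A", "B"), 6),
  project  = rep(c("16-001", "16-002", "overig"), each = 4),
  month    = rep(c(9, 10), each = 6),
  day      = c(9, 29, 3, 28, 8, 9, 25, 21, 6, 17, 9, 10),
  hours    = c(5, 1, 5, 2, 6, 4, 6, 8, 6, 7, 1, 7),
  stringsAsFactors = FALSE
)
year <- 2016L  # assumption: every record belongs to 2016

# Hypothetical helper: days in month m of year y (the day before the next 1st)
days_in <- function(y, m) {
  first <- as.Date(sprintf("%d-%02d-01", y, m))
  next_first <- seq(first, by = "month", length.out = 2)[2]
  as.integer(format(next_first - 1, "%d"))
}

# Levels 1..31 make xtabs() keep every day, even days without any hours
df$day <- factor(df$day, levels = 1:31)
tab <- xtabs(hours ~ month + employee + project + day, data = df)

# tab is a month x employee x project x day array. Flatten the first three
# dimensions into rows; month varies fastest, the same order as expand.grid()
ids  <- expand.grid(dimnames(tab)[1:3], stringsAsFactors = FALSE)
vals <- matrix(as.vector(tab), ncol = 31,
               dimnames = list(NULL, as.character(1:31)))

# Set days that do not exist in the row's month (e.g. 31 September) to NA
n_days <- vapply(as.integer(ids$month), function(m) days_in(year, m), integer(1))
vals[col(vals) > n_days] <- NA

wide <- data.frame(ids, vals, check.names = FALSE)
wide <- wide[order(as.integer(wide$month), wide$employee, wide$project), ]
rownames(wide) <- NULL
```

As in the question's own output, every month/employee/project combination gets a row, even when it has no hours at all.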

1 Answer:

Answer 0 (score: 1)

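I think this is an acceptable solution, and it also handles leap years (bonus points). It still relies on the nice factor-filling behavior of tidyr::spread() with drop = FALSE, but now uses lubridate::days_in_month() so that each month is spread only as far as its last day. Here we go: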

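library(dplyr)     # mutate(), select(), bind_rows() and %>%
library(tidyr)     # spread()
library(lubridate) # ymd(), days_in_month()
library(purrr)     # map()

# All records are taken to be from 2016 (needed to build real dates)
df$year <- 2016
# Number of days in each record's month (29 for February in a leap year)
df$num_in_month <- ymd(paste(df$year, df$month, df$day)) %>%
    days_in_month()

df %>% split(.$month) %>%
    # Per month, make day a factor with levels 1..(days in that month)
    map(~mutate(., day = factor(day, levels = 1:unique(num_in_month)))) %>%
    # drop = FALSE keeps unused levels, so every day gets a column (filled with 0)
    map(~spread(., key = day, value = hours, fill = 0, drop = FALSE)) %>%
    # Shorter months lack the later day columns; bind_rows() fills them with NA
    bind_rows() %>%
    select(-num_in_month)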

  employee project month year 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
1        A  16-001     9 2016 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  2  0  8  0  0 NA
2        A  16-002     9 2016 0 0 0 0 5 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 NA
3        B  16-001     9 2016 0 0 7 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  5  0  0  0 NA
4        B  16-002     9 2016 0 0 0 0 0 0 0 0 0  0  0  0  0  0  8  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 NA
5        A  16-002    10 2016 1 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
6        A  overig    10 2016 0 4 0 0 0 0 0 0 0  5  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
7        B  16-002    10 2016 0 0 0 0 0 0 0 0 0  0  0  0  7  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
8        B  overig    10 2016 0 0 0 0 6 0 0 0 0  0  0  8  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
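The NA in column 31 for September, and the leap-year handling, both come from the month lengths that lubridate::days_in_month() returns. A base-R sketch of the same calculation (the function name `month_lengths()` is only illustrative) takes the gap between consecutive first-of-month dates:

```r
# Hypothetical helper: length of each month of `year`, computed as the number
# of days between consecutive first-of-month dates (13 dates -> 12 gaps)
month_lengths <- function(year) {
  firsts <- seq(as.Date(sprintf("%d-01-01", year)), by = "month", length.out = 13)
  as.integer(diff(firsts))
}

month_lengths(2016)     # 31 29 31 30 31 30 31 31 30 31 30 31 (2016 is a leap year)
month_lengths(2015)[2]  # 28
```

September has 30 days in every year, which is why the month-9 rows above end with NA in column 31.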

Cheers