Transform dates into dummy variables in R, taking holidays into account

Time: 2018-07-08 14:38:23

Tags: r dataframe dplyr lubridate

This post is related to the recently posted "transform date into dummy variable in R", but it is more complex. I have this data:

   df=structure(list(Data = structure(c(4L, 5L, 6L, 7L, 8L, 9L, 10L, 
1L, 2L, 3L), .Label = c("01.01.2018", "02.01.2018", "03.01.2018", 
"25.12.2017", "26.12.2017", "27.12.2017", "28.12.2017", "29.12.2017", 
"30.12.2017", "31.12.2017"), class = "factor"), Y = 1:10), .Names = c("Data", 
"Y"), class = "data.frame", row.names = c(NA, -10L))

I had to convert the dates into dummy variables: 1 if the row's date falls on that day of the week, otherwise 0.

The solution provided by Paweł Kozielski-Romaneczko helped me:

library(dplyr)
library(lubridate)
library(tidyr)


df %>%
  mutate(weekDay = lubridate::dmy(Data) %>% weekdays(),
         value = 1) %>%
  spread(key=weekDay, value=value, fill=0)
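
Note that `weekdays()` returns day names in the current locale, and `spread()` orders the new columns alphabetically. As a minimal base R sketch (not part of the original solution), the weekday dummies can also be built in a locale-independent way from the ISO weekday number (`%u`, where Monday is 1). The names `dates`, `day_labels`, `wd` and `weekday_dummies` are illustrative assumptions.

```r
# Three sample dates in the same dd.mm.yyyy format as the question
dates <- as.Date(c("25.12.2017", "31.12.2017", "01.01.2018"),
                 format = "%d.%m.%Y")

# Fixed English labels, ordered Monday to Sunday
day_labels <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# "%u" gives the ISO weekday number (1 = Monday, ..., 7 = Sunday)
wd <- day_labels[as.integer(format(dates, "%u"))]

# One 0/1 column per weekday; columns are named after day_labels
weekday_dummies <- sapply(day_labels, function(d) as.integer(wd == d))
```

Because the labels are fixed in the code, the columns always come out in Monday-to-Sunday order regardless of the system language.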

But now I also have to add columns for holidays, i.e. is the day a holiday?

I have an auxiliary dataset that indicates which dates are holidays:

df1=structure(list(Data = structure(1:2, .Label = c("01.01.2018", 
"08.03.2018"), class = "factor"), name = structure(c(2L, 1L), .Label = c("International Women's Day", 
"New Year"), class = "factor")), .Names = c("Data", "name"), class = "data.frame", row.names = c(NA, 
-2L))

So I need the holidays in the output, like this:

Data        Y   Mon Tue Wed Thu Fri Sat Sun New Year  International Women's Day
25.12.2017  1   1   0   0   0   0   0   0   0         0
26.12.2017  2   0   1   0   0   0   0   0   0         0
27.12.2017  3   0   0   1   0   0   0   0   0         0
28.12.2017  4   0   0   0   1   0   0   0   0         0
29.12.2017  5   0   0   0   0   1   0   0   0         0
30.12.2017  6   0   0   0   0   0   1   0   0         0
31.12.2017  7   0   0   0   0   0   0   1   0         0
01.01.2018  8   1   0   0   0   0   0   0   1         0
02.01.2018  9   0   1   0   0   0   0   0   0         0
03.01.2018  10  0   0   1   0   0   0   0   0         0

How can I add the holidays as dummy variables whose names are taken from the auxiliary dataset?

P.S. If you think this topic belongs in my previous post, please let me know and I will delete it.

1 answer:

Answer 0: (score: 1)

I extend your example here. Use left_join or full_join, depending on your needs. I used full_join so that "International Women's Day" shows up in the result.

I use as.character to clean up name, because it is a factor in your example. If name is not a factor, as.character is not needed. Finally, I drop the No_Holiday column.

df %>% full_join(df1, by = "Data") %>% 
  mutate(weekDay = lubridate::dmy(Data) %>% weekdays(),
         # flag holidays first, while non-holiday rows still have name == NA
         holiday = ifelse(is.na(name), 0, 1),
         name = ifelse(is.na(name), "No_Holiday", as.character(name)), 
         value = 1) %>%
  spread(key = weekDay, value = value, fill = 0) %>% 
  spread(key = name, value = holiday, fill = 0) %>% 
  select(-No_Holiday)

         Data  Y Friday Monday Saturday Sunday Thursday Tuesday Wednesday International Women's Day New Year
1  01.01.2018  8      0      1        0      0        0       0         0                         0        1
2  02.01.2018  9      0      0        0      0        0       1         0                         0        0
3  03.01.2018 10      0      0        0      0        0       0         1                         0        0
4  08.03.2018 NA      0      0        0      0        1       0         0                         1        0
5  25.12.2017  1      0      1        0      0        0       0         0                         0        0
6  26.12.2017  2      0      0        0      0        0       1         0                         0        0
7  27.12.2017  3      0      0        0      0        0       0         1                         0        0
8  28.12.2017  4      0      0        0      0        1       0         0                         0        0
9  29.12.2017  5      1      0        0      0        0       0         0                         0        0
10 30.12.2017  6      0      0        1      0        0       0         0                         0        0
11 31.12.2017  7      0      0        0      1        0       0         0                         0        0
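
If the extra 08.03.2018 row (with Y = NA) is not wanted, but every holiday from the auxiliary dataset should still get its own column, one possible alternative is to skip the join and build each holiday column directly. The following is a base R sketch under that assumption, not the method used above; `result` is an illustrative name, and the data is recreated as character columns rather than factors.

```r
# Example data from the question, recreated with character columns
df <- data.frame(
  Data = c("25.12.2017", "26.12.2017", "27.12.2017", "28.12.2017",
           "29.12.2017", "30.12.2017", "31.12.2017", "01.01.2018",
           "02.01.2018", "03.01.2018"),
  Y = 1:10,
  stringsAsFactors = FALSE
)
df1 <- data.frame(
  Data = c("01.01.2018", "08.03.2018"),
  name = c("New Year", "International Women's Day"),
  stringsAsFactors = FALSE
)

# Add one 0/1 column per holiday name: 1 where the row's date is that holiday
result <- df
for (nm in unique(df1$name)) {
  result[[nm]] <- as.integer(result$Data %in% df1$Data[df1$name == nm])
}
```

This keeps the original 10 rows; the "International Women's Day" column is present but all zeros, because 08.03.2018 does not occur in df. The weekday columns can then be added with the same mutate/spread steps as before.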