Lookup in dataframe columns with R

Time: 2018-05-30 13:01:54

Tags: r date dplyr

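I am trying to classify the instances in a data frame as Holidays or Normal.

The dates that must be classified as Holidays are stored in a list/dataframe object, and the dates I want to classify are in a separate test object.

To be classified as Holidays, a date must not only appear among those dates; the Condition column of the list/dataframe must also be 1 rather than 0 (that is, an instance whose date is among the Holidays dates should be flagged as Holidays if and only if the corresponding Condition is 1).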

The object holding the dates that should be flagged as Holidays:

holidays2015 <- list(list("2015-01-01",1,1,1),
                     list("2015-01-06",0,1,1),
                     list("2015-03-19",0,1,1),
                     list("2015-04-02",0,1,1),
                     list("2015-04-03",0,1,1),
                     list("2015-05-01",1,1,1),
                     list("2015-05-02",0,1,1),
                     list("2015-05-15",0,1,1),
                     list("2015-06-04",0,1,1),
                     list("2015-08-15",1,1,0),
                     list("2015-10-12",1,1,1),
                     list("2015-11-09",0,1,1),
                     list("2015-12-08",1,1,0),
                     list("2015-12-24",0,0,1),
                     list("2015-12-25",1,1,0),
                     list("2015-12-31",0,0,1))

holidays2014 <- list(list("2014-01-01",1,1,1),
                     list("2014-01-06",0,1,1),
                     list("2014-04-17",0,1,1),
                     list("2014-04-18",0,1,1),
                     list("2014-05-01",1,1,1),
                     list("2014-05-02",0,1,0),
                     list("2014-05-15",0,1,1),
                     list("2014-06-19",0,1,1),
                     list("2014-08-15",1,1,1),
                     list("2014-11-01",1,1,0),
                     list("2014-11-10",0,1,1),
                     list("2014-12-06",1,1,1),
                     list("2014-12-08",1,1,0),
                     list("2014-12-25",1,1,1))
totalholidays <- list(holidays2015, holidays2014)
dfholidays <- lapply(totalholidays, function(x) data.table::rbindlist(x))
dfholidays <- data.table::rbindlist(dfholidays)
names(dfholidays) <- c("Date", "V2", "V3", "Condition")

The dates I want to flag:

mytestingdates <- as.data.frame(list("Date" = c("2014-01-07", "2014-08-15", 
"2015-06-04", "2015-08-15")))

My working solution is a for loop (the slow way):

holidaysvector <- c()
for (ii in 1:nrow(mytestingdates)){
  if (mytestingdates$Date[ii] %in% dfholidays$Date){
    tmp <- which(dfholidays$Date == mytestingdates$Date[ii])
    if (dfholidays$Condition[tmp] == 1) {
      holidaysvector <- c(holidaysvector, "Holidays")
    } else { holidaysvector <- c(holidaysvector, "Normal T.1") }
    } else { holidaysvector <- c(holidaysvector, "Normal T.2") }
}
mytestingdates$forsolution <- holidaysvector
rm(tmp)

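But I would like a more efficient solution. I tried some vectorized R options and failed; this is the kind of solution I hoped would work: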

mytestingdates$MyRtry <- ifelse(mytestingdates$Date %in% dfholidays$Date, 
ifelse(dfholidays$Condition == 1, "Holiday", "Normal T.1"), "Normal T.2")

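The result, with the desired classification in forsolution and my vectorized attempt in MyRtry:

        Date     MyRtry forsolution
1 2014-01-07 Normal T.2  Normal T.2
2 2014-08-15    Holiday    Holidays
3 2015-06-04    Holiday    Holidays
4 2015-08-15    Holiday  Normal T.1

Note that instance no. 4 is in the Holidays object but its Condition is 0, so it should be flagged as a Normal day, which my vectorized R attempt misses.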
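The nested ifelse() attempt goes wrong because dfholidays$Condition has one element per holiday row, not one per test date, so ifelse() picks its values by position instead of by matching date. One base R way to get a row-aligned lookup is match(). The following is a minimal sketch under stated assumptions, not code from the question: it uses a three-row stand-in for dfholidays, and the column name viamatch is made up for illustration.

```r
# Small stand-in for the question's data (same column names, fewer rows)
dfholidays <- data.frame(
  Date = c("2014-08-15", "2015-06-04", "2015-08-15"),
  Condition = c(1, 1, 0),
  stringsAsFactors = FALSE
)
mytestingdates <- data.frame(
  Date = c("2014-01-07", "2014-08-15", "2015-06-04", "2015-08-15"),
  stringsAsFactors = FALSE
)

# For each test date, the row position of that date in dfholidays (NA if absent)
idx <- match(mytestingdates$Date, dfholidays$Date)

# Indexing by idx aligns Condition with the test dates; unmatched dates give NA
cond <- dfholidays$Condition[idx]

# viamatch is a hypothetical column name used only in this sketch
mytestingdates$viamatch <- ifelse(
  is.na(cond), "Normal T.2",
  ifelse(cond == 1, "Holidays", "Normal T.1")
)
```

Because match() returns only the first matching position, this sketch assumes each date appears at most once in dfholidays.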

Any ideas? Any advice on clean code or programming technique based on my code would be much appreciated.

2 Answers:

Answer 0: (score: 1)

Are you open to a dplyr solution?

library(dplyr)
mytestingdates %>% 
  left_join(dfholidays) %>% 
  mutate(forsolution = ifelse(is.na(Condition), "Normal T.2", ifelse(Condition == 0, "Normal T.1", "Holidays"))) 

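Here dfholidays is joined to mytestingdates. For dates in mytestingdates that are not in dfholidays, the join fills the dfholidays columns with NA. You can then check whether Condition is NA and, if so, set forsolution to "Normal T.2". Next, check whether Condition == 0 and, if so, set forsolution to "Normal T.1". In all other cases, forsolution is "Holidays".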

        Date V2 V3 Condition forsolution
1 2014-01-07 NA NA        NA  Normal T.2
2 2014-08-15  1  1         1    Holidays
3 2015-06-04  0  1         1    Holidays
4 2015-08-15  1  1         0  Normal T.1

Update: a shorter version is:

mytestingdates %>% 
  left_join(dfholidays) %>% 
  mutate(forsolution = case_when(is.na(Condition) ~ "Normal T.2", Condition == 0 ~ "Normal T.1",  TRUE ~ "Holidays"))
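A possible variant of the same join, shown only as a sketch: it names the join key explicitly with by = "Date" and joins only the Condition column, so V2 and V3 do not end up in the result. The stand-in data below is a shortened copy of the question's objects, and the name result is introduced here for illustration.

```r
library(dplyr)

# Shortened stand-in for the question's objects
dfholidays <- data.frame(
  Date = c("2014-08-15", "2015-06-04", "2015-08-15"),
  V2 = c(1, 0, 1),
  V3 = c(1, 1, 1),
  Condition = c(1, 1, 0),
  stringsAsFactors = FALSE
)
mytestingdates <- data.frame(
  Date = c("2014-01-07", "2014-08-15", "2015-06-04", "2015-08-15"),
  stringsAsFactors = FALSE
)

# result is a hypothetical name; the join keeps every row of mytestingdates
result <- mytestingdates %>%
  left_join(select(dfholidays, Date, Condition), by = "Date") %>%
  mutate(forsolution = case_when(
    is.na(Condition) ~ "Normal T.2",  # date not present in dfholidays
    Condition == 0   ~ "Normal T.1",  # present, but Condition is 0
    TRUE             ~ "Holidays"     # present and Condition is 1
  ))
```

If a date occurred more than once in dfholidays, the join would return one row per match, so this sketch assumes the holiday dates are unique.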

Answer 1: (score: 1)

This solution does not distinguish between Normal T.1 and Normal T.2, but it is very simple:

mytestingdates["classifier"] <- ifelse(mytestingdates$Date %in% dfholidays[dfholidays$Condition==1]$Date,"Holiday", "Normal")

mytestingdates

        Date classifier
1 2014-01-07     Normal
2 2014-08-15    Holiday
3 2015-06-04    Holiday
4 2015-08-15     Normal
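If the two Normal cases do need to be told apart, the same %in% idea can be applied twice. This is a sketch rather than part of the answer above, using a shortened stand-in for the data; holiday_dates is a helper name introduced here. It also filters with `$` and a logical index, since the single-bracket row filter in the answer relies on dfholidays being a data.table and would not work on a plain data.frame.

```r
# Shortened stand-in for the question's objects
dfholidays <- data.frame(
  Date = c("2014-08-15", "2015-06-04", "2015-08-15"),
  Condition = c(1, 1, 0),
  stringsAsFactors = FALSE
)
mytestingdates <- data.frame(
  Date = c("2014-01-07", "2014-08-15", "2015-06-04", "2015-08-15"),
  stringsAsFactors = FALSE
)

# Dates that count as holidays: listed in dfholidays with Condition == 1
holiday_dates <- dfholidays$Date[dfholidays$Condition == 1]

mytestingdates$classifier <- ifelse(
  mytestingdates$Date %in% holiday_dates, "Holidays",
  # listed in dfholidays but Condition is 0, versus not listed at all
  ifelse(mytestingdates$Date %in% dfholidays$Date, "Normal T.1", "Normal T.2")
)
```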