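I am trying to classify the instances in a data frame as Holidays or Normal. The dates that must be classified as Holidays are stored in one list/dataframe object, and the dates I want to classify are in a separate test object.
To be classified as Holidays, a date must not only appear among the dates listed in that object: the Condition column of the list/dataframe must also be 1 rather than 0 (i.e., an instance whose date is among the Holidays dates should still be labeled Normal if and only if its corresponding Condition is 0).
The object containing the dates that should be labeled Holidays: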
holidays2015 <- list(list("2015-01-01",1,1,1),
list("2015-01-06",0,1,1),
list("2015-03-19",0,1,1),
list("2015-04-02",0,1,1),
list("2015-04-03",0,1,1),
list("2015-05-01",1,1,1),
list("2015-05-02",0,1,1),
list("2015-05-15",0,1,1),
list("2015-06-04",0,1,1),
list("2015-08-15",1,1,0),
list("2015-10-12",1,1,1),
list("2015-11-09",0,1,1),
list("2015-12-08",1,1,0),
list("2015-12-24",0,0,1),
list("2015-12-25",1,1,0),
list("2015-12-31",0,0,1))
holidays2014 <- list(list("2014-01-01",1,1,1),
list("2014-01-06",0,1,1),
list("2014-04-17",0,1,1),
list("2014-04-18",0,1,1),
list("2014-05-01",1,1,1),
list("2014-05-02",0,1,0),
list("2014-05-15",0,1,1),
list("2014-06-19",0,1,1),
list("2014-08-15",1,1,1),
list("2014-11-01",1,1,0),
list("2014-11-10",0,1,1),
list("2014-12-06",1,1,1),
list("2014-12-08",1,1,0),
list("2014-12-25",1,1,1))
totalholidays <- list(holidays2015, holidays2014)
dfholidays <- lapply(totalholidays, function(x) data.table::rbindlist(x))
dfholidays <- data.table::rbindlist(dfholidays)
names(dfholidays) <- c("Date", "V2", "V3", "Condition")
The dates I want to label:
mytestingdates <- as.data.frame(list("Date" = c("2014-01-07", "2014-08-15",
                                                "2015-06-04", "2015-08-15")))
My working solution uses a for loop (the slow way):
holidaysvector <- c()
for (ii in 1:nrow(mytestingdates)){
  if (mytestingdates$Date[ii] %in% dfholidays$Date){
    tmp <- which(dfholidays$Date == mytestingdates$Date[ii])
    if (dfholidays$Condition[tmp] == 1) {
      holidaysvector <- c(holidaysvector, "Holidays")
    } else { holidaysvector <- c(holidaysvector, "Normal T.1") }
  } else { holidaysvector <- c(holidaysvector, "Normal T.2") }
}
mytestingdates$forsolution <- holidaysvector
rm(tmp)
But I would like a more efficient solution. I tried some vectorized R options but failed. I want the MyRtry column to match forsolution:
mytestingdates$MyRtry <- ifelse(mytestingdates$Date %in% dfholidays$Date,
                                ifelse(dfholidays$Condition == 1, "Holiday", "Normal T.1"), "Normal T.2")
        Date     MyRtry forsolution
1 2014-01-07 Normal T.2  Normal T.2
2 2014-08-15    Holiday    Holidays
3 2015-06-04    Holiday    Holidays
4 2015-08-15    Holiday  Normal T.1
Note that instance no. 4 is in the Holidays object, but its Condition is 0, so it should be labeled as a Normal day, which my R attempt misses.
Any ideas? Any suggestions on clean code or programming techniques, based on my code, would be very welcome.
Answer 0 (score: 1)
Are you open to a dplyr solution?
library(dplyr)
mytestingdates %>%
  left_join(dfholidays, by = "Date") %>%
  mutate(forsolution = ifelse(is.na(Condition), "Normal T.2",
                              ifelse(Condition == 0, "Normal T.1", "Holidays")))
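Here, dfholidays is joined onto mytestingdates. If a date in mytestingdates is not in dfholidays, the join fills the joined columns for that date with NA. You can then check whether Condition is NA and, if so, set forsolution to "Normal T.2". Next, check whether Condition == 0 and, if so, set forsolution to "Normal T.1". In all other cases, forsolution is "Holidays".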
        Date V2 V3 Condition forsolution
1 2014-01-07 NA NA        NA  Normal T.2
2 2014-08-15  1  1         1    Holidays
3 2015-06-04  0  1         1    Holidays
4 2015-08-15  1  1         0  Normal T.1
Update: a shorter version is:
mytestingdates %>%
  left_join(dfholidays, by = "Date") %>%
  mutate(forsolution = case_when(is.na(Condition) ~ "Normal T.2",
                                 Condition == 0 ~ "Normal T.1",
                                 TRUE ~ "Holidays"))
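The join-then-classify logic above can also be sketched in base R, for readers who would rather not load dplyr. This is an illustrative alternative, not part of the original answer. It uses match() to find, for each test date, the matching row of the holiday table (NA when the date is absent). The nested ifelse in the question goes wrong at this step: its inner ifelse runs over every row of dfholidays instead of the row that matches each test date. The data below is an abbreviated copy of the question's tables, keeping only Date and Condition.

```r
# Abbreviated copy of the question's data (only the rows and columns needed here).
dfholidays <- data.frame(
  Date = c("2014-08-15", "2015-06-04", "2015-08-15"),
  Condition = c(1, 1, 0),
  stringsAsFactors = FALSE
)
mytestingdates <- data.frame(
  Date = c("2014-01-07", "2014-08-15", "2015-06-04", "2015-08-15"),
  stringsAsFactors = FALSE
)

# For each test date, the row index of that date in dfholidays (NA if not listed).
idx <- match(mytestingdates$Date, dfholidays$Date)

# Condition aligned to the test dates; NA where the date is not a listed holiday.
cond <- dfholidays$Condition[idx]

# Same three-way rule as the join-based answer.
mytestingdates$forsolution <- ifelse(
  is.na(cond), "Normal T.2",
  ifelse(cond == 0, "Normal T.1", "Holidays")
)
print(mytestingdates)
```

Because cond has the same length as mytestingdates, both ifelse calls operate element by element on aligned values, which is what the loop in the question does one row at a time.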
Answer 1 (score: 1)
This solution does not distinguish between Normal T.1 and Normal T.2, but it is very simple:
mytestingdates["classifier"] <- ifelse(mytestingdates$Date %in% dfholidays[dfholidays$Condition==1]$Date,"Holiday", "Normal")
mytestingdates
        Date classifier
1 2014-01-07     Normal
2 2014-08-15    Holiday
3 2015-06-04    Holiday
4 2015-08-15     Normal
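One caveat about the one-liner above: the row filter dfholidays[dfholidays$Condition==1] relies on dfholidays being a data.table, as it is in the question. If the holiday table is a plain data.frame, a single logical index selects columns rather than rows, so the filter has to be written differently. The following is a minimal sketch of that variant, using an abbreviated copy of the question's data. The name holiday_dates is introduced here for illustration only.

```r
# Abbreviated copy of the question's data, held in plain data.frames.
dfholidays <- data.frame(
  Date = c("2014-08-15", "2015-06-04", "2015-08-15"),
  Condition = c(1, 1, 0),
  stringsAsFactors = FALSE
)
mytestingdates <- data.frame(
  Date = c("2014-01-07", "2014-08-15", "2015-06-04", "2015-08-15"),
  stringsAsFactors = FALSE
)

# Keep only the dates whose Condition is 1; indexing the Date vector directly
# works the same way for a data.frame and a data.table.
holiday_dates <- dfholidays$Date[dfholidays$Condition == 1]

# A date is a holiday only if it appears in the filtered list.
mytestingdates$classifier <- ifelse(
  mytestingdates$Date %in% holiday_dates, "Holiday", "Normal"
)
print(mytestingdates)
```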