Speeding up nested ifelse statements - R

Time: 2016-10-02 18:16:54

Tags: r if-statement nested

A sample of my data:

    time_elapsed                     network_name             daypart       day
 1:         4705                          Laff TV 2016-09-09 03:11:35    Friday
 2:         1800                              CNN 2016-09-10 08:00:00  Saturday
 3:           23                             INSP 2016-09-02 18:00:00    Friday
 4:          148                              NBC 2016-09-02 16:01:26    Friday
 5:          957                  History Channel 2016-09-07 14:44:03 Wednesday
 6:         1138         Nickelodeon/Nick-at-Nite 2016-09-09 16:00:00    Friday
 7:          120                       Starz Edge 2016-09-07 15:28:59 Wednesday
 8:          268            Starz Encore Westerns 2016-09-07 17:13:05 Wednesday
 9:            6                              CBS 2016-09-10 04:00:00  Saturday
10:           69                      Independent 2016-09-07 12:48:11 Wednesday
11:         4151                              NBC 2016-09-09 04:32:37    Friday
12:          570 PBS: Public Broadcasting Service 2016-09-07 16:17:58 Wednesday
13:         1421                            NBCSN 2016-09-03 15:22:23  Saturday
14:          466          Estrella TV (Broadcast) 2016-09-04 19:00:00    Sunday

(Typically more than 200 million rows)

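A few months ago, when I was running my whole script on a few million rows, I wrote the following nested ifelse statement. Now that I am running it at a much larger scale, I would really like to find a way to make it faster.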

targets_random$daypart <- ifelse((wday(targets_random$daypart) == 1 | 
                wday(targets_random$daypart) == 7), "W: Weekend",
                        ifelse(hour(targets_random$daypart) <= 2, "LP: Late Prime",
                        ifelse((hour(targets_random$daypart) >= 3 & 
                hour(targets_random$daypart) <= 5), "O: Overnight",
                        ifelse((hour(targets_random$daypart) >= 6 & 
                hour(targets_random$daypart) <= 9), "EM: Early Morning",
                        ifelse((hour(targets_random$daypart) >= 10 & 
                hour(targets_random$daypart) <= 16), "D: Day",
                        ifelse((hour(targets_random$daypart) >= 17 & 
                hour(targets_random$daypart) <= 20), "F: Fringe",
                        ifelse(hour(targets_random$daypart) >= 21, "P: Prime", NA)))))))

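I tried a data.table solution, but it converted my data.table into a list, and for the life of me I can't figure out why. Undoing that added enough time that the savings were not worth it.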

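Any suggestions are much appreciated. What I have works, and if I have to stick with it, that's fine. The whole script currently takes about 3.5 hours to run. The biggest parts are the SQL queries and writing the result files, but it would be great to save as much time as I can!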

(As a side note: over the past 8 hours or so I have replaced large parts of the script with data.table syntax. I am now officially a fan!)

1 answer:

Answer 0: (score: 0)

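Consider building a separate, static data frame of all possible weekday/hour combinations and their results. In SQL practice this would be called a lookup table. Then merge it with the full data table as you would any other table.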

# DF (N=168) 7 X 24; hour() returns 0-23, so the grid must start at 0
daytimes <- expand.grid(wday=c(1:7),
                        hour=c(0:23))
daytimes$result <- 
  ifelse((daytimes$wday == 1|daytimes$wday == 7), "W: Weekend",
       ifelse(daytimes$hour <= 2, "LP: Late Prime",
             ifelse((daytimes$hour >= 3 & daytimes$hour <= 5), "O: Overnight",
                    ifelse((daytimes$hour >= 6 & daytimes$hour <= 9), "EM: Early Morning",
                           ifelse((daytimes$hour >= 10 & daytimes$hour <= 16), "D: Day",
                                  ifelse((daytimes$hour >= 17 & daytimes$hour <= 20), "F: Fringe",
                                         ifelse(daytimes$hour >= 21, "P: Prime", NA)))))))
# CREATE MERGE FIELDS
targets_random$wday <- wday(targets_random$daypart)
targets_random$hour <- hour(targets_random$daypart)

# MERGE WITH NEW COLUMN: result
targets_random <- merge(targets_random, daytimes, by=c("wday", "hour"))
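Note that merge() sorts the result by the key columns by default, and joining hundreds of millions of rows may itself be costly. A possible variation on the same lookup idea, shown here only as a minimal sketch, is to index a 24-element label vector directly by hour and then overwrite the weekend rows. The names `breaks`, `labels`, `hour_label`, `classify_daypart` and `sample_ts` are hypothetical and not from the question. The sketch uses base R's as.POSIXlt(), whose `wday` field runs 0-6 with Sunday = 0 (unlike the 1-7 convention that the question's wday() calls assume), and it has not been benchmarked on data of the size described.

```r
# Hour buckets from the question: lower bound of each bucket, in order.
breaks <- c(0, 3, 6, 10, 17, 21)
labels <- c("LP: Late Prime", "O: Overnight", "EM: Early Morning",
            "D: Day", "F: Fringe", "P: Prime")

# 24-element lookup vector: hour_label[h + 1] is the label for hour h (0-23).
hour_label <- labels[findInterval(0:23, breaks)]

# Hypothetical helper: classify a vector of timestamps in one vectorized pass.
classify_daypart <- function(ts) {
  lt <- as.POSIXlt(ts)                          # base R: hour is 0-23, wday is 0-6 (Sunday = 0)
  out <- hour_label[lt$hour + 1L]               # plain vector indexing, no nested ifelse
  out[lt$wday %in% c(0L, 6L)] <- "W: Weekend"   # weekend overrides the hour bucket
  out
}

# A few timestamps taken from the sample data in the question.
sample_ts <- as.POSIXct(c("2016-09-09 03:11:35", "2016-09-10 08:00:00",
                          "2016-09-02 18:00:00", "2016-09-07 14:44:03"),
                        tz = "UTC")
classify_daypart(sample_ts)
```

Assuming `targets_random` is a data.table, something like `targets_random[, daypart := classify_daypart(daypart)]` would apply this by reference, without a merge and without changing the row order.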