Question

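I am trying to determine the thresholds for two activities that yield the highest success rate.

Listed below is an example of what I am trying to accomplish. For each location, I am trying to identify the thresholds to use for activity 1 and activity 2, such that if either condition is met we guess "yes" (1). I then need to make sure that we guess "yes" for only a certain percentage of the total volume at each location, and that we maximize accuracy (the share of our "yes" guesses for which outcome = 1).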

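library(data.table)

set.seed(145)  # seed before the random draws so the sample data are reproducible
location <- c(1,2,3)
# Note: each 20-value vector is recycled across the 60 rows
testFile <- data.frame(location = rep.int(location, 20),
                       activity1 = round(rnorm(20, mean = 10, sd = 3)),
                       activity2 = round(rnorm(20, mean = 20, sd = 3)),
                       outcome = rbinom(20,1,0.5)
                       )
act_1_thresholds <- seq(7,12,1)
act_2_thresholds <- seq(19,24,1)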

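I was able to accomplish this by creating a table containing all possible unique combinations of activity 1 and activity 2 thresholds and then merging it with every observation in the sample data set. However, the real data set has about 200 locations, each with thousands of observations, and I quickly ran out of space.

I would like to create a function that takes the location ID and the sets of possible thresholds for activity 1 and activity 2, then calculates how often we would have guessed "yes" (i.e., 'activity1' or 'activity2' meets or exceeds the respective threshold being tested), to make sure that our application rate stays within the desired range (50%-75%). Then, among the threshold sets that produce an application rate within the desired range, we only want to store the one that maximizes accuracy for each location, along with its location number, application rate, and accuracy rate. The desired output is listed below.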

      location act_1_thresh act_2_thresh application_rate accuracy_rate
1        1           13           19             0.52          0.45
2        2           11           24             0.57          0.53
3        3           14           21             0.67          0.42

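I have tried writing this as a for loop, but I could not work out how to loop through the number of nested arguments needed to cover all of these cases. Any help from anyone who has attempted a similar problem would be appreciated. Thank you!

An example of how to calculate the application rate and accuracy rate for a single set of thresholds is listed below.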

### Create yard IDs
location <- c(1,2,3)

### Create a single set of thresholds
single_act_1_threshold <- 12
single_act_2_threshold <- 20

### Calculate the simulated application, and success rate of thresholds mentioned above using historical data
as.data.table(testFile)[,
                        list(
                        application_rate = round(sum(ifelse(single_act_1_threshold <= activity1 | single_act_2_threshold <= activity2, 1, 0))/
                                                   nrow(testFile),2),
                        accuracy_rate = round(sum(ifelse((single_act_1_threshold <= activity1 | single_act_2_threshold <= activity2) & (outcome == 1), 1, 0))/
                                                sum(ifelse(single_act_1_threshold <= activity1 | single_act_2_threshold <= activity2, 1, 0)),2)
                        ),
                        by = location]

Answer 1

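Consider using expand.grid to build a data frame of all combinations of the two thresholds. Then use Map to iterate elementwise over the two columns of that data frame and build a list of data tables (each data table now includes columns indicating the thresholds used).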

act_1_thresholds <- seq(7,12,1)
act_2_thresholds <- seq(19,24,1)

# ALL COMBINATIONS
thresholds_df <- expand.grid(th1=act_1_thresholds, th2=act_2_thresholds)

# USER-DEFINED FUNCTION
calc <- function(th1, th2)
     as.data.table(testFile)[, list(
                                  act_1_thresholds = th1,     # NEW COLUMN
                                  act_2_thresholds = th2,     # NEW COLUMN                      
                                  application_rate = round(sum(ifelse(th1 <= activity1 | th2 <= activity2, 1, 0)) /
                                                           nrow(testFile),2),
                                  accuracy_rate = round(sum(ifelse((th1 <= activity1 | th2 <= activity2) & (outcome == 1), 1, 0)) /
                                                        sum(ifelse(th1 <= activity1 | th2 <= activity2, 1, 0)),2)
                                ), by = location]    
# LIST OF DATA TABLES
dt_list <- Map(calc, thresholds_df$th1, thresholds_df$th2)

# NAME ELEMENTS OF LIST
names(dt_list) <- paste(thresholds_df$th1, thresholds_df$th2, sep="_")

# SAME RESULT AS POSTED EXAMPLE
dt_list$`12_20`  
#    location act_1_thresholds act_2_thresholds application_rate accuracy_rate
# 1:        1               12               20             0.23           0.5
# 2:        2               12               20             0.23           0.5
# 3:        3               12               20             0.23           0.5
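
One thing to check in the function above: calc divides by nrow(testFile), the row count of the whole data set, so each location's application rate is measured against all rows rather than that location's own rows. If per-location rates are what is wanted, a minimal sketch could take mean() of the logical guess vector inside the by-group expression, which is equivalent to dividing by data.table's .N. The function name calc_by_location and the toy data are made up for illustration and are not from the original post.

```r
library(data.table)

# Hypothetical variant of calc(): rates are computed within each location group
calc_by_location <- function(dt, th1, th2) {
  dt[, {
    # TRUE where either activity meets or exceeds its threshold
    guess <- activity1 >= th1 | activity2 >= th2
    list(
      act_1_thresholds = th1,
      act_2_thresholds = th2,
      # share of this location's rows where we guess "yes"
      application_rate = mean(guess),
      # share of "yes" guesses with outcome == 1; NA if there were no guesses
      accuracy_rate = if (any(guess)) mean(outcome[guess] == 1) else NA_real_
    )
  }, by = location]
}

# Tiny hand-made data set, only to exercise the function
toy <- data.table(
  location  = c(1, 1, 1, 1, 2, 2, 2, 2),
  activity1 = c(13, 9, 8, 12, 7, 7, 7, 7),
  activity2 = c(18, 21, 17, 25, 19, 19, 22, 19),
  outcome   = c(1, 0, 1, 1, 0, 1, 1, 0)
)
res <- calc_by_location(toy, th1 = 12, th2 = 20)
print(res)
```

In the toy data, location 1 triggers a "yes" on three of its four rows and location 2 on one of four, so the two locations get different application rates, which the nrow(testFile) denominator would not show on a per-location scale.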

If you need to append all elements into a single table, use data.table's rbindlist:

final_dt <- rbindlist(dt_list)
final_dt

#      location act_1_thresholds act_2_thresholds application_rate accuracy_rate
#   1:        1                7               19             0.32          0.47
#   2:        2                7               19             0.32          0.47
#   3:        3                7               19             0.32          0.47
#   4:        1                8               19             0.32          0.47
#   5:        2                8               19             0.32          0.47
#  ---                                                                          
# 104:        2               11               24             0.20          0.42
# 105:        3               11               24             0.20          0.42
# 106:        1               12               24             0.15          0.56
# 107:        2               12               24             0.15          0.56
# 108:        3               12               24             0.15          0.56
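
The question also asks to keep, for each location, only the threshold pair with the highest accuracy among those whose application rate falls in the desired 50%-75% range. A minimal sketch of that last step, applied to a combined table shaped like final_dt, might look as follows. The helper name pick_best, its lo and hi arguments, and the small rates table are assumptions made for illustration; ties are resolved by taking the first maximum.

```r
library(data.table)

# Hypothetical helper: per location, keep the in-range threshold pair
# with the highest accuracy_rate
pick_best <- function(rates, lo = 0.50, hi = 0.75) {
  # keep only threshold pairs whose application rate is in [lo, hi]
  in_range <- rates[!is.na(accuracy_rate) &
                      application_rate >= lo & application_rate <= hi]
  # within each location, keep the row with the maximum accuracy
  best <- in_range[, .SD[which.max(accuracy_rate)], by = location]
  best[order(location)]
}

# Hand-made stand-in for a table like final_dt (values are not real results)
rates <- data.table(
  location         = c(1, 1, 1, 2, 2),
  act_1_thresholds = c(7, 8, 9, 7, 8),
  act_2_thresholds = c(19, 19, 20, 19, 20),
  application_rate = c(0.80, 0.60, 0.55, 0.70, 0.40),
  accuracy_rate    = c(0.70, 0.50, 0.65, 0.45, 0.90)
)
best <- pick_best(rates)
print(best)
```

Because the filtering happens on the small table of rates rather than on the observations joined to every threshold pair, this step stays cheap even with many locations. A location with no threshold pair in range simply does not appear in the result.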

Optimizing a function across multiple factors
