Creating a new column from modelPerformance results in the R uplift package

Asked: 2015-09-17 04:32:04

Tags: r

Here is sample code that creates groups with the uplift library in R:

library(uplift)
### Simulate data
set.seed(12345)
dd <- sim_pte(n = 1000, p = 5, rho = 0, sigma = sqrt(2), beta.den = 4)
dd$treat <- ifelse(dd$treat == 1, 1, 0) # required coding for upliftRF
### Fit upliftRF model
fit1 <- upliftRF(y ~ X1 + X2 + X3 + X4 + X5 + trt(treat),
                 data = dd,
                 mtry = 3,
                 ntree = 50,
                 split_method = "KL",
                 minsplit = 100,
                 verbose = TRUE)
### Fitted values on train data
pred <- predict(fit1, dd)
### Compute uplift predictions
uplift_pred <- pred[, 1] - pred[, 2]
### Put together data, predictions and add some dummy factors for illustration only
dd2 <- data.frame(dd, uplift_pred, F1 = gl(2, 50, labels = c("A", "B")),
                  F2 = gl(4, 25, labels = c("a", "b", "c", "d")))
### Profile data based on fitted model
modelProfile(uplift_pred ~ X1 + X2 + X3 + F1 + F2,
             data = dd2,
             groups = 10,
             group_label = "D",
             digits_numeric = 2,
             digits_factor = 4,
             exclude_na = FALSE,
             LaTex = FALSE)

The output shows that the data can be split into 10 groups:

                 Group
                      1       2       3       4       5       6       7       8       9       10      All
                n     102     98      100     100     100     100     100     100     100     100     1000
    uplift_pred Avg.  0.3292  0.2292  0.1537  0.0701  0.0110  -0.0536 -0.1174 -0.1935 -0.2734 -0.3871 -0.0230
    X1          Avg.  0.8527  0.6420  0.3270  0.2959  0.1373  0.0014  -0.2662 -0.5927 -0.6762 -0.7797 -0.0054
    X2          Avg.  -0.6372 -0.4831 -0.1386 -0.1330 -0.1548 0.2872  0.0672  0.0555  0.3455  0.9568  0.0162
    X3          Avg.  0.8339  0.5234  0.3197  0.1135  -0.1029 -0.0383 -0.3387 -0.3249 -0.4995 -0.7476 -0.0255
 F1 A           Pctn. 43.14   48.98   52.00   54.00   50.00   48.00   51.00   52.00   51.00   50.00   50.00
    B           Pctn. 56.86   51.02   48.00   46.00   50.00   52.00   49.00   48.00   49.00   50.00   50.00
 F2 a           Pctn. 24.51   24.49   21.00   26.00   26.00   24.00   34.00   21.00   20.00   29.00   25.00
    b           Pctn. 18.63   24.49   31.00   28.00   24.00   24.00   17.00   31.00   31.00   21.00   25.00
    c           Pctn. 27.45   25.51   27.00   22.00   25.00   27.00   22.00   20.00   29.00   25.00   25.00
    d           Pctn. 29.41   25.51   21.00   24.00   25.00   25.00   27.00   28.00   20.00   25.00   25.00

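I would like to know whether I can create a new column in the data frame ("dd") that tells me which group each observation belongs to. For example, row 1 belongs to group 3, row 2 belongs to group 9, and so on.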

1 answer:

Answer 0: (score: 1)

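The easiest way is to modify the modelProfile function so that it outputs the groups. You may want to rename the function, just to be safe ;)

In modelProfile, the groups are added to the data in the following line: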

  dframe <- data.frame(mf, Group)

So the simplest approach is to return just the data frame dframe, or to return both as a list:

  return(list(resulttable = res, newdata = dframe))
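
If editing the package function is not convenient, the group labels can also be approximated directly from the predictions. The following is only a minimal sketch in base R: `assign_groups` is a hypothetical helper, not part of the uplift package, and `dd_demo` is simulated stand-in data. It ranks the scores in descending order and cuts the ranks into equal-sized bins, so the result may differ slightly from the binning that modelProfile uses internally (note the group sizes of 102 and 98 in the output above).

```r
# Hypothetical helper (not part of the uplift package): assign each row to one
# of `groups` equal-sized bins, with group 1 holding the highest scores.
assign_groups <- function(score, groups = 10) {
  r <- rank(-score, ties.method = "first")  # rank 1 = highest score
  ceiling(r * groups / length(score))       # map ranks 1..n onto bins 1..groups
}

set.seed(12345)
# Stand-in for real uplift predictions (assumption: one score per row)
dd_demo <- data.frame(uplift_pred = rnorm(1000))
dd_demo$Group <- assign_groups(dd_demo$uplift_pred, groups = 10)

head(dd_demo)         # each row now carries its group label
table(dd_demo$Group)  # number of rows per group
```

With the data from the question, the same helper could be applied as `dd$Group <- assign_groups(uplift_pred)`, since `uplift_pred` has one value per row of `dd`.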