Below is example code that creates groups with the uplift library in R:
library(uplift)
### Simulate data
set.seed(12345)
dd <- sim_pte(n = 1000, p = 5, rho = 0, sigma = sqrt(2), beta.den = 4)
dd$treat <- ifelse(dd$treat == 1, 1, 0) # required coding for upliftRF
### Fit upliftRF model
fit1 <- upliftRF(y ~ X1 + X2 + X3 + X4 + X5 + trt(treat),
                 data = dd,
                 mtry = 3,
                 ntree = 50,
                 split_method = "KL",
                 minsplit = 100,
                 verbose = TRUE)
### Fitted values on train data
pred <- predict(fit1, dd)
### Compute uplift predictions
uplift_pred <- pred[, 1] - pred[, 2]
### Put together data, predictions and add some dummy factors for illustration only
dd2 <- data.frame(dd, uplift_pred, F1 = gl(2, 50, labels = c("A", "B")),
                  F2 = gl(4, 25, labels = c("a", "b", "c", "d")))
### Profile data based on fitted model
modelProfile(uplift_pred ~ X1 + X2 + X3 + F1 + F2,
             data = dd2,
             groups = 10,
             group_label = "D",
             digits_numeric = 2,
             digits_factor = 4,
             exclude_na = FALSE,
             LaTex = FALSE)
The output shows the data divided into 10 groups:
                  Group
                          1        2        3        4        5        6        7        8        9       10      All
n                       102       98      100      100      100      100      100      100      100      100     1000
uplift_pred Avg.     0.3292   0.2292   0.1537   0.0701   0.0110  -0.0536  -0.1174  -0.1935  -0.2734  -0.3871  -0.0230
X1          Avg.     0.8527   0.6420   0.3270   0.2959   0.1373   0.0014  -0.2662  -0.5927  -0.6762  -0.7797  -0.0054
X2          Avg.    -0.6372  -0.4831  -0.1386  -0.1330  -0.1548   0.2872   0.0672   0.0555   0.3455   0.9568   0.0162
X3          Avg.     0.8339   0.5234   0.3197   0.1135  -0.1029  -0.0383  -0.3387  -0.3249  -0.4995  -0.7476  -0.0255
F1    A     Pctn.     43.14    48.98    52.00    54.00    50.00    48.00    51.00    52.00    51.00    50.00    50.00
      B     Pctn.     56.86    51.02    48.00    46.00    50.00    52.00    49.00    48.00    49.00    50.00    50.00
F2    a     Pctn.     24.51    24.49    21.00    26.00    26.00    24.00    34.00    21.00    20.00    29.00    25.00
      b     Pctn.     18.63    24.49    31.00    28.00    24.00    24.00    17.00    31.00    31.00    21.00    25.00
      c     Pctn.     27.45    25.51    27.00    22.00    25.00    27.00    22.00    20.00    29.00    25.00    25.00
      d     Pctn.     29.41    25.51    21.00    24.00    25.00    25.00    27.00    28.00    20.00    25.00    25.00
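I would like to know whether I can create a new column in the data frame ("dd") that tells me which group each observation belongs to. For example, row 1 belongs to group 3, row 2 belongs to group 9, and so on.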
Answer 0 (score: 1)
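The simplest approach is to modify the modelProfile function so that it also outputs the groups. You may want to rename your modified copy, just to be safe, so that it does not mask the package's version ;)
Inside modelProfile, the groups are attached to the data in the following line: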
dframe <- data.frame(mf, Group)
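So the simplest option is to return the data frame dframe on its own,
or to return both the summary table and the data frame as a list: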
return(list(resulttable = res,newdata = dframe))
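To illustrate the pattern of returning both the summary table and the row-level groups, here is a minimal, self-contained sketch. It does not reproduce modelProfile's source: the function name profile_with_groups and the toy data frame are hypothetical, and the binning with base R's quantile() and cut() is an assumption about how the groups might be formed, so group boundaries and the handling of ties may differ from what uplift produces.

```r
# Hypothetical stand-in for a modified modelProfile (not part of uplift).
# Assumption: groups are quantile bins of the score, with group 1 holding
# the highest scores, as in the output table of the question.
profile_with_groups <- function(data, score_col, groups = 10) {
  score <- data[[score_col]]
  # Quantile break points; unique() guards against duplicated breaks from ties
  breaks <- unique(quantile(score,
                            probs = seq(0, 1, length.out = groups + 1),
                            na.rm = TRUE))
  # cut() numbers the bins from the lowest scores to the highest
  bin <- cut(score, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  # Reverse the numbering so that group 1 is the top bin
  Group <- length(breaks) - bin
  # Same idea as the dframe line in the answer: attach Group to the data
  dframe <- data.frame(data, Group)
  # A small per-group summary playing the role of the result table
  res <- aggregate(score, by = list(Group = Group), FUN = mean)
  names(res)[2] <- paste0(score_col, "_avg")
  list(resulttable = res, newdata = dframe)
}

# Toy data standing in for dd2 and its uplift_pred column
set.seed(1)
toy <- data.frame(uplift_pred = rnorm(1000))
out <- profile_with_groups(toy, "uplift_pred")
toy$group <- out$newdata$Group
head(toy)
```

With the real data, the same last step would apply to whatever the modified function returns, for example dd$group <- out$newdata$Group, since dframe keeps the rows in their original order in this sketch.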