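We want to analyze, in R, how long a newly acquired customer remains a customer. The data set is right-censored at 730 days, and we have ten independent variables.
The model is: ln(Duration) = X'B + S * e, where X is the matrix of the 10 independent variables, B is the coefficient vector, S is the scale parameter, and e is the error term.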
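Written out per customer, the model looks roughly as follows. The distribution of the error term is not stated above; the form shown is an assumption based on the usual log-linear representation of a Weibull model, where the error follows a standard minimum extreme-value distribution. The symbols T_i, x_i, \beta and \sigma are notation introduced here for Duration, one row of X, B and S.

```latex
\ln(T_i) = x_i^{\top}\beta + \sigma\,\varepsilon_i,
\qquad
\Pr(\varepsilon_i \le w) = 1 - \exp\!\left(-e^{w}\right)
```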
We use the following data set: http://www.drvkumar.com/books/25/Statistical-Methods-in-Customer-Relationship-Management
We use the survival package and its survreg function, with the following code:
Dur <- survreg(Surv(Duration, Censor) ~ Acq_Expense + Acq_Expense_SQ + Ret_Expense + Ret_Expense_SQ + Crossbuy + Frequency + Frequency_SQ + Industry + Revenue + Employees, dist='weibull', data = daten [daten$Acquisition==1, ])
summary(Dur)
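However, the results are not correct: the equivalent SAS code produces different output, which has been confirmed as correct.
We then tried creating a log-transformed duration variable, logDur, and using it in the model described above: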
> logDur <- log(daten$Duration)
> Dur <- survreg(Surv(logDur, Censor) ~ Acq_Expense + Acq_Expense_SQ + Ret_Expense + Ret_Expense_SQ + Crossbuy + Frequency + Frequency_SQ + Industry + Revenue + Employees, dist='weibull', data = daten [daten$Acquisition==1, ])
> summary(Dur)
But the following error message appears: Error in Surv(logDur, Censor) : Time and status are different lengths
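The length mismatch can be reproduced on made-up data. The sketch below uses a hypothetical data frame `toy` (simulated, not the book's data set) with one covariate. It shows that a free-standing `logDur` vector keeps every row, while `Censor` is taken from the subset passed via `data`. It also inverts the censoring flag, on the assumption that `Censor == 1` marks a censored customer, as the SAS syntax `censor(1)` suggests; `Surv()` expects 1 (or TRUE) to mean that the event was observed. Note that `survreg` with `dist='weibull'` already works on the log-time scale, so a manual log transform would probably fit a different model rather than the one described.

```r
library(survival)

# Simulated stand-in for `daten`; column names mirror the question, values are made up.
set.seed(1)
n <- 200
toy <- data.frame(
  Acquisition = rbinom(n, 1, 0.7),
  Acq_Expense = runif(n, 100, 900)
)
true_time <- rweibull(n, shape = 2, scale = 600)
toy$Duration <- pmin(true_time, 730)         # right-censor at 730 days
toy$Censor   <- as.integer(true_time > 730)  # assumption: 1 = censored, as in SAS censor(1)

# A free-standing vector keeps all n rows, while the model data is subset
# to acquired customers, so the two lengths differ.
logDur <- log(toy$Duration)
acq <- toy[toy$Acquisition == 1, ]
length(logDur) == nrow(acq)  # compares full-data length with subset length

# Subset first, then let the formula find every variable in the same data frame.
# Surv() expects status TRUE/1 = event, so the censoring flag is inverted here.
fit <- survreg(Surv(Duration, Censor == 0) ~ Acq_Expense,
               dist = "weibull", data = acq)
summary(fit)
```

If the inverted status is the right reading of the data, the same change applied to the original call (with the untransformed `Duration`) would be the first thing to compare against the SAS output.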
In case it helps, here is the SAS code:
proc lifereg data = statcrm.customer_acquisition;
model duration*censor(1) = acq_expense acq_expense_sq ret_expense ret_expense_sq crossbuy frequency frequency_sq industry revenue employees;
where acquisition = 1;
output out = statcrm.duration xbeta = xb p = pred sres = resid;
run; quit;
data statcrm.duration1;
set statcrm.duration;
pred_duration = exp(xb+0.138*(log(-log(1-0.5))));
ad = abs(duration - pred_duration);
ad1 = abs(duration - 333.3165);
run; quit;
proc sql; select mean(duration) from statcrm.duration1 where acquisition = 1 and censor = 0; quit;
proc sql; select mean(ad) as mad, (mean(ad/duration)) as mape,
mean(ad1) as random_mad, (mean(ad1/duration)) as mape1
from statcrm.duration1 where acquisition = 1 and censor = 0; quit;
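For comparison, the SAS post-processing steps above (median prediction, then MAD and MAPE over uncensored customers) could be sketched in R roughly like this. The data are simulated and the single covariate `x` is hypothetical. The fitted `fit$scale` takes the place of the hard-coded 0.138, and the mean duration of uncensored customers takes the place of 333.3165, on the assumption that those constants were read off earlier SAS output. `Censor == 1` is again assumed to mean censored.

```r
library(survival)

# Simulated stand-in data (made up; not the book's data set).
set.seed(42)
n <- 300
x <- runif(n)
# Weibull AFT form: log T = b0 + b1 * x + s * W, with W = log of a unit exponential
true_time <- exp(5.5 + 0.8 * x + 0.3 * log(rexp(n)))
d <- data.frame(x = x,
                Duration = pmin(true_time, 730),        # right-censor at 730 days
                Censor = as.integer(true_time > 730))   # assumption: 1 = censored

fit <- survreg(Surv(Duration, Censor == 0) ~ x, dist = "weibull", data = d)

# Counterpart of SAS `xbeta = xb`: the linear predictor on the log-time scale
xb <- predict(fit, type = "lp")

# Median duration, using the fitted scale instead of a hard-coded constant
pred_duration <- exp(xb + fit$scale * log(-log(1 - 0.5)))

# Error measures over uncensored customers only, as in the SAS queries
unc <- d$Censor == 0
ad  <- abs(d$Duration - pred_duration)
ad1 <- abs(d$Duration - mean(d$Duration[unc]))  # naive benchmark: mean duration
c(mad        = mean(ad[unc]),
  mape       = mean((ad / d$Duration)[unc]),
  random_mad = mean(ad1[unc]),
  mape1      = mean((ad1 / d$Duration)[unc]))
```

The manual median formula should agree with `predict(fit, type = "quantile", p = 0.5)`, which may be the more convenient call in practice.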