Invalid survival times for this distribution: survival analysis with the survival package

Posted: 2013-04-06 10:04:29

Tags: r sas survival-analysis

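We want to use R to analyze how long a newly acquired customer remains a customer. The data set is right-censored at 730 days, and we have ten independent variables.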

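The model is: ln(Duration) = X'B + S * e, where X is the matrix of the 10 independent variables, B is the coefficient vector, S is the scale parameter, and e is the error term.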
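Written out, and assuming e follows the standard (minimum) extreme-value distribution, which is what makes this a Weibull model (the default distribution in PROC LIFEREG, and what `dist = 'weibull'` means in survreg), the model implies the survival function and median duration below. Here T_i is the Duration of customer i and x_i its row of X. The last line is the expression the SAS data step evaluates as `pred_duration`, with the scale S set to 0.138.

```latex
\begin{aligned}
\ln T_i &= x_i^\top B + S\,e_i, \qquad P(e_i > w) = \exp\!\big(-\exp(w)\big) \\
P(T_i > t \mid x_i) &= \exp\!\left[-\exp\!\left(\frac{\ln t - x_i^\top B}{S}\right)\right] \\
P(T_i > m_i \mid x_i) = \tfrac{1}{2} &\;\Longrightarrow\; m_i = \exp\!\Big(x_i^\top B + S \,\ln\!\big(-\ln(1 - 0.5)\big)\Big)
\end{aligned}
```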

The data set we use is available here: http://www.drvkumar.com/books/25/Statistical-Methods-in-Customer-Relationship-Management

We use the survival package and its survreg function with the following code:

Dur <- survreg(Surv(Duration, Censor) ~ Acq_Expense + Acq_Expense_SQ + Ret_Expense + Ret_Expense_SQ +
                 Crossbuy + Frequency + Frequency_SQ + Industry + Revenue + Employees,
               dist = 'weibull', data = daten[daten$Acquisition == 1, ])
summary(Dur)
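One difference between the two programs that may explain the mismatch (an assumption, since the coding of Censor in daten is not shown): in the SAS statement `model duration*censor(1)`, the `(1)` declares that Censor = 1 marks a censored observation, whereas `Surv(time, event)` in R treats a status of 1 as an observed event. If Censor follows the SAS convention, the R call needs the indicator inverted. The minimal sketch below uses simulated data in a hypothetical data frame `sim` with a single covariate `x`; the true values (intercept 6.3, coefficient 0.5, scale 0.5) are arbitrary choices for illustration, not estimates from the book's data.

```r
library(survival)

# Simulated stand-in for the customer data (hypothetical):
# log(Duration) = 6.3 + 0.5 * x + 0.5 * e, with e standard minimum extreme-value,
# i.e. a Weibull AFT model, right-censored at 730 days.
set.seed(1)
n <- 500
sim <- data.frame(x = rnorm(n))
true_time <- exp(6.3 + 0.5 * sim$x + 0.5 * log(rexp(n)))
sim$Duration <- pmin(true_time, 730)
# SAS-style coding, as in `duration*censor(1)`: 1 = censored, 0 = churn observed
sim$Censor <- as.integer(true_time > 730)

# Surv(time, event) expects TRUE/1 for an observed event,
# so the SAS censoring flag is inverted here
fit <- survreg(Surv(Duration, Censor == 0) ~ x, data = sim, dist = "weibull")
summary(fit)
```

With the flag inverted, the estimates should land close to the simulated values; passing `Censor` directly would instead treat the censored customers as the churn events.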

However, the results are not correct: the SAS code below produces different output, which we have confirmed to be correct.

We then tried creating a log-transformed duration variable, logDur, and using it in the model described above:

logDur <- log(daten$Duration)
Dur <- survreg(Surv(logDur, Censor) ~ Acq_Expense + Acq_Expense_SQ + Ret_Expense + Ret_Expense_SQ +
                 Crossbuy + Frequency + Frequency_SQ + Industry + Revenue + Employees,
               dist = 'weibull', data = daten[daten$Acquisition == 1, ])
summary(Dur)

But this fails with the following error message: Error in Surv(logDur, Censor) : Time and status are different lengths
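The length mismatch is consistent with how the call is written: `logDur` is computed from all rows of `daten`, while `Censor` is looked up inside the subset `daten[daten$Acquisition == 1, ]`, so the two vectors differ in length whenever some rows have Acquisition != 1. Separately, `dist = 'weibull'` already models the log of the time variable, so passing `log(Duration)` to it takes the log twice; `log(Duration)` is also 0 or negative for durations of 1 day or less, which survreg rejects as invalid survival times for the Weibull distribution. The sketch below, on simulated data in a hypothetical `daten` with one covariate `x`, shows the mismatch and checks that a Weibull fit on `Duration` matches an extreme-value fit on `log(Duration)` computed inside `data`.

```r
library(survival)

# Hypothetical stand-in for `daten`: only some rows have Acquisition == 1
set.seed(2)
n <- 600
daten <- data.frame(Acquisition = rbinom(n, 1, 0.6), x = rnorm(n))
true_time <- exp(6.3 + 0.5 * daten$x + 0.5 * log(rexp(n)))
daten$Duration <- pmin(true_time, 730)
daten$Censor <- as.integer(true_time > 730)  # 1 = censored (SAS convention)

sub <- daten[daten$Acquisition == 1, ]

# A vector built from the full data frame does not line up with the subset,
# which is what triggers "Time and status are different lengths"
logDur <- log(daten$Duration)
print(c(length(logDur), nrow(sub)))

# Weibull on Duration ...
fit_weib <- survreg(Surv(Duration, Censor == 0) ~ x, data = sub, dist = "weibull")
# ... equals the extreme-value model on log(Duration), evaluated within `data`
fit_ev <- survreg(Surv(log(Duration), Censor == 0) ~ x, data = sub, dist = "extreme")

print(all.equal(coef(fit_weib), coef(fit_ev)))
print(all.equal(fit_weib$scale, fit_ev$scale))
```

So the log transform is not needed: the original `Surv(Duration, ...)` call with `dist = 'weibull'` already fits ln(Duration) = X'B + S * e.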

In case it helps, here is the SAS code:

proc lifereg data = statcrm.customer_acquisition;
model duration*censor(1) =  acq_expense acq_expense_sq ret_expense ret_expense_sq crossbuy frequency frequency_sq industry revenue employees;
where acquisition = 1; 
output out = statcrm.duration xbeta = xb p = pred sres = resid;
run; quit;

data statcrm.duration1;
set statcrm.duration;
pred_duration = exp(xb+0.138*(log(-log(1-0.5))));
ad = abs(duration - pred_duration); 
ad1 = abs(duration - 333.3165);
run; quit;

proc sql; select mean(duration) from statcrm.duration1 where acquisition = 1 and censor = 0; quit;

proc sql; select mean(ad) as mad, (mean(ad/duration)) as mape, 
mean(ad1) as random_mad, (mean(ad1/duration)) as mape1 
from statcrm.duration1 where acquisition = 1 and censor = 0; quit;
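For comparison, the post-estimation steps of this SAS program can be sketched in R. The data step's `pred_duration` is the median of the fitted Weibull distribution, with `xb` the linear predictor and 0.138 presumably the scale estimate reported by PROC LIFEREG; in R, `predict(fit, type = "lp")` and `fit$scale` play those roles, and `predict(fit, type = "quantile", p = 0.5)` returns the same median directly. The accuracy measures are then averaged over customers with `censor = 0`, using the mean observed duration as the naive benchmark (assuming that is what the constant 333.3165 is, i.e. the result of the first PROC SQL query). The data are simulated, and `sim` and `x` are hypothetical names.

```r
library(survival)

# Simulated Weibull AFT data, right-censored at 730 days (hypothetical stand-in
# for the Acquisition == 1 customers; not the book's data)
set.seed(3)
n <- 400
sim <- data.frame(x = rnorm(n))
true_time <- exp(6.3 + 0.5 * sim$x + 0.5 * log(rexp(n)))
sim$Duration <- pmin(true_time, 730)
sim$Censor <- as.integer(true_time > 730)  # 1 = censored, as in censor(1)

fit <- survreg(Surv(Duration, Censor == 0) ~ x, data = sim, dist = "weibull")

# Median prediction as in the data step: fit$scale plays the role of 0.138
xb <- predict(fit, type = "lp")
pred_manual <- exp(xb + fit$scale * log(-log(1 - 0.5)))
pred_duration <- predict(fit, type = "quantile", p = 0.5)

# Accuracy on customers whose churn was observed (censor = 0), as in PROC SQL
obs <- sim$Censor == 0
d <- sim$Duration[obs]
benchmark <- mean(d)  # assumed counterpart of the constant 333.3165
ad  <- abs(d - pred_duration[obs])
ad1 <- abs(d - benchmark)
accuracy <- c(mad = mean(ad), mape = mean(ad / d),
              random_mad = mean(ad1), mape1 = mean(ad1 / d))
print(round(accuracy, 4))
```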

0 answers