I am trying to convert the DeepSurv (https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-018-0482-1) Theano code (https://github.com/jaredleekatzman/DeepSurv/blob/master/deepsurv/deep_surv.py) to Keras. Right now I am trying to write the cost/loss function.
So that people do not have to search through the GitHub page, their code is:
def _negative_log_likelihood(self, E, deterministic = False):
    """Return the negative average log-likelihood of the prediction
        of this model under a given target distribution.

    .. math::

        \frac{1}{N_D} \sum_{i \in D}[F(x_i,\theta) - log(\sum_{j \in R_i} e^F(x_j,\theta))]
            - \lambda P(\theta)

    where:
        D is the set of observed events
        N_D is the number of observed events
        R_i is the set of examples that are still alive at time of death t_j
        F(x,\theta) = log hazard rate

    Note: We assume that there are no tied event times

    Parameters:
        E (n,): TensorVector that corresponds to a vector that gives the censor
            variable for each example
        deterministic: True or False. Determines if the output of the network
            is calculated deterministically.

    Returns:
        neg_likelihood: Theano expression that computes negative
            partial Cox likelihood
    """
    risk = self.risk(deterministic)
    hazard_ratio = T.exp(risk)
    log_risk = T.log(T.extra_ops.cumsum(hazard_ratio))
    uncensored_likelihood = risk.T - log_risk
    censored_likelihood = uncensored_likelihood * E
    num_observed_events = T.sum(E)
    neg_likelihood = -T.sum(censored_likelihood) / num_observed_events
    return neg_likelihood
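For anyone less familiar with Theano, here is a minimal NumPy sketch of what I understand that expression to compute. It is not DeepSurv's own code: the function name `negative_log_partial_likelihood` and the toy arrays are made up for illustration, and it assumes the rows are already sorted by descending survival time, so that the cumulative sum at position i covers exactly the examples still at risk at that time.

```python
import numpy as np

def negative_log_partial_likelihood(risk, events):
    """NumPy mirror of the Theano expression quoted above (illustrative only).

    Assumption: rows are sorted by descending survival time, so cumsum()
    at position i sums exp(risk) over the risk set of example i.
    """
    risk = np.asarray(risk, dtype=float).ravel()
    events = np.asarray(events, dtype=float).ravel()
    hazard_ratio = np.exp(risk)
    log_risk = np.log(np.cumsum(hazard_ratio))
    uncensored_likelihood = risk - log_risk
    # Multiplying by the event indicator zeroes out censored examples.
    censored_likelihood = uncensored_likelihood * events
    num_observed_events = np.sum(events)
    return -np.sum(censored_likelihood) / num_observed_events

# Hypothetical toy data, ordered from longest to shortest follow-up time.
toy_risk = np.array([0.1, -0.3, 0.8, 0.2])
toy_events = np.array([1, 0, 1, 1])
print(negative_log_partial_likelihood(toy_risk, toy_events))
```

If the rows are not sorted this way, the cumulative sum no longer corresponds to the sum over R_i in the formula.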
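Because I am still new to coding, I would like to avoid rewriting the function that computes the log-risk term myself, so I have been exploring lifelines (https://lifelines.readthedocs.io/en/latest/).
The part of the model I am most confused about is
I plan to use lifelines here.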
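import pandas as pd
from lifelines import CoxPHFitter

# Fit a Cox proportional hazards model on the joined features + outcome frame
cph = CoxPHFitter()
cph.fit(x_norm_y_joined, duration_col='survival_fu_combine', event_col='death', show_progress=False)
cph.summary
# Transpose so that each statistic (coef, exp(coef), ...) becomes a row
test3 = pd.DataFrame(cph.summary).T
test3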
Then I would sum the exp(coef) row with
test3.iloc[1,].sum()
Finally, to get
I would write
-np.log(test3.iloc[1,].sum())
Is my thought process correct? I have doubts because I am not sure how to code the part shown in red.
E is
E (n,): TensorVector that corresponds to a vector that gives the censor
variable for each example
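To make concrete how I read E, here is a small sketch with made-up numbers, assuming E is a 0/1 vector where 1 means the event (death) was observed and 0 means the example was censored. In the lifelines call above, I assume the `death` column plays this role. The array `per_example_term` is hypothetical and stands in for the bracketed term of the formula for each example.

```python
import numpy as np

# Hypothetical toy values: 1 = death observed, 0 = censored.
E = np.array([1, 0, 1, 1, 0])
# Hypothetical per-example values of F(x_i) - log(sum over the risk set).
per_example_term = np.array([-0.2, -0.9, -1.1, -1.6, -2.0])

masked = per_example_term * E        # censored examples contribute 0
num_observed_events = E.sum()        # N_D in the formula
average = masked.sum() / num_observed_events
print(num_observed_events, average)
```

So multiplying by E restricts the sum to the set D of observed events, and dividing by `E.sum()` gives the 1/N_D factor.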