Question

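I'm trying to model survival times with a Cox proportional hazards model, and I'd like to use a gradient boosting framework (xgboost or lightgbm). I know xgboost has a coxph loss implementation, but my loss function is modified: the survival times are split into different experimental groups, and what I'm actually interested in is the ranking and the permutation probabilities within each group. The groups can have different sizes, and the data may be right-censored.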

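For example, suppose we have the variables time = [10,3,1,6,4,1,7,30,21,15,25,24], groups = [0,0,0,1,1,1,1,1,2,2,2,2] and observed = [1,0,1,1,1,0,1,0,0,1,1,1,1], where time is the survival time, groups is the group id used to group the survival times, and observed is the censoring status (1 means the event was observed, 0 means the sample is censored).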
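Note that `tf.split` in the TensorFlow code of this question expects the size of each group, not a per-sample group id. A minimal sketch, assuming the group ids are contiguous and sorted as in the example, that converts the ids to sizes with NumPy and prints each group's times in ascending order (the order in which the Cox risk sets are built). The `observed` list as written has 13 entries for 12 times, so it is left out here.

```python
import numpy as np

# Example data from the question (one group id per sample)
time = np.array([10, 3, 1, 6, 4, 1, 7, 30, 21, 15, 25, 24])
groups = np.array([0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2])

# tf.split needs group sizes; this assumes the ids are contiguous and sorted
ids, counts = np.unique(groups, return_counts=True)
sizes = counts.tolist()

# Start/end offsets of each group inside the flat arrays
bounds = np.cumsum([0] + sizes)
for g, (lo, hi) in enumerate(zip(bounds[:-1], bounds[1:])):
    # Ascending time order within the group, as used for the risk sets
    print(g, np.sort(time[lo:hi]).tolist())
```

Here `sizes` is `[3, 5, 4]`, which is the value `groups` should hold when calling `compute_cox_loss`.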

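I have already implemented this model as a neural network in TensorFlow, but now I'd like to try it in xgboost, and I'm not sure what grad and hess my loss function should return in my case (see the doc: https://github.com/dmlc/xgboost/blob/master/demo/guide-python/custom_objective.py).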
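An xgboost custom objective returns, for every row, the first and second derivative of the loss with respect to that row's raw score, so only the diagonal of the Hessian is needed. A sketch of that derivation for the grouped loss in the TensorFlow code of this question (mean over the events of a group, then mean over the groups). Notation, introduced here for illustration: G groups; group g has d_g uncensored samples E_g; η_k is sample k's score and δ_k its event indicator; after sorting a group by ascending time, R_i is the set of samples at or after i's position (what the reverse cumsum sums over), and S_i is the sum of e^{η_j} over R_i. Sample k belongs to group g.

```latex
\ell = \frac{1}{G}\sum_{g=1}^{G} \ell_g, \qquad
\ell_g = -\frac{1}{d_g}\sum_{i \in E_g}\Bigl(\eta_i - \log S_i\Bigr), \qquad
S_i = \sum_{j \in R_i} e^{\eta_j}

\frac{\partial \ell}{\partial \eta_k}
  = \frac{1}{G\,d_g}\Bigl(\sum_{i \in E_g,\; k \in R_i} \frac{e^{\eta_k}}{S_i} \;-\; \delta_k\Bigr)

\frac{\partial^2 \ell}{\partial \eta_k^2}
  = \frac{1}{G\,d_g}\sum_{i \in E_g,\; k \in R_i}
    \Bigl(\frac{e^{\eta_k}}{S_i} - \frac{e^{2\eta_k}}{S_i^2}\Bigr)
```

Each term of the second sum is non-negative because e^{η_k} ≤ S_i whenever k is in R_i, so the per-row Hessian values are never negative.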

Here is the code I wrote in TensorFlow for the neural-network version:

import tensorflow as tf

def compute_cox_loss(time, predict, observed, groups):

  '''predict is the log-risk score output by the model; the other
  parameters are as in the problem description, except that groups must
  hold the size of each group (e.g. [3, 5, 4]), because tf.split uses it
  as num_or_size_splits.
  '''

  # ensure the data have the correct shape
  time = tf.reshape(time, (-1,))
  predict = tf.reshape(predict, (-1,))
  observed = tf.reshape(observed, (-1,))

  # split the data into groups
  splitted_time = tf.split(time, groups, axis=0)
  splitted_predict = tf.split(predict, groups, axis=0)
  splitted_observed = tf.split(observed, groups, axis=0)

  # target is the total loss, count is the number of groups averaged over
  target = 0.0
  count = 0
  batch_size = len(splitted_time)

  # for each group, calculate the CoxPH loss and add it to target
  for i in range(batch_size):

    # sort the samples of group i by ascending survival time
    sort_arg = tf.argsort(splitted_time[i], direction='ASCENDING')

    sorted_predict = tf.gather(splitted_predict[i], sort_arg)
    sorted_observed = tf.gather(splitted_observed[i], sort_arg)

    # log of the reverse cumsum of exp(score): the risk set of sample i is
    # every sample at or after its sorted position; log(exp(score)) is
    # written directly as score
    log_risk = tf.math.log(tf.cumsum(tf.exp(sorted_predict), reverse=True))

    # only uncensored samples contribute to the partial likelihood;
    # a group without any event gives an empty mean (NaN)
    mask = tf.cast(sorted_observed, tf.bool)
    L = tf.boolean_mask(sorted_predict - log_risk, mask)
    loss = -tf.reduce_mean(L)

    target += loss
    count += 1

  return target / count
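The same loss can be written in NumPy together with its analytic gradient and diagonal Hessian, which is the `(preds, dtrain) -> (grad, hess)` form used in the linked xgboost custom-objective demo. A minimal sketch, not a tested implementation: `grouped_cox_grad_hess` and `make_obj` are hypothetical names; the rows are assumed to be in DMatrix order with each group contiguous; groups without any event are skipped (an assumption, the TensorFlow version returns NaN for them); ties in time are broken by a stable sort, which may differ from `tf.argsort`.

```python
import numpy as np


def grouped_cox_grad_hess(preds, time, observed, sizes):
    """Loss, gradient and diagonal Hessian of the grouped Cox loss.

    Per group: mean negative log partial likelihood over uncensored
    samples; then mean over groups. Groups with no event are skipped.
    """
    preds = np.asarray(preds, dtype=float)
    time = np.asarray(time, dtype=float)
    observed = np.asarray(observed, dtype=float)
    grad = np.zeros_like(preds)
    hess = np.zeros_like(preds)
    bounds = np.cumsum([0] + list(sizes))
    spans = list(zip(bounds[:-1], bounds[1:]))
    # number of groups that have at least one observed event
    n_used = sum(observed[lo:hi].sum() > 0 for lo, hi in spans)
    loss = 0.0
    for lo, hi in spans:
        # row indices of this group, sorted by ascending time
        order = lo + np.argsort(time[lo:hi], kind="stable")
        eta = preds[order]
        delta = observed[order]
        d = delta.sum()
        if d == 0:
            continue
        w = 1.0 / (n_used * d)
        # shift by the max score; the shift cancels in every ratio below
        m = eta.max()
        e = np.exp(eta - m)
        # S[i] = sum of e over positions >= i (reverse cumsum)
        S = np.cumsum(e[::-1])[::-1]
        loss += -w * np.sum(delta * (eta - m - np.log(S)))
        # sums over events at or before each position (k in R_i)
        A = np.cumsum(delta / S)
        B = np.cumsum(delta / S**2)
        grad[order] = w * (e * A - delta)
        hess[order] = w * (e * A - e**2 * B)
    return loss, grad, hess


def make_obj(time, observed, sizes):
    """Hypothetical helper: custom objective closing over the training data."""
    def obj(preds, dtrain):
        # dtrain is not used: time/observed/sizes come from the closure
        _, grad, hess = grouped_cox_grad_hess(preds, time, observed, sizes)
        return grad, hess
    return obj
```

With the training rows in the same order as the DMatrix, the objective would be passed as `obj=make_obj(time, observed, sizes)` to `xgb.train`; with a custom objective the model's predictions are raw scores, i.e. the η above. Censored rows that precede every event of their group get zero gradient and Hessian, so `min_child_weight` may need lowering.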

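Also, I couldn't find the code that implements the coxph loss in the official xgboost GitHub repository. Can anyone point me to where it is, so that I can modify it for my own problem?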
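For comparison, xgboost's built-in `survival:cox` objective receives right-censoring through the sign of the label: its parameter documentation states that negative labels are treated as right-censored. A minimal sketch of that encoding from `time` and `observed` (`to_xgb_cox_labels` is a hypothetical helper name); the documented objective takes no group information, so on its own it does not reproduce the per-group loss.

```python
import numpy as np


def to_xgb_cox_labels(time, observed):
    """Encode (time, observed) as signed labels for survival:cox.

    Observed events keep a positive time; right-censored samples get a
    negative time, which survival:cox treats as censored.
    """
    time = np.asarray(time, dtype=float)
    observed = np.asarray(observed).astype(bool)
    return np.where(observed, time, -time)


# First group of the example: times 10, 3, 1 with observed 1, 0, 1
labels = to_xgb_cox_labels([10, 3, 1], [1, 0, 1])
print(labels.tolist())  # [10.0, -3.0, 1.0]
```

These labels would be set on the DMatrix and trained with `{"objective": "survival:cox"}` in the parameters.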

Custom Cox proportional hazards loss function in xgboost or lightgbm

0 answers: