How to compute Spearman correlation in TensorFlow

Time: 2018-11-21 01:58:35

Tags: python python-3.x tensorflow metrics

Question

I need to compute the Pearson and Spearman correlations and use them as metrics in TensorFlow.

For Pearson, it's trivial:

tf.contrib.metrics.streaming_pearson_correlation(y_pred, y_true)

But for Spearman, I'm clueless!

What I tried:

From this answer:

    samples = 1
    predictions_rank = tf.nn.top_k(y_pred, k=samples, sorted=True, name='prediction_rank').indices
    real_rank = tf.nn.top_k(y_true, k=samples, sorted=True, name='real_rank').indices
    rank_diffs = predictions_rank - real_rank
    rank_diffs_squared_sum = tf.reduce_sum(rank_diffs * rank_diffs)
    six = tf.constant(6)
    one = tf.constant(1.0)
    numerator = tf.cast(six * rank_diffs_squared_sum, dtype=tf.float32)
    divider = tf.cast(samples * samples * samples - samples, dtype=tf.float32)
    spearman_batch = one - numerator / divider

But this returns NaN...
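One likely cause of the NaN, offered as an assumption rather than a confirmed diagnosis: with samples = 1 the divider samples^3 - samples is zero, and if the rank differences are zero as well, the snippet evaluates 1 - 0/0. The pure-Python sketch below (the helper name spearman_from_ranks is hypothetical) applies the same classic formula, 1 - 6 * sum(d^2) / (n * (n^2 - 1)), and makes the degenerate case explicit:

```python
def spearman_from_ranks(rank_a, rank_b):
    """Classic Spearman formula for two equally long lists of untied ranks."""
    n = len(rank_a)
    denominator = n * (n * n - 1)  # equals n^3 - n, which is 0 when n == 1
    if denominator == 0:
        # Mirrors the 0/0 case in the TensorFlow snippet above
        return float("nan")
    squared_diffs = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * squared_diffs / denominator


print(spearman_from_ranks([1], [1]))              # nan
print(spearman_from_ranks([1, 2, 3], [1, 2, 3]))  # 1.0
print(spearman_from_ranks([1, 2, 3], [3, 2, 1]))  # -1.0
```

In other words, the formula only makes sense when n is the number of ranked items, not a constant 1.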


Following the definition on Wikipedia, I tried:

size = tf.size(y_pred)
indice_of_ranks_pred = tf.nn.top_k(y_pred, k=size)[1]
indice_of_ranks_label = tf.nn.top_k(y_true, k=size)[1]
rank_pred = tf.nn.top_k(-indice_of_ranks_pred, k=size)[1]
rank_label = tf.nn.top_k(-indice_of_ranks_label, k=size)[1]
rank_pred = tf.to_float(rank_pred)
rank_label = tf.to_float(rank_label)
spearman = tf.contrib.metrics.streaming_pearson_correlation(rank_pred, rank_label)

But when I run this, I get the following error:

  

tensorflow.python.framework.errors_impl.InvalidArgumentError: input must have at least k columns. Had 1, needed 32

[[{{node metrics/spearman/TopKV2}} = TopKV2[T=DT_FLOAT, sorted=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](lambda_1/add, metrics/pearson/pearson_r/variance_predictions/Size)]]
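The error message suggests that tf.nn.top_k was asked for k = 32 entries along a last axis of length 1, which is what happens if y_pred has shape (32, 1): top_k works along the last dimension, while tf.size counts all elements. A minimal sketch, assuming TensorFlow 2.x, that shape, and no tied values (the helper names ranks_1d and spearman_1d are hypothetical), flattens the tensors before ranking and then applies the Pearson formula to the ranks:

```python
import tensorflow as tf


def ranks_1d(values):
    """Flatten `values` and return 0-based ranks (0 = largest) as float32."""
    flat = tf.reshape(values, [-1])                   # (batch, 1) -> (batch,)
    order = tf.argsort(flat, direction="DESCENDING")  # indices of the sorted values
    return tf.cast(tf.argsort(order), tf.float32)     # inverse permutation = ranks


def spearman_1d(y_true, y_pred):
    """Pearson correlation of the ranks of two flattened tensors."""
    r_true = ranks_1d(y_true)
    r_pred = ranks_1d(y_pred)
    d_true = r_true - tf.reduce_mean(r_true)
    d_pred = r_pred - tf.reduce_mean(r_pred)
    covariance = tf.reduce_mean(d_true * d_pred)
    return covariance / (tf.math.reduce_std(r_true) * tf.math.reduce_std(r_pred))


y_true = tf.constant([[0.1], [0.4], [0.3], [0.9]])
y_pred = tf.constant([[1.0], [3.0], [2.0], [5.0]])
print(float(spearman_1d(y_true, y_pred)))  # close to 1.0: both orderings agree
```

This gives a per-batch value only; unlike streaming_pearson_correlation it does not accumulate statistics across batches.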

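2 Answers:

Answer 0: (score: 1)

I have been working on implementing the Spearman rank correlation coefficient directly in TensorFlow, following the definition on this site (https://rpubs.com/aaronsc32/spearman-rank-correlation), and arrived at the following code (I'm sharing it in case someone finds it useful):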

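import tensorflow as tf
import tensorflow_probability as tfp

@tf.function
def get_rank(y_pred):
  rank = tf.argsort(tf.argsort(y_pred, axis=-1, direction="ASCENDING"), axis=-1) + 1  # +1 so ranks start at 1 instead of 0
  return tf.cast(rank, tf.float32)  # float32 to match the dtype passed to tf.map_fn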

@tf.function
def sp_rank(x, y):
  cov = tfp.stats.covariance(x, y, sample_axis=0, event_axis=None)
  sd_x = tfp.stats.stddev(x, sample_axis=0, keepdims=False, name=None)
  sd_y = tfp.stats.stddev(y, sample_axis=0, keepdims=False, name=None)
  return 1-cov/(sd_x*sd_y) #1- because we want to minimize loss

@tf.function
def spearman_correlation(y_true, y_pred):
    #First we obtain the ranking of the predicted values
    y_pred_rank = tf.map_fn(lambda x: get_rank(x), y_pred, dtype=tf.float32)
    
    #Spearman rank correlation between each pair of samples:
    #Sample dim: (1, 8)
    #Batch of samples dim: (None, 8) None=batch_size=64
    #Output dim: (batch_size, ) = (64, )
    sp = tf.map_fn(lambda x: sp_rank(x[0],x[1]), (y_true, y_pred_rank), dtype=tf.float32)
    #Reduce to a single value
    loss = tf.reduce_mean(sp)
    return loss
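
The double tf.argsort in get_rank is what converts values into ranks: the first argsort gives the sort order, and the argsort of that order gives each element's position in it. A small pure-Python sketch of the same idea (the helper names argsort and rank are illustrative and not part of the answer; tied values simply receive consecutive ranks here rather than averaged ones):

```python
def argsort(values):
    """Return the indices that would sort `values` in ascending order."""
    return sorted(range(len(values)), key=lambda i: values[i])


def rank(values):
    """1-based ranks via a double argsort, mirroring get_rank above."""
    return [position + 1 for position in argsort(argsort(values))]


print(rank([0.5, 0.9, 0.1]))  # [2, 3, 1]
```

Note that the answer's code ranks only y_pred, so it appears to assume that y_true already holds ranks.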

Answer 1: (score: 0)

One thing you can do is use TensorFlow's tf.py_function together with scipy.stats.spearmanr, defining the inputs and outputs like this:

import tensorflow as tf
from scipy.stats import spearmanr

def get_spearman_rankcor(y_true, y_pred):
    return tf.py_function(spearmanr, [tf.cast(y_pred, tf.float32),
                                      tf.cast(y_true, tf.float32)], Tout=tf.float32)
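
scipy.stats.spearmanr returns both the correlation and a p-value, so a wrapper that passes only the correlation back to TensorFlow may be easier to use as a metric. The sketch below assumes TensorFlow 2.x, where tf.py_function hands eager tensors to the wrapped function; the helper names _spearman_np and spearman_metric are hypothetical. Because the computation happens in SciPy, no gradient flows through it, so it suits a metric rather than a loss:

```python
import numpy as np
import tensorflow as tf
from scipy.stats import spearmanr


def _spearman_np(y_true, y_pred):
    """Runs eagerly inside tf.py_function; inputs are eager tensors."""
    rho, _p_value = spearmanr(np.ravel(y_true.numpy()), np.ravel(y_pred.numpy()))
    return np.float32(rho)  # keep only the correlation, drop the p-value


def spearman_metric(y_true, y_pred):
    """Scalar float32 tensor holding the Spearman correlation of the batch."""
    return tf.py_function(
        _spearman_np,
        [tf.cast(y_true, tf.float32), tf.cast(y_pred, tf.float32)],
        Tout=tf.float32,
    )


y_true = tf.constant([1.0, 2.0, 3.0, 4.0])
y_pred = tf.constant([10.0, 20.0, 30.0, 40.0])
print(float(spearman_metric(y_true, y_pred)))  # close to 1.0: same ordering
```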