Question

I am trying to implement WARP loss (a type of pairwise ranking function) using the Keras API. I am not sure how to make this work.

The definition of WARP loss is taken from the LightFM docs:

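For a given (user, positive item) pair, sample a negative item at random from all the remaining items. Compute predictions for both items; if the negative item's prediction exceeds that of the positive item plus a margin, perform a gradient update to rank the positive item higher and the negative item lower. If there is no rank violation, continue sampling negative items until a violation is found.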
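The sampling procedure quoted above can be sketched in plain Python. This is only an illustration under assumptions: `score` is a hypothetical callable that returns the model's prediction for an item id, `max_trials` is a made-up cap on the number of draws, and a "violation" is taken to mean that the negative score plus the margin exceeds the positive score (the usual hinge convention).

```python
import random

def sample_violating_negative(score, positive, n_items, margin=1.0,
                              max_trials=None, rng=random):
    """Draw negatives uniformly until one violates the margin.

    score      -- hypothetical callable: item id -> predicted score
    positive   -- id of the positive item
    n_items    -- total number of items (ids are 0 .. n_items - 1)
    Returns (negative_id, trials), or (None, trials) if no violation was found.
    """
    if max_trials is None:
        max_trials = n_items - 1  # assumption: give up after this many draws
    pos_score = score(positive)
    for trials in range(1, max_trials + 1):
        negative = rng.randrange(n_items)
        while negative == positive:  # redraw so the negative differs from the positive
            negative = rng.randrange(n_items)
        # Violation: the negative is not ranked below the positive by at least `margin`.
        if score(negative) + margin > pos_score:
            return negative, trials
    return None, max_trials
```

The number of trials returned is what a WARP-style loss would use to estimate the rank of the positive item: the fewer draws it takes to find a violation, the worse the positive item is presumably ranked.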

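WARP is used, for example, in the AI research paper on semantic embeddings of #hashtags. In that paper, the authors try to predict the most representative hashtags for short texts. If the 'user' is taken to be the short text, then the 'positive item' is the hashtag of the short text, and the negative items are random hashtags sampled uniformly from the 'hashtag lookup'.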

I am following the implementation of another triplet loss to build the WARP one: github

My understanding is that for each data point I will have 3 inputs. Example with embeddings ('semi' pseudocode):

sequence_input = Input(shape=(100, ), dtype='int32') # 100 features per data point
positive_example = Input(shape=(1, ), dtype='int32', name="positive") # the one positive example
negative_examples = Input(shape=(1000,), dtype='int32', name="random_negative_examples") # 1000 random negative examples.

#map data points to already created embeddings
embedded_seq_input = embedded_layer(sequence_input)
embedded_positive = embedded_layer(positive_example)
embedded_negatives = embedded_layer(negative_examples)

conv1 = Convolution1D(...)(embedded_seq_input)
               .
               .
               .
z = Dense(vector_size_of_embedding,activation="linear")(convN)

loss = merge([z, embedded_positive, embedded_negatives],mode=warp_loss)
                         .
                         .
                         .

where warp_loss is (assuming that 1000 random negatives are drawn instead of using all of them, and that the scores come from cosine similarity):

def warp_loss(X):
    # pseudocode
    z, positive, negatives = X
    positive_score = cosinus_similatiry(z, positive)
    counts = 1
    loss = 0
    for negative in negatives:
        score = cosinus_similatiry(z, negative)
        if score > positive_score:
           loss = ((number_of_labels - 1) / counts) * (score + 1 - positive_score)
        else:
           counts += 1
    return loss

A good description of how to compute WARP is given here: post

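I am not sure whether this is the right approach, but I could not find a way to implement the warp_loss pseudo-function. I can compute the cosine with merge([x,u],mode='cos'), but that assumes both inputs have the same dimensions. So I am not sure how to use merge mode cos with multiple negative examples, which is why I am trying to write my own warp_loss.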
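To make the dimension problem concrete, here is a small NumPy sketch (not Keras code) of cosine similarity between one vector per example and many candidate vectors per example. The function name `cosine_scores` and the `eps` guard are my own assumptions; the point is only that adding a length-1 axis lets a (batch, dim) tensor broadcast against a (batch, n_candidates, dim) tensor.

```python
import numpy as np

def cosine_scores(z, candidates, eps=1e-8):
    """Cosine similarity of each row of z against its own set of candidates.

    z          -- array of shape (batch, dim)
    candidates -- array of shape (batch, n_candidates, dim)
    Returns an array of shape (batch, n_candidates).
    """
    # Normalise both sides to unit length; eps avoids division by zero.
    z_unit = z / (np.linalg.norm(z, axis=-1, keepdims=True) + eps)
    c_unit = candidates / (np.linalg.norm(candidates, axis=-1, keepdims=True) + eps)
    # z_unit[:, None, :] has shape (batch, 1, dim) and broadcasts over the candidates.
    return np.sum(z_unit[:, None, :] * c_unit, axis=-1)

z = np.array([[1.0, 0.0]])                                      # one example, dim = 2
negatives = np.array([[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]])   # three negatives
scores = cosine_scores(z, negatives)                            # shape (1, 3)
```

The same function works for the single positive if it is shaped (batch, 1, dim). Inside a model, the equivalent arithmetic would have to be written with backend tensor operations rather than NumPy.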

Any insights, similar implemented examples, or comments would be useful.

Answer 1

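First of all, I would argue that it is not possible to implement WARP in the batch training paradigm, and therefore you cannot implement WARP in Keras. WARP is intrinsically sequential, so it cannot handle data broken into batches, à la Keras. I suppose that if you used fully stochastic batches, you could pull it off.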

Typically for WARP you include a margin of 1, but, as in the paper, you can treat it as a hyperparameter.
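As a rough sketch of what treating the margin as a hyperparameter could look like, the function below computes a loss value for one sampled negative. It is an assumption-laden illustration: the name `warp_loss_value` is made up, and the rank weighting simply mirrors the `(number_of_labels - 1) / counts` factor from the question's pseudocode rather than any particular library's implementation.

```python
def warp_loss_value(pos_score, neg_score, trials, n_items, margin=1.0):
    """Loss for one (positive, sampled negative) pair.

    trials  -- number of negatives drawn before this violation was found
    n_items -- total number of items (labels)
    margin  -- hinge margin; 1.0 by default, tunable as a hyperparameter
    """
    hinge = margin - pos_score + neg_score
    if hinge <= 0:
        return 0.0  # no violation: the positive already beats the negative by the margin
    # Rank estimate as in the question's pseudocode (an assumption, not a library formula).
    rank_weight = (n_items - 1) / trials
    return rank_weight * hinge

loss_default = warp_loss_value(0.9, 0.5, trials=2, n_items=1001)             # margin 1.0
loss_small = warp_loss_value(0.9, 0.5, trials=2, n_items=1001, margin=0.1)   # margin 0.1
```

With the default margin the pair counts as a violation and contributes a loss; with the smaller margin the same pair already satisfies the ranking constraint and contributes nothing.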

WARP is superior to its predecessor, BPR, in that it optimizes for top-k precision rather than average precision.

Implementing WARP loss in Keras
