Can LambdaMART be used for non-text ranking scenarios?

Time: 2017-04-20 10:48:35

Tags: python machine-learning ranking

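I have a log file of user interactions, each with a rank between 0 and 5. I am using the LambdaMART method (the pyltr implementation) to predict the rank. The dataset is in SVMlight format and looks like this. The first column is the rank I want to predict, and the value after qid is a unique interaction ID.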

5 qid:0 0:8 1:1 3:13 4:54 5:131 6:54 7:7 8:53 9:528 10:848 11:3 
3 qid:1 0:8 1:1 3:13 4:54 5:131 6:54 7:7 8:53 9:609 10:848 11:2 
1 qid:2 0:8 1:1 3:13 4:54 5:131 6:54 7:7 8:53 9:488 10:848 11:1 
4 qid:3 0:8 1:1 3:13 4:54 5:131 6:54 7:7 8:53 9:480 10:848 11:1 
0 qid:4 0:8 1:1 3:13 4:54 5:131 6:54 7:7 8:53 9:590 10:848 12:119.33
3 qid:5 0:8 1:1 3:13 4:54 5:131 6:54 7:7 8:53 9:501 10:848 11:2 
2 qid:6 0:8 1:1 3:13 4:54 5:131 6:54 7:7 8:53 9:503 10:848 11:1 
3 qid:7 0:8 1:1 3:13 4:54 5:131 6:54 7:7 8:53 9:555 10:848 11:3 
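
As a quick sanity check on the data layout described above, here is a minimal sketch that parses rows in this format and counts how many rows share each qid. It uses only the standard library, not pyltr; the helper names `parse_svmlight_line` and `group_sizes` are hypothetical and introduced only for illustration, and the sample uses the first three rows shown above.

```python
from collections import Counter


def parse_svmlight_line(line):
    """Split one 'label qid:ID idx:val ...' row into (label, qid, features)."""
    # Drop an optional trailing '# comment', then split on whitespace.
    tokens = line.split('#', 1)[0].split()
    label = int(tokens[0])
    qid = tokens[1].split(':', 1)[1]
    features = {}
    for token in tokens[2:]:
        index, value = token.split(':', 1)
        features[int(index)] = float(value)
    return label, qid, features


def group_sizes(lines):
    """Count how many rows belong to each qid, skipping blank lines."""
    return Counter(parse_svmlight_line(l)[1] for l in lines if l.strip())


# First three rows of the dataset shown above.
sample = [
    "5 qid:0 0:8 1:1 3:13 4:54 5:131 6:54 7:7 8:53 9:528 10:848 11:3",
    "3 qid:1 0:8 1:1 3:13 4:54 5:131 6:54 7:7 8:53 9:609 10:848 11:2",
    "1 qid:2 0:8 1:1 3:13 4:54 5:131 6:54 7:7 8:53 9:488 10:848 11:1",
]

sizes = group_sizes(sample)
print(sizes)
```

On these rows every qid appears exactly once. Ranking metrics such as NDCG order items within a single qid, so it may be worth checking whether one row per qid is the grouping that was intended.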

My script is as follows:

import pyltr


def ltr(dataPath):
    # Data files are in SVMlight format: http://svmlight.joachims.org/

    # 1) Import dataset
    with open(dataPath+'data_train.dat') as trainfile, \
            open(dataPath+'data_eval.dat') as valifile, \
            open(dataPath+'data_test.dat') as evalfile:
        TX, Ty, Tqids, _ = pyltr.data.letor.read_dataset(trainfile, one_indexed=False)
        VX, Vy, Vqids, _ = pyltr.data.letor.read_dataset(valifile, one_indexed=False)
        EX, Ey, Eqids, _ = pyltr.data.letor.read_dataset(evalfile, one_indexed=False)

    # 2) Train a LambdaMART model, using validation set for early stopping and trimming
    metric = pyltr.metrics.NDCG(k=5)

    # Only needed if you want to perform validation (early stopping & trimming)
    monitor = pyltr.models.monitors.ValidationMonitor(
        VX, Vy, Vqids, metric=metric, stop_after=250)

    model = pyltr.models.LambdaMART(
        metric=metric,
        n_estimators=1000,
        learning_rate=0.01,
        max_features=0.5,
        query_subsample=0.5,
        max_leaf_nodes=10,
        min_samples_leaf=64,
        verbose=1,
    )

    model.fit(TX, Ty, Tqids, monitor=monitor)

    # 3) Evaluate model on test data
    Epred = model.predict(EX)
    print('Random ranking:', metric.calc_mean_random(Eqids, Ey))
    print('Our model:', metric.calc_mean(Eqids, Ey, Epred))
    print('Feature importance', model.feature_importances_)

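The model performs the same as random ranking, and the feature importance values are all zero. Is this related to the way I formulated the problem?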

0 answers:

No answers