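I have a user interaction log file in which each interaction has a ranking between 0 and 5. I am using the LambdaMART method (the pyltr implementation) to predict the ranking. In svmlight format the dataset looks like the sample below. The first column is the ranking I want to predict, and the value next to qid is a unique interaction ID.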
5 qid:0 0:8 1:1 3:13 4:54 5:131 6:54 7:7 8:53 9:528 10:848 11:3
3 qid:1 0:8 1:1 3:13 4:54 5:131 6:54 7:7 8:53 9:609 10:848 11:2
1 qid:2 0:8 1:1 3:13 4:54 5:131 6:54 7:7 8:53 9:488 10:848 11:1
4 qid:3 0:8 1:1 3:13 4:54 5:131 6:54 7:7 8:53 9:480 10:848 11:1
0 qid:4 0:8 1:1 3:13 4:54 5:131 6:54 7:7 8:53 9:590 10:848 12:119.33
3 qid:5 0:8 1:1 3:13 4:54 5:131 6:54 7:7 8:53 9:501 10:848 11:2
2 qid:6 0:8 1:1 3:13 4:54 5:131 6:54 7:7 8:53 9:503 10:848 11:1
3 qid:7 0:8 1:1 3:13 4:54 5:131 6:54 7:7 8:53 9:555 10:848 11:3
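To make the layout concrete, here is a minimal sketch of how one such line breaks down into a label, a qid, and sparse `index:value` features, together with a count of rows per qid. The helper names `parse_line` and `rows_per_qid` are hypothetical and written only for illustration; this is not how pyltr reads the file internally.

```python
from collections import Counter


def parse_line(line):
    """Split one svmlight/LETOR-style line into (label, qid, features)."""
    tokens = line.split()
    label = int(tokens[0])
    qid = tokens[1].split(":", 1)[1]  # text after "qid:"
    features = {}
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, qid, features


def rows_per_qid(lines):
    """Count how many rows share each qid (hypothetical helper)."""
    return Counter(parse_line(line)[1] for line in lines if line.strip())


sample = [
    "5 qid:0 0:8 1:1 3:13 4:54 5:131 6:54 7:7 8:53 9:528 10:848 11:3",
    "3 qid:1 0:8 1:1 3:13 4:54 5:131 6:54 7:7 8:53 9:609 10:848 11:2",
    "0 qid:4 0:8 1:1 3:13 4:54 5:131 6:54 7:7 8:53 9:590 10:848 12:119.33",
]

label, qid, features = parse_line(sample[0])
print(label, qid, features[9])
print(rows_per_qid(sample))
```

In the sample above every qid appears on exactly one row, because the qid is the unique interaction ID.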
My script is as follows:
import pyltr


def ltr(dataPath):
    # File format: http://svmlight.joachims.org/
    # 1) Import dataset
    with open(dataPath + 'data_train.dat') as trainfile, \
            open(dataPath + 'data_eval.dat') as valifile, \
            open(dataPath + 'data_test.dat') as evalfile:
        TX, Ty, Tqids, _ = pyltr.data.letor.read_dataset(trainfile, one_indexed=False)
        VX, Vy, Vqids, _ = pyltr.data.letor.read_dataset(valifile, one_indexed=False)
        EX, Ey, Eqids, _ = pyltr.data.letor.read_dataset(evalfile, one_indexed=False)

    # 2) Train a LambdaMART model, using validation set for early stopping and trimming
    metric = pyltr.metrics.NDCG(k=5)

    # Only needed if you want to perform validation (early stopping & trimming)
    monitor = pyltr.models.monitors.ValidationMonitor(
        VX, Vy, Vqids, metric=metric, stop_after=250)

    model = pyltr.models.LambdaMART(
        metric=metric,
        n_estimators=1000,
        learning_rate=0.01,
        max_features=0.5,
        query_subsample=0.5,
        max_leaf_nodes=10,
        min_samples_leaf=64,
        verbose=1,
    )
    model.fit(TX, Ty, Tqids, monitor=monitor)

    # 3) Evaluate model on test data
    Epred = model.predict(EX)
    print('Random ranking:', metric.calc_mean_random(Eqids, Ey))
    print('Our model:', metric.calc_mean(Eqids, Ey, Epred))
    print('Feature importance', model.feature_importances_)
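The model performs no better than the random ranking, and the feature importance values are all zero. Is this related to the way I have formulated the problem?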
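For context, here is a minimal sketch of how a mean NDCG@k is typically computed per qid group. It assumes the common gain `2**label - 1` with a log2 discount and skips groups whose ideal DCG is zero; the function names `dcg_at_k` and `mean_ndcg` are hypothetical, and pyltr's own `NDCG` class may differ in these details.

```python
import math
from collections import defaultdict


def dcg_at_k(labels, k):
    """DCG over the first k labels, gain 2**label - 1, log2 discount (assumed form)."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(labels[:k]))


def mean_ndcg(qids, y_true, y_pred, k=5):
    """Average NDCG@k over qid groups; groups with zero ideal DCG are skipped."""
    groups = defaultdict(list)
    for qid, rel, score in zip(qids, y_true, y_pred):
        groups[qid].append((score, rel))
    scores = []
    for rows in groups.values():
        # Labels ordered by descending predicted score
        ranked = [rel for _, rel in sorted(rows, key=lambda r: -r[0])]
        ideal_dcg = dcg_at_k(sorted(ranked, reverse=True), k)
        if ideal_dcg > 0:
            scores.append(dcg_at_k(ranked, k) / ideal_dcg)
    return sum(scores) / len(scores) if scores else 0.0


# One query with three rows: the predicted order matters here.
print(mean_ndcg(["q", "q", "q"], [3, 1, 0], [0.1, 0.9, 0.5]))
# Two queries with one row each, scored two different ways.
print(mean_ndcg(["a", "b"], [5, 3], [0.0, 0.0]))
print(mean_ndcg(["a", "b"], [5, 3], [9.0, -1.0]))
```

In this sketch a group with a single row scores 1.0 (or is skipped when its label is 0) regardless of the predicted score, since there is only one possible ordering within the group.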