R中的xgboost预测在稀疏和密集矩阵上是不同的

时间:2017-10-09 09:53:21

标签: r machine-learning sparse-matrix xgboost

我使用sparse上的sparse.model.matrix库在sparse.model.matrix生成的矩阵上训练了一个简单模型,然后我对两个验证数据集进行了预测 - 一个由Matrix创建来自model.matrixstats来自x1。令我惊讶的是,结果差异很大。稀疏和密集矩阵具有相同的维度,所有数据都是数字的,没有缺失值。

这两组的平均预测如下:

  • 密集验证矩阵:0.5009256
  • 稀疏验证矩阵:0.4988821

是功能还是错误?

更新

我注意到当所有值为正xor为负时,不会发生错误。如果变量x1=sample(1:7, 2000, replace=T)具有定义require(Matrix) require(xgboost) valid <- data.frame(y=sample(0:1, 2000, replace=T), x1=sample(-1:5, 2000, replace=T), x2=runif(2000)) train <- data.frame(y=sample(0:1, 10000, replace=T), x1=sample(-1:5, 10000, replace=T), x2=runif(10000)) sparse_train_matrix <- sparse.model.matrix(~ ., data=train[, c("x1", "x2")]) d_sparse_train_matrix <- xgb.DMatrix(sparse_train_matrix, label = train$y) sparse_valid_matrix <- sparse.model.matrix(~ ., data=valid[, c("x1", "x2")]) d_sparse_valid_matrix <- xgb.DMatrix(sparse_valid_matrix, label = valid$y) valid_matrix <- model.matrix(~ ., data=valid[, c("x1", "x2")]) d_valid_matrix <- xgb.DMatrix(valid_matrix, label = valid$y) params = list(objective = "binary:logistic", seed = 99, eval_metric = "auc") sparse_w <- list(train=d_sparse_train_matrix, test=d_sparse_valid_matrix) set.seed(1) sprase_fit_xgb <- xgb.train(data=d_sparse_train_matrix, watchlist=sparse_w, params=params, nrounds=100) p1 <- predict(sprase_fit_xgb, newdata=d_valid_matrix, type="response") p2 <- predict(sprase_fit_xgb, newdata=d_sparse_valid_matrix, type="response") mean(p1); mean(p2) ,则两种情况下的平均预测都相同。

R中的代码:

R version 3.4.1 (2017-06-30) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale: [1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250 
[3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C 
[5] LC_TIME=Polish_Poland.1250

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] xgboost_0.6-4 Matrix_1.2-10 data.table_1.10.4 dplyr_0.7.1

loaded via a namespace (and not attached): [1] Rcpp_0.12.11 lattice_0.20-35 assertthat_0.2.0 grid_3.4.1 
[5] R6_2.2.2 magrittr_1.5 stringi_1.1.5 rlang_0.1.1 
[9] bindrcpp_0.2 tools_3.4.1 glue_1.1.1 compiler_3.4.1 
[13] pkgconfig_2.0.1 bindr_0.1 tibble_1.3.3

我的sessionInfo:

// POST Data Works :
    $scope.submit = function(signinfo) {
    $http({
        url :'http://localhost:3000/loginfo',
        method : 'POST',
        data : signinfo
    })
    .then(
        function successCallback(result){
            response = {success: true, data: result}; 
        toastr.success('Successfully Updated');
        toastr.refreshTimer(toastr, 3000);

        $scope.reload();
    },
        function errorCallback(response){
            console.log("Error : " + response.data);
        });
    };


// GET Data Does Not Works : 
$http.get('http://localhost:3000/info')
.then(
    function successCallback(response){
        $scope.signinfo = response.data.info;
    },
    function errorCallback(response){
        console.log("Error : " + response.data)
    });

1 个答案:

答案 0 :(得分:2)

我发现herehere这是预期的行为并对我有意义。