BoostedTreesClassifier gets stuck at the loss on the first step

Asked: 2019-06-28 18:49:58

Tags: python-3.x tensorflow tensorflow2.0

I'm trying to run a simple BoostedTreesClassifier from the example on my dataset, but it seems to get stuck on the first step:

2019-06-28 11:20:31.658689: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:111] Filling up shuffle buffer (this may take a while): 84090 of 85873
2019-06-28 11:20:32.908425: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:162] Shuffle buffer filled.
I0628 11:20:34.904214 140220602029888 basic_session_run_hooks.py:262] loss = 0.6931464, step = 0
W0628 11:21:03.421219 140220602029888 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W0628 11:21:05.555618 140220602029888 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.

The same dataset seems to work fine when passed to other Keras-based models or an xgboost model. Here is the relevant code:

def make_input_fn(self, X, y, shuffle=True, num_epochs=None):
  # Size the shuffle buffer from the labels actually passed in, not
  # self.y_train, so the buffer matches the dataset being fed.
  num_samples = len(y)
  def input_fn():
    # X is expected to be a pandas DataFrame; dict(X) maps column name -> Series.
    dataset = tf.data.Dataset.from_tensor_slices((dict(X), y))
    if shuffle:
      # Buffering the full dataset gives a uniform shuffle, but is slow for
      # large data, as the "Filling up shuffle buffer" log line above shows.
      dataset = dataset.shuffle(num_samples)
    dataset = dataset.repeat(num_epochs).batch(self.batch_size)
    return dataset
  return input_fn
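
The snippet references self.feature_columns without showing how it is built. For a purely numeric pandas DataFrame it might look like the following sketch; the names X_train and the all-numeric assumption are mine, not from the post:

import tensorflow as tf

# Hypothetical construction of the feature columns used above; assumes
# X_train is a pandas DataFrame whose columns are all numeric.
feature_columns = [
    tf.feature_column.numeric_column(name, dtype=tf.float32)
    for name in X_train.columns
]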

def ens_train(self):

    tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.DEBUG)

    train_input_fn = self.make_input_fn(self.X_train, self.y_train, num_epochs=self.epochs)

    # n_batches_per_layer is the number of batches the estimator accumulates
    # statistics over before growing one layer of a tree. At half the dataset
    # per layer, the global step advances very slowly, which is what triggers
    # the "global step has not been increased" warnings above.
    self.model = tf.estimator.BoostedTreesClassifier(self.feature_columns,
                                                     n_batches_per_layer = int(0.5 * len(self.y_train) / self.batch_size),
                                                     model_dir = self.ofolder,
                                                     max_depth = 10,
                                                     n_trees = 1000)
    self.model.train(train_input_fn, max_steps = 1000)
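
To see why training can look frozen at step 0, plug in the numbers: the shuffle-buffer log reports 85,873 training samples, and with a batch size of 32 (an assumption; the actual value is not shown in the post), n_batches_per_layer works out to over a thousand batches per tree layer:

# Illustrative arithmetic only; batch_size = 32 is an assumption.
num_samples = 85873                                        # from the shuffle-buffer log
batch_size = 32
n_batches_per_layer = int(0.5 * num_samples / batch_size)  # 1341
# The estimator must consume ~1341 batches to grow a single tree layer
# before the global step moves past 0, hence the repeated warnings.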

1 Answer:

Answer 0 (score: 0)

I was able to get results by playing with the learning rate and the number of epochs. The "best" parameters obtained from hyperparameter tuning on xgboost did not give similar results in BoostedTreesClassifier. It took a lot of tuning to reach around 84% accuracy (on a balanced dataset), while xgboost reached 95% without any hyperparameter tuning.
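
For reference, a minimal sketch of the kind of tuning described above; the concrete values (n_batches_per_layer, learning_rate, epochs, max_steps) are placeholders, not the ones the answerer actually used, and make_input_fn is assumed to be a standalone version of the method shown in the question:

import tensorflow as tf

model = tf.estimator.BoostedTreesClassifier(
    feature_columns,
    n_batches_per_layer=100,   # far fewer batches per layer than 0.5 * dataset
    n_trees=300,
    max_depth=6,
    learning_rate=0.05,        # the estimator's default is 0.1
    model_dir=model_dir,
)
model.train(make_input_fn(X_train, y_train, num_epochs=50), max_steps=5000)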