Question

I am studying recommender systems and have been using TensorFlow's random forest. My loss values look wrong. How can I fix my code? Please help.

These are the values of x_data
shape = (6000, 116)
values are 0 or 1

array([[1, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 1, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 1, 1, 0],
       [0, 0, 0, ..., 0, 0, 1],
       [0, 0, 0, ..., 0, 0, 1]])

These are the values of y_data
shape = (6000, 1)
values are 0 or 1

array([[0],
       [0],
       [1],
       ...,
       [0],
       [0],
       [0]])

This is my code

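import numpy as np
import tensorflow as tf
from tensorflow.contrib.tensor_forest.python import tensor_forest
from tensorflow.python.ops import resources

def next_batch(x_data, y_data, batch_size):
    # Features and labels must have the same number of rows.
    if len(x_data) != len(y_data):
        return None, None

    # Draw batch_size random row indices (np.random.choice samples with replacement by default).
    batch_mask = np.random.choice(len(x_data), batch_size)
    x_batch = x_data[batch_mask]
    y_batch = y_data[batch_mask]
    return x_batch, y_batch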

x_train = train.iloc[:, 3:].values
y_train = train.iloc[:,2:3].values
x_test = test.iloc[:,2:].values

x_data = np.array(x_train, dtype=np.float32)
y_data = np.array(y_train, dtype=np.int64)
test_data = np.array(x_test, dtype=np.float32)

# Parameters
num_steps = 500 
batch_size = 1024
num_classes = 2 
num_features = 116
num_trees = 10
max_nodes = 1000

tf.reset_default_graph()

# Input and Target placeholders
X = tf.placeholder(tf.float32, shape=[None, num_features])
Y = tf.placeholder(tf.int64, shape=[None,1])

# Random Forest Parameters
hparams = tensor_forest.ForestHParams(num_classes=num_classes,
                                      num_features=num_features,
                                      num_trees=num_trees,
                                      max_nodes=max_nodes).fill()


# Build the Random Forest
forest_graph = tensor_forest.RandomForestGraphs(hparams)

# Get training graph and loss
train_op = forest_graph.training_graph(X, Y)
loss_op = forest_graph.training_loss(X,Y)

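# Measure the accuracy
infer_op, _, _ = forest_graph.inference_graph(X)
# Y has shape [batch, 1]; squeeze it to [batch] so the comparison with
# argmax is element-wise rather than broadcast to [batch, batch].
correct_prediction = tf.equal(tf.argmax(infer_op, 1), tf.squeeze(Y, axis=1))
accuracy_op = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))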

init_vars = tf.group(tf.global_variables_initializer(), resources.initialize_resources(resources.shared_resources()))

sess = tf.Session()
sess.run(init_vars)

# Training
for i in range(1, num_steps + 1):
    # Prepare Data
    # Sample a random batch of features and labels
    batch_x, batch_y = next_batch(x_data, y_data, batch_size)
    _, l = sess.run([train_op, loss_op], feed_dict={X: batch_x, Y: batch_y})
    if i % 50 == 0 or i == 1:
        acc = sess.run(accuracy_op, feed_dict={X: batch_x, Y: batch_y})
        print('Step %i, Loss: %f, Acc: %f' % (i, l, acc))
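
One thing worth checking in an accuracy computation like this: `tf.argmax(infer_op, 1)` has shape `[batch]`, while labels fed as `[batch, 1]` keep their trailing dimension, so comparing the two directly broadcasts to `[batch, batch]` instead of comparing element by element. The NumPy sketch below illustrates the effect; `pred` and `labels` are made-up arrays, not taken from the data above, and it relies on NumPy and TensorFlow following the same broadcasting rules for this comparison.

```python
import numpy as np

pred = np.array([0, 1, 0, 0])            # shape (4,), like the output of argmax
labels = np.array([[0], [1], [1], [0]])  # shape (4, 1), like the Y placeholder

# Comparing (4,) against (4, 1) broadcasts to a (4, 4) grid of pairwise comparisons.
broadcast = np.equal(pred, labels)

# Dropping the trailing dimension gives the intended element-wise comparison.
elementwise = np.equal(pred, labels.squeeze(1))

print(broadcast.shape, broadcast.mean())
print(elementwise.shape, elementwise.mean())
```

The mean of the broadcast grid is not an accuracy; it only reflects how often any prediction matches any label.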

Why does my loss function return negative values?
Result

INFO:tensorflow:Constructing forest with params = 
INFO:tensorflow:{'num_trees': 10, 'max_nodes': 1000, 'bagging_fraction': 1.0, 'feature_bagging_fraction': 1.0, 'num_splits_to_consider': 10, 'max_fertile_nodes': 0, 'split_after_samples': 250, 'valid_leaf_threshold': 1, 'dominate_method': 'bootstrap', 'dominate_fraction': 0.99, 'model_name': 'all_dense', 'split_finish_name': 'basic', 'split_pruning_name': 'none', 'collate_examples': False, 'checkpoint_stats': False, 'use_running_stats_method': False, 'initialize_average_splits': False, 'inference_tree_paths': False, 'param_file': None, 'split_name': 'less_or_equal', 'early_finish_check_every_samples': 0, 'prune_every_samples': 0, 'num_classes': 2, 'num_features': 116, 'bagged_num_features': 116, 'bagged_features': None, 'regression': False, 'num_outputs': 1, 'num_output_columns': 3, 'base_random_seed': 0, 'leaf_model_type': 0, 'stats_model_type': 0, 'finish_type': 0, 'pruning_type': 0, 'split_type': 0}
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/tensor_forest/python/tensor_forest.py:529: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
Step 1, Loss: -1.000000, Acc: 0.873047
Step 50, Loss: -250.399994, Acc: 0.833313
Step 100, Loss: -537.200012, Acc: 0.856388
Step 150, Loss: -822.799988, Acc: 0.841568
Step 200, Loss: -1001.000000, Acc: 0.835522
Step 250, Loss: -1001.000000, Acc: 0.839737
Step 300, Loss: -1001.000000, Acc: 0.817566
Step 350, Loss: -1001.000000, Acc: 0.816372
Step 400, Loss: -1001.000000, Acc: 0.843414
Step 450, Loss: -1001.000000, Acc: 0.829651
Step 500, Loss: -1001.000000, Acc: 0.839970

Answer 1

A loss is just a scalar that you are trying to minimize. It is not required to be positive.

One reason you are getting negative loss values is that training_loss in RandomForestGraphs is, according to the reference code, implemented using cross-entropy loss or negative log likelihood.
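
The logged values also fit a different reading that is worth checking against the reference code. As far as I recall (an assumption, not something confirmed in this thread), `training_loss` in the contrib `tensor_forest` module returns the negative of the average number of nodes per tree rather than a per-example error. The sketch below only tests whether the printed losses are consistent with that reading: each loss times `num_trees` should be a whole number, and the plateau should sit at the `max_nodes` limit.

```python
num_trees = 10
max_nodes = 1000

# Loss values copied from the training log in the question.
logged_losses = [-1.0, -250.399994, -537.200012, -822.799988, -1001.0]

# If the loss is -(average nodes per tree), then -loss * num_trees is a total node count.
implied_totals = [-loss * num_trees for loss in logged_losses]
all_whole = all(abs(t - round(t)) < 1e-3 for t in implied_totals)

# The loss stops changing once the average tree size reaches about max_nodes.
plateau = logged_losses[-1]
at_limit = abs(-plateau - max_nodes) <= 1

print("implied total nodes:", [round(t) for t in implied_totals])
print("all whole numbers:", all_whole)
print("plateau within one node of max_nodes:", at_limit)
```

If that reading holds, a negative and eventually constant loss means the trees have stopped growing, not that training has failed, and accuracy on held-out data is the more useful signal.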

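Also, as you can see, the loss stays constant in the later iterations. I think hyperparameter tuning would make the trees more robust to variation in the data. You can refer here for some ideas.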
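
A minimal sketch of what such tuning could look like: enumerate a small grid of forest settings, train one forest per combination, and compare accuracy on held-out data. The candidate values below are hypothetical and chosen only for illustration; the keyword names `num_trees`, `max_nodes`, `num_classes`, and `num_features` are the ones passed to `ForestHParams` in the question.

```python
from itertools import product

# Hypothetical candidate values; adjust to the data set and time budget.
grid = {
    "num_trees": [10, 50, 100],
    "max_nodes": [100, 1000, 5000],
}

# Settings kept fixed across runs, taken from the question.
fixed = {"num_classes": 2, "num_features": 116}

# One parameter dict per combination of grid values.
candidates = [
    dict(fixed, **dict(zip(grid, values)))
    for values in product(*grid.values())
]

for params in candidates:
    # Each dict could be passed on as tensor_forest.ForestHParams(**params).fill(),
    # followed by the same training loop and an evaluation on held-out data.
    print(params)
```

Evaluating each candidate on rows that were not used for training avoids picking settings that only memorize the training batches.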
