I am learning about recommender systems. I have been using TensorFlow's random forest (tensor_forest). There is a problem with my loss results. How can I fix my code? Please help.
Here are the values of x_data:
shape = (6000, 116)
values are 0 or 1
array([[1, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 1, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 1, 1, 0],
       [0, 0, 0, ..., 0, 0, 1],
       [0, 0, 0, ..., 0, 0, 1]])
Here are the values of y_data:
shape = (6000, 1)
values are 0 or 1
array([[0],
       [0],
       [1],
       ...,
       [0],
       [0],
       [0]])
Here is my code:
import numpy as np
import tensorflow as tf
from tensorflow.contrib.tensor_forest.python import tensor_forest
from tensorflow.python.ops import resources

def next_batch(x_data, y_data, batch_size):
    # Sample a random batch (with replacement) of matching x/y rows
    if len(x_data) != len(y_data):
        return None, None
    batch_mask = np.random.choice(len(x_data), batch_size)
    x_batch = x_data[batch_mask]
    y_batch = y_data[batch_mask]
    return x_batch, y_batch
# train/test are pandas DataFrames loaded elsewhere
x_train = train.iloc[:, 3:].values
y_train = train.iloc[:, 2:3].values
x_test = test.iloc[:, 2:].values
x_data = np.array(x_train, dtype=np.float32)
y_data = np.array(y_train, dtype=np.int64)
test_data = np.array(x_test, dtype=np.float32)
# Parameters
num_steps = 500
batch_size = 1024
num_classes = 2
num_features = 116
num_trees = 10
max_nodes = 1000
tf.reset_default_graph()
# Input and Target placeholders
X = tf.placeholder(tf.float32, shape=[None, num_features])
Y = tf.placeholder(tf.int64, shape=[None,1])
# Random Forest Parameters
hparams = tensor_forest.ForestHParams(num_classes=num_classes,
                                      num_features=num_features,
                                      num_trees=num_trees,
                                      max_nodes=max_nodes).fill()
# Build the Random Forest
forest_graph = tensor_forest.RandomForestGraphs(hparams)
# Get training graph and loss
train_op = forest_graph.training_graph(X, Y)
loss_op = forest_graph.training_loss(X, Y)
# Measure the accuracy
infer_op, _, _ = forest_graph.inference_graph(X)
correct_prediction = tf.equal(tf.argmax(infer_op, 1), tf.cast(Y, tf.int64))
accuracy_op = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
init_vars = tf.group(tf.global_variables_initializer(),
                     resources.initialize_resources(resources.shared_resources()))
sess = tf.Session()
sess.run(init_vars)
# Training
for i in range(1, num_steps + 1):
    # Prepare data: sample the next random batch of training examples
    batch_x, batch_y = next_batch(x_data, y_data, batch_size)
    _, l = sess.run([train_op, loss_op], feed_dict={X: batch_x, Y: batch_y})
    if i % 50 == 0 or i == 1:
        acc = sess.run(accuracy_op, feed_dict={X: batch_x, Y: batch_y})
        print('Step %i, Loss: %f, Acc: %f' % (i, l, acc))
Why does my loss function return negative values?
Results
INFO:tensorflow:Constructing forest with params =
INFO:tensorflow:{'num_trees': 10, 'max_nodes': 1000, 'bagging_fraction': 1.0, 'feature_bagging_fraction': 1.0, 'num_splits_to_consider': 10, 'max_fertile_nodes': 0, 'split_after_samples': 250, 'valid_leaf_threshold': 1, 'dominate_method': 'bootstrap', 'dominate_fraction': 0.99, 'model_name': 'all_dense', 'split_finish_name': 'basic', 'split_pruning_name': 'none', 'collate_examples': False, 'checkpoint_stats': False, 'use_running_stats_method': False, 'initialize_average_splits': False, 'inference_tree_paths': False, 'param_file': None, 'split_name': 'less_or_equal', 'early_finish_check_every_samples': 0, 'prune_every_samples': 0, 'num_classes': 2, 'num_features': 116, 'bagged_num_features': 116, 'bagged_features': None, 'regression': False, 'num_outputs': 1, 'num_output_columns': 3, 'base_random_seed': 0, 'leaf_model_type': 0, 'stats_model_type': 0, 'finish_type': 0, 'pruning_type': 0, 'split_type': 0}
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/tensor_forest/python/tensor_forest.py:529: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
Step 1, Loss: -1.000000, Acc: 0.873047
Step 50, Loss: -250.399994, Acc: 0.833313
Step 100, Loss: -537.200012, Acc: 0.856388
Step 150, Loss: -822.799988, Acc: 0.841568
Step 200, Loss: -1001.000000, Acc: 0.835522
Step 250, Loss: -1001.000000, Acc: 0.839737
Step 300, Loss: -1001.000000, Acc: 0.817566
Step 350, Loss: -1001.000000, Acc: 0.816372
Step 400, Loss: -1001.000000, Acc: 0.843414
Step 450, Loss: -1001.000000, Acc: 0.829651
Step 500, Loss: -1001.000000, Acc: 0.839970
Answer 0 (score: 0)
The loss is just a scalar that you are trying to minimize. It is not required to be positive.
One of the reasons you are getting negative values in the loss is that the training_loss in RandomForestGraphs is implemented using cross-entropy loss, i.e. negative log likelihood (as per the reference code).
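For intuition, here is a minimal numpy sketch (hypothetical probabilities of my own, not the contrib.tensor_forest source) showing how the same quantity prints as negative or positive depending on whether a graph reports the log likelihood itself or its negation:

import numpy as np

# Hypothetical predicted probabilities for 3 samples, 2 classes
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.7, 0.3]])
labels = np.array([0, 1, 0])  # true classes

# Mean log likelihood of the true classes: always <= 0, since log p <= 0
log_lik = np.mean(np.log(probs[np.arange(len(labels)), labels]))
# Negated form (cross-entropy / NLL): always >= 0
nll = -log_lik

print(log_lik)  # ~ -0.2284
print(nll)      # ~  0.2284

So a negative printed loss is not by itself an error; what matters is that the scalar is being minimized.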
Also, as you can see, the loss stays constant in later iterations; I think performing hyperparameter tuning would make the trees more robust to variations in the data. You can refer here for some ideas; a rough sketch of such a sweep is below.
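As a sketch of what such a sweep could look like (the eval_forest helper, the parameter grid, and the 50/50 train/validation split of x_data/y_data are my own assumptions, not part of the original code), you could rebuild the forest graph per combination and keep the one with the best validation accuracy:

import numpy as np
import tensorflow as tf
from tensorflow.contrib.tensor_forest.python import tensor_forest
from tensorflow.python.ops import resources

def eval_forest(num_trees, max_nodes, x_tr, y_tr, x_va, y_va, steps=200):
    # Build a fresh forest graph for this hyperparameter combination
    tf.reset_default_graph()
    X = tf.placeholder(tf.float32, shape=[None, x_tr.shape[1]])
    Y = tf.placeholder(tf.int64, shape=[None, 1])
    hparams = tensor_forest.ForestHParams(num_classes=2,
                                          num_features=x_tr.shape[1],
                                          num_trees=num_trees,
                                          max_nodes=max_nodes).fill()
    graph = tensor_forest.RandomForestGraphs(hparams)
    train_op = graph.training_graph(X, Y)
    infer_op, _, _ = graph.inference_graph(X)
    # Flatten Y so the comparison is elementwise, not broadcast
    correct = tf.equal(tf.argmax(infer_op, 1), tf.reshape(Y, [-1]))
    acc_op = tf.reduce_mean(tf.cast(correct, tf.float32))
    init = tf.group(tf.global_variables_initializer(),
                    resources.initialize_resources(resources.shared_resources()))
    with tf.Session() as sess:
        sess.run(init)
        for _ in range(steps):
            sess.run(train_op, feed_dict={X: x_tr, Y: y_tr})
        return sess.run(acc_op, feed_dict={X: x_va, Y: y_va})

# Assumed 50/50 split of the question's x_data / y_data
split = len(x_data) // 2
best = max(((t, n) for t in (10, 50, 100) for n in (100, 1000, 10000)),
           key=lambda p: eval_forest(p[0], p[1],
                                     x_data[:split], y_data[:split],
                                     x_data[split:], y_data[split:]))
print('best (num_trees, max_nodes):', best)

Rebuilding the graph per combination keeps each run independent, and with only nine combinations this brute-force loop is cheap relative to a 6000-row dataset.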