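I am trying to use numeric floating-point data representing 7 features and 7 labels in a TensorForestEstimator model. That is, features and labels both have shape (484876, 7). I set num_classes=7 and num_features=7 in ForestHParams accordingly. The data format is as follows: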
f1 f2 f3 f4 f5 f6 f7 l1 l2 l3 l4 l5 l6 l7
39000.0 120.0 65.0 1000.0 25.0 0.69 3.94 39000.0 39959.0 42099.0 46153.0 49969.0 54127.0 55911.0
32000.0 185.0 65.0 1000.0 75.0 0.46 2.19 32000.0 37813.0 43074.0 48528.0 54273.0 60885.0 63810.0
30000.0 185.0 65.0 1000.0 25.0 0.41 1.80 30000.0 32481.0 35409.0 39145.0 42750.0 46678.0 48595.0
Python crashes when fit() is called, with the following message:
Python quit unexpectedly while using the _pywrap_tensorflow_internal.so plug-in.
Here is the output with tf.logging.set_verbosity('INFO') enabled:
INFO:tensorflow:training graph for tree: 0
INFO:tensorflow:training graph for tree: 1
...
INFO:tensorflow:training graph for tree: 9998
INFO:tensorflow:training graph for tree: 9999
INFO:tensorflow:Create CheckpointSaverHook.
2017-07-26 10:25:30.908894: F tensorflow/contrib/tensor_forest/kernels/count_extremely_random_stats_op.cc:404]
Check failed: column < num_classes_ (39001 vs. 8)
Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
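I'm not sure what this error means. It doesn't make sense to me, since num_classes=7, not 8, and since features and labels both have shape (484876, 7), I don't know where the 39001 comes from.
Here is the code to reproduce: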
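import numpy as np
import pandas as pd
import os
import tensorflow as tf
# Import paths below assume a TensorFlow 1.x release that still ships tf.contrib.
from tensorflow.contrib.tensor_forest.client.random_forest import TensorForestEstimator, TensorForestLossHook
from tensorflow.contrib.tensor_forest.python.tensor_forest import ForestHParams, RandomForestGraphs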
def get_training_data():
    training_file = "data.txt"
    data = pd.read_csv(training_file, sep='\t')
    X = np.array(data.drop('Result', axis=1), dtype=np.float32)
    y = []
    for e in data.ResultStr:
        y.append(list(np.array(str(e).replace('[', '').replace(']', '').split(','))))
    y = np.array(y, dtype=np.float32)
    features = tf.constant(X)
    labels = tf.constant(y)
    return features, labels
hyperparameters = ForestHParams(
    num_trees=100,
    max_nodes=10000,
    bagging_fraction=1.0,
    num_splits_to_consider=0,
    feature_bagging_fraction=1.0,
    max_fertile_nodes=0,
    split_after_samples=250,
    min_split_samples=5,
    valid_leaf_threshold=1,
    dominate_method='bootstrap',
    dominate_fraction=0.99,
    # All parameters above are default
    num_classes=7,
    num_features=7
)
estimator = TensorForestEstimator(
    params=hyperparameters,
    # All parameters below are default
    device_assigner=None,
    model_dir=None,
    graph_builder_class=RandomForestGraphs,
    config=None,
    weights_name=None,
    keys_name=None,
    feature_engineering_fn=None,
    early_stopping_rounds=100,
    num_trainers=1,
    trainer_id=0,
    report_feature_importances=False,
    local_eval=False
)
estimator.fit(
    input_fn=lambda: get_training_data(),
    max_steps=100,
    monitors=[
        TensorForestLossHook(
            early_stopping_rounds=30
        )
    ]
)
It also doesn't work if I wrap it with SKCompat; the same error occurs. What is causing this crash?
Answer 0 (score: 3)
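regression=True needs to be specified in ForestHParams, because TensorForestEstimator by default assumes it is being used to solve a classification problem, which can only output one value.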
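One plausible reading of the numbers in the failed check, offered as a sketch rather than a description of the kernel: assume that in classification mode each label value is treated as an integer class index, and that the statistics table reserves column 0 for a total count. Under that assumption (it is not confirmed by the question or the log), the first label 39000.0 maps to column 39001, and 7 classes map to 8 columns, which matches "39001 vs. 8". The function name check_label_column is hypothetical.

```python
def check_label_column(label, num_classes):
    """Model the bounds check from the log under an ASSUMED column layout.

    Assumption: column 0 holds a total count, so class index k is stored in
    column k + 1 and the table has num_classes + 1 columns.
    """
    column = int(label) + 1          # label value read as a class index
    num_columns = num_classes + 1    # one extra column for the total
    return column, num_columns, column < num_columns


# First label of the first data row in the question, with num_classes=7.
print(check_label_column(39000.0, 7))  # (39001, 8, False)
```

A valid class index such as 3.0 would pass the same check, which is why float regression targets trip it while true class labels in the range 0 to 6 would not.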
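An implicit num_outputs variable is created when the estimator is initialized, and it is set to 1 if regression is not specified. If regression is specified, then num_outputs = num_classes and checkpoints are saved normally.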
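The rule described above can be sketched in plain Python. This is only a model of the behaviour the answer describes, not the library's code, and the function name infer_num_outputs is hypothetical.

```python
def infer_num_outputs(num_classes, regression=False):
    """Return the number of outputs per example under the rule in the answer.

    Classification (regression=False): a single output, the predicted class.
    Regression (regression=True): one output per target, i.e. num_classes.
    """
    return num_classes if regression else 1


# With the question's settings (num_classes=7):
print(infer_num_outputs(7))                   # 1
print(infer_num_outputs(7, regression=True))  # 7
```

In the question's code this corresponds to passing regression=True to the ForestHParams(...) call, alongside num_classes=7 and num_features=7.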