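I'm working with a bag-of-characters feature model from sklearn and a custom estimator.
I'm on a Mac, and TensorFlow barely uses my 2 CPU cores, whereas Keras uses all of them.
Config parameters: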
config_proto = tf.ConfigProto(allow_soft_placement=True, inter_op_parallelism_threads=8, intra_op_parallelism_threads=32, device_count = {'CPU':4})
config_estimator = tf.estimator.RunConfig(session_config=config_proto)
Classifier:
classifier = tf.estimator.Estimator(
    model_fn=my_model_fn,
    params={'feature_columns': make_feat_cols(), 'hidden_units': [20, 20, 20], 'n_classes': 2},
    model_dir='checkpoints',
    config=config_estimator)
It's just a simple dense neural network.
def inputs_fn(X, y):
    # Wraps a pandas DataFrame/Series in an input function that yields dict-like feature batches
    dataset = tf.estimator.inputs.pandas_input_fn(X, y, shuffle=True, batch_size=500, num_epochs=10, queue_capacity=200000)
    return dataset
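From searching online, it seems pandas_input_fn may be the cause of the slowdown, but I'm not sure. The test set is 6000 features x 30000 entries.
For what it's worth, this is what I got from the Python profiler after a training run: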
ncalls tottime percall cumtime percall filename:lineno(function)
2723/1 0.230 0.000 3752.153 3752.153 {built-in method builtins.exec}
1 0.005 0.005 3752.153 3752.153 Estimator_testing.py:1(<module>)
618 0.002 0.000 3473.290 5.620 /usr/local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py:550(run)
618 0.002 0.000 3473.288 5.620 /usr/local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py:1035(run)
618 0.007 0.000 3473.286 5.620 /usr/local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py:1117(run)
618 0.038 0.000 3473.279 5.620 /usr/local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py:1170(run)
1 0.000 0.000 3395.453 3395.453 /usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py:295(train)
1 0.000 0.000 3395.453 3395.453 /usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py:839(_train_model)
1 0.001 0.001 3395.453 3395.453 /usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py:845(_train_model_default)
651 0.020 0.000 3358.462 5.159 /usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py:790(run)
651 0.070 0.000 3358.173 5.158 /usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py:1046(_run)
651 0.007 0.000 3357.739 5.158 /usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py:1271(_do_run)
651 0.002 0.000 3357.682 5.158 /usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py:1320(_do_call)
651 0.003 0.000 3357.680 5.158 /usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py:1303(_run_fn)
651 0.003 0.000 3354.512 5.153 /usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py:1404(_call_tf_sessionrun)
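For reference, a table like the one above can be produced with the standard-library cProfile and pstats modules. This is only a minimal sketch, not the exact commands used for the run above: `train` and `train_step` are hypothetical stand-ins for the real training call, and sorting by cumulative time is an assumption based on how the rows above are ordered.

```python
import cProfile
import io
import pstats
import time


def train_step():
    # Hypothetical stand-in for one training step (e.g. one session.run call).
    time.sleep(0.01)


def train(num_steps=5):
    # Hypothetical stand-in for classifier.train(...).
    for _ in range(num_steps):
        train_step()


def profile_training(num_steps=5, top=10):
    """Profile train() and return the report, sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    train(num_steps)
    profiler.disable()

    # Write the report into a string instead of stdout so it can be inspected.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_training())
```

In the report, cumtime divided by ncalls gives the average time per call, which is how the roughly 5.6 seconds per monitored_session run call above is obtained (3473.290 / 618).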
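Is the best course of action to get rid of pandas_input_fn?
Side note: this custom estimator somehow reaches 95% accuracy on the test data, versus 92% in Keras. However, at prediction time it only ever predicts one class. How is that possible?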
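One thing that could produce this, assuming the test labels are heavily imbalanced (the class balance is not stated here, so this is only a guess): a model that always predicts the majority class scores an accuracy equal to that class's share of the data. A minimal sketch with made-up labels, not the real test set:

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def majority_class_baseline(y_true):
    """Return the most common label and the accuracy of always predicting it."""
    label, count = Counter(y_true).most_common(1)[0]
    return label, count / len(y_true)


# Hypothetical labels: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5
# A degenerate model that predicts class 0 for every example.
y_pred = [0] * len(y_true)

label, baseline = majority_class_baseline(y_true)
print(label, baseline)
print(accuracy(y_true, y_pred))
```

Comparing the reported accuracy against this baseline, and looking at a confusion matrix of the predictions, would show whether the 95% is just the majority-class share.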
Thanks!