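I am trying to train a TensorFlow-based random forest regression on numerical, continuous data.
When I try to fit my estimator, it starts with the following messages: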
INFO:tensorflow:Constructing forest with params =
INFO:tensorflow:{'num_trees': 10, 'max_nodes': 1000, 'bagging_fraction': 1.0, 'feature_bagging_fraction': 1.0, 'num_splits_to_consider': 10, 'max_fertile_nodes': 0, 'split_after_samples': 250, 'valid_leaf_threshold': 1, 'dominate_method': 'bootstrap', 'dominate_fraction': 0.99, 'model_name': 'all_dense', 'split_finish_name': 'basic', 'split_pruning_name': 'none', 'collate_examples': False, 'checkpoint_stats': False, 'use_running_stats_method': False, 'initialize_average_splits': False, 'inference_tree_paths': False, 'param_file': None, 'split_name': 'less_or_equal', 'early_finish_check_every_samples': 0, 'prune_every_samples': 0, 'feature_columns': [_NumericColumn(key='Average_Score', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), _NumericColumn(key='lat', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), _NumericColumn(key='lng', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)], 'num_classes': 1, 'num_features': 2, 'regression': True, 'bagged_num_features': 2, 'bagged_features': None, 'num_outputs': 1, 'num_output_columns': 2, 'base_random_seed': 0, 'leaf_model_type': 2, 'stats_model_type': 2, 'finish_type': 0, 'pruning_type': 0, 'split_type': 0}
Then the process fails and I get a ValueError:
ValueError: Shape must be at least rank 2 but is rank 1 for 'concat' (op: 'ConcatV2') with input shapes: [?], [?], [?], [] and with computed input tensors: input[3] = <1>.
This is the code I am using:
import tensorflow as tf
from tensorflow.contrib.tensor_forest.python import tensor_forest
from tensorflow.python.ops import resources
import pandas as pd
from tensorflow.contrib.tensor_forest.client import random_forest
from tensorflow.python.estimator.inputs import numpy_io
import numpy as np
def getFeatures():
    Average_Score = tf.feature_column.numeric_column('Average_Score')
    lat = tf.feature_column.numeric_column('lat')
    lng = tf.feature_column.numeric_column('lng')
    return [Average_Score, lat, lng]
# Import hotel data
Hotel_Reviews=pd.read_csv("./DataMining/Hotel_Reviews.csv")
Hotel_Reviews_Filtered=Hotel_Reviews[(Hotel_Reviews.lat.notnull() |
Hotel_Reviews.lng.notnull())]
Hotel_Reviews_Filtered_Target = Hotel_Reviews_Filtered[["Reviewer_Score"]]
Hotel_Reviews_Filtered_Features = Hotel_Reviews_Filtered[["Average_Score","lat","lng"]]
#Preprocess the data
x=Hotel_Reviews_Filtered_Features.to_dict('list')
for key in x:
    x[key] = np.array(x[key])
y=Hotel_Reviews_Filtered_Target.values
#specify params
params = tf.contrib.tensor_forest.python.tensor_forest.ForestHParams(
    feature_columns=getFeatures(),
    num_classes=1,
    num_features=2,
    regression=True,
    num_trees=10,
    max_nodes=1000)
#build the graph
graph_builder_class = tensor_forest.RandomForestGraphs
est = random_forest.TensorForestEstimator(
    params, graph_builder_class=graph_builder_class)
#define input function
train_input_fn = numpy_io.numpy_input_fn(
    x=x,
    y=y,
    batch_size=1000,
    num_epochs=1,
    shuffle=True)
est.fit(input_fn=train_input_fn, steps=500)
The variable x is a dictionary of numpy arrays, each of shape (512470,):
{'Average_Score': array([ 7.7, 7.7, 7.7, ..., 8.1, 8.1, 8.1]),
'lat': array([ 52.3605759, 52.3605759, 52.3605759, ..., 48.2037451,
48.2037451, 48.2037451]),
'lng': array([ 4.9159683, 4.9159683, 4.9159683, ..., 16.3356767,
16.3356767, 16.3356767])}
The variable y looks like this:
array([[ 2.9],
[ 7.5],
[ 7.1],
...,
[ 2.5],
[ 8.8],
[ 8.3]])
Answer 0 (score: 0)
Use ndmin=2 to force each array in x to be 2-dimensional. The shapes should then match, and concat should be able to operate.
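One caveat when applying this: NumPy's ndmin=2 prepends the new axis, so a 1-D array of length N becomes shape (1, N), not (N, 1). The sketch below shows one way the reshaping step might look. It uses a three-row stand-in for the data in the question, and the helper name to_column is hypothetical (it is not part of the original code). It assumes the estimator wants one row per example, so each feature is turned into a column before the features are joined along axis 1, the axis named in the error message.

```python
import numpy as np


def to_column(values):
    """Return `values` as a 2-D array of shape (n_samples, 1).

    Hypothetical helper for illustration only.
    """
    # ndmin=2 prepends an axis, giving shape (1, n_samples) ...
    row = np.array(values, ndmin=2)
    # ... so transpose to get one row per example.
    return row.T


# Three-row stand-in for the feature dict built in the question.
x = {
    "Average_Score": [7.7, 7.7, 8.1],
    "lat": [52.3605759, 52.3605759, 48.2037451],
    "lng": [4.9159683, 4.9159683, 16.3356767],
}

# Convert every feature to a column vector.
x_2d = {key: to_column(values) for key, values in x.items()}

# With rank-2 inputs, concatenating along axis 1 is well defined.
stacked = np.concatenate([x_2d[key] for key in x_2d], axis=1)

for key, arr in x_2d.items():
    print(key, arr.shape)
print("stacked", stacked.shape)
```

An equivalent alternative to the transpose is np.asarray(values).reshape(-1, 1). Either way, the resulting dict can be passed as x to the input function in place of the 1-D arrays.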