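I'm trying to build a model that predicts wine quality from wine data. I'm getting this error:
ValueError: Feature alcohool is not in features dictionary.
But I ran print(feature_columns), and this is the output: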
[NumericColumn(key='fixed acidity', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='volatile acidity', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='citric acid', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='residual sugar', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='chlorides', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='free sulfur dioxide', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='total sulfur dioxide', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='density', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='pH', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='sulphates', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='alcohool', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='quality', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None)]
alcohool is in there, so I don't understand what is going on. The error occurs when I try to train my model, at linear_est.train(train_input_fn).
My model looks like this:
import pandas as pd
import tensorflow as tf
from IPython.display import clear_output

dftrain = pd.read_csv('winequality-red.csv').head(790)
dfeval = pd.read_csv('winequality-red.csv').tail(809)
y_train = dftrain.pop('quality')
y_eval = dfeval.pop('quality')

CATEGORICAL_COLUMNS = []
NUMERIC_COLUMNS = ['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar', 'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density', 'pH', 'sulphates', 'alcohool', 'quality']

feature_columns = []
for feature_name in CATEGORICAL_COLUMNS:
    vocabulary = dftrain[feature_name].unique()  # gets a list of all unique values from the given feature column
    feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))

for feature_name in NUMERIC_COLUMNS:
    feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float64))
Input function:
# INPUT FUNCTION
def make_input_fn(data_df, label_df, num_epochs=1000, shuffle=True, batch_size=32):
    def input_function():  # inner function, this will be returned
        ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))  # create tf.data.Dataset object with data and its label
        if shuffle:
            ds = ds.shuffle(1000)  # randomize order of data
        ds = ds.batch(batch_size).repeat(num_epochs)  # split dataset into batches of 32 and repeat process for number of epochs
        return ds  # return a batch of the dataset
    return input_function  # return a function object for use

train_input_fn = make_input_fn(dftrain, y_train)  # here we will call the input_function that was returned to us to get a dataset object we can feed to the model
eval_input_fn = make_input_fn(dfeval, y_eval, num_epochs=1, shuffle=False)
linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)
# We create a linear estimator by passing the feature columns we created earlier

linear_est.train(train_input_fn)  # train
result = linear_est.evaluate(eval_input_fn)  # get model metrics/stats by testing on testing data

clear_output()  # clears console output
print(result['accuracy'])  # the result variable is simply a dict of stats about our model
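
One check that might narrow this down, assuming the error means the key is missing from the dict built by dict(data_df) in the input function rather than from feature_columns: compare the names the feature columns were built from against the DataFrame's actual column names. This is only a sketch. The small DataFrame is a made-up stand-in for dftrain after pop('quality'), and its column names are what I would expect from the usual winequality-red.csv header, not something read from the real file.

```python
import pandas as pd

# Hypothetical stand-in for dftrain after pop('quality');
# the column names are assumed, not read from the real CSV.
df = pd.DataFrame({"pH": [3.51], "sulphates": [0.56], "alcohol": [9.4]})

# A subset of the names used in NUMERIC_COLUMNS above.
requested = ["pH", "sulphates", "alcohool", "quality"]

# Any name listed here has no matching key in dict(df), which is what
# the input function hands to the estimator as the features dictionary.
missing = [name for name in requested if name not in df.columns]
print(missing)
```

Running the same comparison with the real dftrain and the full NUMERIC_COLUMNS list would show whether 'alcohool' (or 'quality', which was popped off as the label) is actually present in the data.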