I tried to follow the example code from the TensorFlow Linear Model Tutorial, keeping the structure of the code unchanged and only changing the columns. At runtime I get the error "ValueError: Features are incompatible with given information."
Given features:
'COLUMN_1': <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x00000000094C2D30>
'COLUMN_2': <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x000000000C4D4B38>
'COLUMN_3': <tf.Tensor 'Const_1:0' shape=(3,) dtype=float64>
Required signatures:
'COLUMN_1': TensorSignature(dtype=tf.string, shape=None, is_sparse=True)
'COLUMN_2': TensorSignature(dtype=tf.string, shape=None, is_sparse=True)
'COLUMN_3': TensorSignature(dtype=tf.float64, shape=TensorShape([Dimension(3)]), is_sparse=False)
The puzzling part is that the test data and the training data are processed by the same function, so the output structure should be identical. If I use the training data as the test data, it runs without any error. How, then, can the data in the test set affect the feature signature?
The same error was asked about here.
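One quick way to locate such a mismatch — a diagnostic sketch, assuming the df_train and df_test DataFrames built in the code below — is to compare the dtypes pandas inferred for the two files:

# Any row where the two entries differ is a column whose inferred type
# diverges between the training and the test CSV.
print(pd.concat([df_train.dtypes.rename("train"),
                 df_test.dtypes.rename("test")], axis=1))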
Full error message:
Traceback (most recent call last):
  File "C:\Users\USERA\Desktop\USERA\New Projects 2017\Machine Learning\Renewal\TensorFlowLinearModelRenewal.py", line 308, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "C:\Users\USERA\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "C:\Users\USERA\Desktop\USERA\New Projects 2017\Machine Learning\Renewal\TensorFlowLinearModelRenewal.py", line 270, in main
    FLAGS.train_data, FLAGS.test_data)
  File "C:\Users\USERA\Desktop\USERA\New Projects 2017\Machine Learning\Renewal\TensorFlowLinearModelRenewal.py", line 255, in train_and_eval
    results = m.evaluate(input_fn=lambda: input_fn(df_test), steps=1)
  File "C:\Users\USERA\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\util\deprecation.py", line 289, in new_func
    return func(*args, **kwargs)
  File "C:\Users\USERA\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\contrib\learn\python\learn\estimators\estimator.py", line 543, in evaluate
    log_progress=log_progress)
  File "C:\Users\USERA\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\contrib\learn\python\learn\estimators\estimator.py", line 827, in _evaluate_model
    self._check_inputs(features, labels)
  File "C:\Users\USERA\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\contrib\learn\python\learn\estimators\estimator.py", line 757, in _check_inputs
    (str(features), str(self._features_info)))
ValueError: Features are incompatible with given information. Given features: {'POL_SRC_BUS_CODE': <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x00000000152A2320>, 'POL_SUB_PRD_CODE': <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x00000000120DE080>, 'ASSR_TYPE': <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x00000000152A2518>, 'POL_PYMT_MODE': <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x000000000E523FD0>, 'POL_PREM_RENEWAL': <tf.Tensor 'Const_2:0' shape=(12,) dtype=float64>, 'POL_JACKET_CODE': <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x0000000015D4C7F0>, 'POL_AGE': <tf.Tensor 'Const:0' shape=(12,) dtype=int64>, 'ASSR_GENDER': <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x000000000D92C2B0>, 'ACCOUNT_CLASS': <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x000000000B40FE80>, 'POL_PREM_ORIGINAL': <tf.Tensor 'Const_1:0' shape=(12,) dtype=float64>, 'POL_SCHEME_PLAN': <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x00000000152A2908>, 'POL_CUST_CODE': <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x000000000BCF27F0>, 'CLAIM_INCURRED': <tf.Tensor 'Const_4:0' shape=(12,) dtype=float64>, 'POL_END_NO_IDX': <tf.Tensor 'Const_3:0' shape=(12,) dtype=int64>, 'ASSR_MAR_STATUS': <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x00000000152A2208>, 'POL_OCC_DESC': <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x00000000152A27F0>, 'ASSR_NATIONALITY': <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x000000000DB4F588>}, required signatures: {'ASSR_GENDER': TensorSignature(dtype=tf.string, shape=None, is_sparse=True), 'POL_PREM_RENEWAL': TensorSignature(dtype=tf.float64, shape=TensorShape([Dimension(21)]), is_sparse=False), 'POL_PREM_ORIGINAL': TensorSignature(dtype=tf.float64, shape=TensorShape([Dimension(21)]), is_sparse=False), 'POL_SUB_PRD_CODE': TensorSignature(dtype=tf.string, shape=None, is_sparse=True), 'ASSR_NATIONALITY': TensorSignature(dtype=tf.string, shape=None, is_sparse=True), 'POL_CUST_CODE': TensorSignature(dtype=tf.string, shape=None, is_sparse=True), 'POL_SRC_BUS_CODE': TensorSignature(dtype=tf.string, shape=None, is_sparse=True), 'ASSR_MAR_STATUS': TensorSignature(dtype=tf.string, shape=None, is_sparse=True), 'CLAIM_INCURRED': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(21)]), is_sparse=False), 'POL_END_NO_IDX': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(21)]), is_sparse=False), 'POL_SCHEME_PLAN': TensorSignature(dtype=tf.string, shape=None, is_sparse=True), 'ASSR_TYPE': TensorSignature(dtype=tf.string, shape=None, is_sparse=True), 'POL_PYMT_MODE': TensorSignature(dtype=tf.string, shape=None, is_sparse=True), 'ACCOUNT_CLASS': TensorSignature(dtype=tf.string, shape=None, is_sparse=True), 'POL_JACKET_CODE': TensorSignature(dtype=tf.string, shape=None, is_sparse=True), 'POL_OCC_DESC': TensorSignature(dtype=tf.string, shape=None, is_sparse=True), 'POL_AGE': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(21)]), is_sparse=False)}.
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Example code for TensorFlow Wide & Deep Tutorial using TF.Learn API."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import sys
import tempfile
from six.moves import urllib
import pandas as pd
import tensorflow as tf
COLUMNS = ["EXPIRY_MONTH", "POL_SYS_ID", "POL_DEPT_CODE", "ACCOUNT_CLASS",
           "POL_CUST_CODE", "POL_SUB_PRD_CODE", "POL_SRC_BUS_CODE",
           "POL_OCC_DESC", "POL_AGE", "POL_INSURED_ID", "ASSR_TYPE",
           "ASSR_GENDER", "ASSR_MAR_STATUS", "ASSR_NATIONALITY",
           "POL_SCHEME_PLAN", "POL_PYMT_MODE", "POL_POSTAL_CODE",
           "POL_JACKET_CODE", "POL_PREM_ORIGINAL", "POL_PREM_RENEWAL",
           "POL_END_NO_IDX", "CLAIM_INCURRED", "REVIEW_STATUS"]
LABEL_COLUMN = "label"
CATEGORICAL_COLUMNS = ["ACCOUNT_CLASS", "POL_CUST_CODE", "POL_SUB_PRD_CODE",
                       "POL_SRC_BUS_CODE", "POL_OCC_DESC", "ASSR_TYPE",
                       "ASSR_GENDER", "ASSR_MAR_STATUS", "ASSR_NATIONALITY",
                       "POL_SCHEME_PLAN", "POL_PYMT_MODE", "POL_JACKET_CODE"]
CONTINUOUS_COLUMNS = ["POL_AGE", "POL_PREM_ORIGINAL", "POL_PREM_RENEWAL",
                      "POL_END_NO_IDX", "CLAIM_INCURRED"]
def maybe_download(train_data, test_data):
  """Maybe downloads training data and returns train and test file names."""
  print('-----start of maybe_download')
  if train_data:
    train_file_name = train_data
  else:
    train_file = open('Renewal Listing 2015 export2.csv')
    train_file_name = train_file.name
    train_file.close()
    print("------maybe_download()-----Training data is downloaded to %s" % train_file_name)
  if test_data:
    test_file_name = test_data
  else:
    test_file = open('Renewal Listing 2015 export2 Test.csv')
    test_file_name = test_file.name
    test_file.close()
    print("------maybe_download()-----Test data is downloaded to %s" % test_file_name)
  print('-----end of maybe_download')
  return train_file_name, test_file_name
def build_estimator(model_dir, model_type):
  """Build an estimator."""
  print('-----start of build_estimator')
  # Sparse base columns.
  ACCOUNT_CLASS = tf.contrib.layers.sparse_column_with_hash_bucket(
      "ACCOUNT_CLASS", hash_bucket_size=1000)
  POL_CUST_CODE = tf.contrib.layers.sparse_column_with_hash_bucket(
      "POL_CUST_CODE", hash_bucket_size=100)
  POL_SUB_PRD_CODE = tf.contrib.layers.sparse_column_with_hash_bucket(
      "POL_SUB_PRD_CODE", hash_bucket_size=100)
  POL_SRC_BUS_CODE = tf.contrib.layers.sparse_column_with_hash_bucket(
      "POL_SRC_BUS_CODE", hash_bucket_size=1000)
  POL_OCC_DESC = tf.contrib.layers.sparse_column_with_hash_bucket(
      "POL_OCC_DESC", hash_bucket_size=1000)
  ASSR_NATIONALITY = tf.contrib.layers.sparse_column_with_hash_bucket(
      "ASSR_NATIONALITY", hash_bucket_size=1000)
  POL_JACKET_CODE = tf.contrib.layers.sparse_column_with_hash_bucket(
      "POL_JACKET_CODE", hash_bucket_size=1000)
  print("----build_estimator()----hash_bucket_size: columns are processed")
  ASSR_GENDER = tf.contrib.layers.sparse_column_with_keys(
      column_name="ASSR_GENDER", keys=["M", "F"])
  ASSR_TYPE = tf.contrib.layers.sparse_column_with_keys(
      column_name="ASSR_TYPE", keys=["I", "C", "M"])
  ASSR_MAR_STATUS = tf.contrib.layers.sparse_column_with_keys(
      column_name="ASSR_MAR_STATUS", keys=["D", "M", "S", "W"])
  POL_SCHEME_PLAN = tf.contrib.layers.sparse_column_with_keys(
      column_name="POL_SCHEME_PLAN", keys=["Y", "N"])
  POL_PYMT_MODE = tf.contrib.layers.sparse_column_with_keys(
      column_name="POL_PYMT_MODE", keys=["C", "CH", "CR"])
  print("----build_estimator()----sparse_column_with_keys: columns are processed")
  # Continuous base columns.
  POL_AGE = tf.contrib.layers.real_valued_column("POL_AGE")
  POL_PREM_ORIGINAL = tf.contrib.layers.real_valued_column("POL_PREM_ORIGINAL")
  POL_PREM_RENEWAL = tf.contrib.layers.real_valued_column("POL_PREM_RENEWAL")
  POL_END_NO_IDX = tf.contrib.layers.real_valued_column("POL_END_NO_IDX")
  CLAIM_INCURRED = tf.contrib.layers.real_valued_column("CLAIM_INCURRED")
  print("----build_estimator()----Continuous base columns are processed")
  # Transformations.
  age_buckets = tf.contrib.layers.bucketized_column(
      POL_AGE, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
  print("----build_estimator()----Transformation of age into buckets is processed")
  # Wide columns and deep columns.
  wide_columns = [
      ACCOUNT_CLASS, POL_CUST_CODE, POL_SUB_PRD_CODE, POL_SRC_BUS_CODE,
      POL_OCC_DESC, ASSR_NATIONALITY, POL_JACKET_CODE, ASSR_GENDER,
      ASSR_TYPE, ASSR_MAR_STATUS, POL_SCHEME_PLAN, POL_PYMT_MODE,
      POL_PREM_ORIGINAL, POL_PREM_RENEWAL, POL_END_NO_IDX, CLAIM_INCURRED,
      age_buckets,
      tf.contrib.layers.crossed_column([POL_CUST_CODE, POL_SUB_PRD_CODE],
                                       hash_bucket_size=int(1e4)),
      tf.contrib.layers.crossed_column([age_buckets, POL_OCC_DESC],
                                       hash_bucket_size=int(1e4)),
      tf.contrib.layers.crossed_column([ACCOUNT_CLASS, ASSR_TYPE],
                                       hash_bucket_size=int(1e4))]
  deep_columns = [
      tf.contrib.layers.embedding_column(POL_OCC_DESC, dimension=8),
      tf.contrib.layers.embedding_column(POL_SRC_BUS_CODE, dimension=8),
      tf.contrib.layers.embedding_column(ASSR_GENDER, dimension=8),
      tf.contrib.layers.embedding_column(POL_JACKET_CODE, dimension=8),
      tf.contrib.layers.embedding_column(ASSR_NATIONALITY, dimension=8),
      tf.contrib.layers.embedding_column(POL_PYMT_MODE, dimension=8),
      POL_AGE,
      POL_PREM_ORIGINAL,
      POL_PREM_RENEWAL,
      POL_END_NO_IDX,
      CLAIM_INCURRED,
  ]
  if model_type == "wide":
    m = tf.contrib.learn.LinearClassifier(model_dir=model_dir,
                                          feature_columns=wide_columns)
  elif model_type == "deep":
    m = tf.contrib.learn.DNNClassifier(model_dir=model_dir,
                                       feature_columns=deep_columns,
                                       hidden_units=[100, 50])
  else:
    m = tf.contrib.learn.DNNLinearCombinedClassifier(
        model_dir=model_dir,
        linear_feature_columns=wide_columns,
        dnn_feature_columns=deep_columns,
        dnn_hidden_units=[100, 50],
        fix_global_step_increment_bug=True)
  print("-----end of build_estimator")
  return m
def input_fn(df):
  """Input builder function."""
  # Creates a dictionary mapping from each continuous feature column name (k)
  # to the values of that column stored in a constant Tensor.
  print('---------------------------------------------')
  print('-----start of input_fn')
  print('-----input_fn()----print df')
  #print(df)
  print('-----input_fn()----print df end')
  continuous_cols = {k: tf.constant(df[k].values) for k in CONTINUOUS_COLUMNS}
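  # NOTE: tf.constant(df[k].values) inherits whatever dtype pandas inferred
  # for the column, so a column parsed as int64 in one CSV and float64 in the
  # other produces mismatched TensorSignatures between fit() and evaluate().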
  print('-----input_fn()----print continuous_cols')
  #print(continuous_cols)
  print('-----input_fn()----print continuous_cols end')
  # Creates a dictionary mapping from each categorical feature column name (k)
  # to the values of that column stored in a tf.SparseTensor.
  categorical_cols = {
      k: tf.SparseTensor(
          indices=[[i, 0] for i in range(df[k].size)],
          values=df[k].values,
          dense_shape=[df[k].size, 1])
      for k in CATEGORICAL_COLUMNS}
  print('-----input_fn()----print categorical_cols')
  #print(categorical_cols)
  print('-----input_fn()----print categorical_cols end')
  # Merges the two dictionaries into one.
  feature_cols = dict(continuous_cols)
  feature_cols.update(categorical_cols)
  print('-----input_fn()----print feature_cols')
  #print(feature_cols)
  print('-----input_fn()----end of print feature_cols')
  # Converts the label column into a constant Tensor.
  label = tf.constant(df[LABEL_COLUMN].values)
  # Returns the feature columns and the label.
  print('-----input_fn()----print label')
  #print(label)
  print('-----input_fn()----end of print label')
  print('---------------------------------------------')
  print('-----end of input_fn')
  return feature_cols, label
def train_and_eval(model_dir, model_type, train_steps, train_data, test_data):
  """Train and evaluate the model."""
  print('---------------------------------------------')
  print('-----start of train_and_eval')
  print("-----train_and_eval()-----start to get files")
  train_file_name, test_file_name = maybe_download(train_data, test_data)
  print("-----train_and_eval()-----got train and test data")
  df_train = pd.read_csv(
      tf.gfile.Open(train_file_name),
      names=COLUMNS,
      skipinitialspace=True,
      engine="python")
  df_test = pd.read_csv(
      tf.gfile.Open(test_file_name),
      names=COLUMNS,
      skipinitialspace=True,
      skiprows=1,
      engine="python")
  #print("----print df_train file")
  #for row in df_train:
  #  print(row)
  # remove NaN elements
  df_train = df_train.dropna(how='any', axis=0)
  df_test = df_test.dropna(how='any', axis=0)
  df_train[LABEL_COLUMN] = (
      df_train["REVIEW_STATUS"].apply(lambda x: "RENEWED" in x)).astype(int)
  df_test[LABEL_COLUMN] = (
      df_test["REVIEW_STATUS"].apply(lambda x: "RENEWED" in x)).astype(int)
  model_dir = tempfile.mkdtemp() if not model_dir else model_dir
  print("-----train_and_eval()-----model directory = %s" % model_dir)
  m = build_estimator(model_dir, model_type)
  print(m)
  print('-----train_and_eval()-----finished build_estimator')
  print('----------------------------------')
  print('----------------------------------')
  print('-----start of input_fn(df_train)-------------')
  print('========== compare train and test')
  print('train')
  print(df_train.shape)
  print(df_train['POL_JACKET_CODE'].shape)
  print('test')
  print(df_test.shape)
  print(df_test)
  print('=====================================')
  m.fit(input_fn=lambda: input_fn(df_train), steps=train_steps)
  print('-----end of input_fn(df_train)-------------')
  print('-----beginning of evaluate input_fn(df_test)')
  results = m.evaluate(input_fn=lambda: input_fn(df_test), steps=1)
  print('-----train_and_eval()-----end of input_fn(df_test)')
  print(' start to print -----results--------')
  print(results)
  for key in sorted(results):
    print("%s: %s" % (key, results[key]))
  print('---------------------------------------------')
  print('-----end of train_and_eval')
FLAGS = None
def main(_):
  print("-------main()----- program start")
  train_and_eval(FLAGS.model_dir, FLAGS.model_type, FLAGS.train_steps,
                 FLAGS.train_data, FLAGS.test_data)
  print("------main()----- program ended: ")
if __name__ == "__main__":
  parser = argparse.ArgumentParser()
  parser.register("type", "bool", lambda v: v.lower() == "true")
  parser.add_argument(
      "--model_dir",
      type=str,
      default="",
      help="Base directory for output models."
  )
  parser.add_argument(
      "--model_type",
      type=str,
      default="wide_n_deep",
      help="Valid model types: {'wide', 'deep', 'wide_n_deep'}."
  )
  parser.add_argument(
      "--train_steps",
      type=int,
      default=200,
      help="Number of training steps."
  )
  parser.add_argument(
      "--train_data",
      type=str,
      default="",
      help="Path to the training data."
  )
  parser.add_argument(
      "--test_data",
      type=str,
      default="",
      help="Path to the test data."
  )
  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
Answer (score: 0):
I found the problem. When pandas reads a CSV, the dtype it infers for each column depends on the data itself. In my case, the test set had one column in which every value was 0, even though it is logically a float column. When the program read the test CSV, that column was parsed as int, so it no longer matched the float dtype inferred from the training data, which caused the error. Changing the 0s to 0.00, or using real data in which the column contains both 0 and float values, solved the problem.
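A more defensive variant of the same fix — a sketch, not from the original post — is to pin the dtypes explicitly when reading both CSVs, so pandas' inference can never diverge between train and test. The column names, file handles, and read_csv arguments below are reused from the question's script:

# Force the genuinely-float columns to float64 so that an all-zero column
# cannot be inferred as int64 in one of the two files.
float_cols = ["POL_PREM_ORIGINAL", "POL_PREM_RENEWAL", "CLAIM_INCURRED"]
dtypes = {col: "float64" for col in float_cols}

df_train = pd.read_csv(tf.gfile.Open(train_file_name), names=COLUMNS,
                       dtype=dtypes, skipinitialspace=True, engine="python")
df_test = pd.read_csv(tf.gfile.Open(test_file_name), names=COLUMNS,
                      dtype=dtypes, skipinitialspace=True, skiprows=1,
                      engine="python")

# Equivalent post-hoc cast, if the read_csv calls are left untouched:
# df_test[float_cols] = df_test[float_cols].astype("float64")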