Question

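Guys, I'm new to deep learning. I'm training a DNN on the US Adult Income dataset.

Where am I going wrong? Another question: I want to test my model on a different dataset. How do I actually do that?

Here is my code: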

import pandas as pd
input_data = pd.read_csv('adult.data.csv')


def label_fix(label):
    if label == '<=50K':
        return 0
    else:
        return 1

input_data['Income'] = input_data['Income'].apply(label_fix)

from sklearn.model_selection import train_test_split

x_data = input_data.drop('Income',axis = 1)
y_labels = input_data['Income']
X_train,X_test,y_train,y_test = train_test_split(x_data,y_labels,test_size= 0.3,random_state=101)


import tensorflow as tf



Age = tf.feature_column.numeric_column('Age')
Job_class = tf.feature_column.categorical_column_with_hash_bucket('Job-Class',hash_bucket_size=1000)
fnlwgt = tf.feature_column.numeric_column('fnlwgt')
Education = tf.feature_column.categorical_column_with_hash_bucket('Education',hash_bucket_size=1000)
Education_num = tf.feature_column.numeric_column('Education-num')
Status = tf.feature_column.categorical_column_with_hash_bucket('Status',hash_bucket_size=1000)
Designation = tf.feature_column.categorical_column_with_hash_bucket('Designation',hash_bucket_size=1000)
Marital = tf.feature_column.categorical_column_with_hash_bucket('Marital',hash_bucket_size=1000)
Colour = tf.feature_column.categorical_column_with_vocabulary_list('Colour',['White', 'Asian-Pac-Islander', 'Amer-Indian-Eskimo', 'Other', 'Black'])
Gender = tf.feature_column.categorical_column_with_vocabulary_list('Gender',['Male','Female'])
Capital_gain = tf.feature_column.numeric_column('capital-gain')
Capital_loss = tf.feature_column.numeric_column('capital-loss')
Hours = tf.feature_column.numeric_column('hours-per-week')
Native_country = tf.feature_column.categorical_column_with_hash_bucket('Native-Country',hash_bucket_size=1000)
Income = tf.feature_column.numeric_column('Income')



feats_cols = [Age,Job_class,fnlwgt,Education,Education_num,Status,Designation,Marital,Colour,Gender,Capital_gain,Capital_loss,Hours,Native_country]

model = tf.estimator.LinearClassifier(feature_columns=feats_cols)

input_func = tf.estimator.inputs.pandas_input_fn(x=X_train,y=y_train,batch_size=100,num_epochs=None,shuffle=True)

model.train(input_fn=input_func,steps = 5000)

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into C:\Users\micha\AppData\Local\Temp\tmpj2usekuf\model.ckpt.
INFO:tensorflow:loss = 69.31474, step = 1
INFO:tensorflow:global_step/sec: 149.21
INFO:tensorflow:loss = 0.0, step = 101 (0.676 sec)
INFO:tensorflow:global_step/sec: 189.379
INFO:tensorflow:loss = 0.0, step = 201 (0.528 sec)
INFO:tensorflow:global_step/sec: 179.441
INFO:tensorflow:loss = 0.0, step = 301 (0.551 sec)
INFO:tensorflow:global_step/sec: 170.941
INFO:tensorflow:loss = 0.0, step = 401 (0.585 sec)
INFO:tensorflow:global_step/sec: 176.699
INFO:tensorflow:loss = 0.0, step = 501 (0.574 sec)
INFO:tensorflow:global_step/sec: 196.918
INFO:tensorflow:loss = 0.0, step = 601 (0.505 sec)
INFO:tensorflow:global_step/sec: 186.552
INFO:tensorflow:loss = 0.0, step = 701 (0.536 sec)
INFO:tensorflow:global_step/sec: 195.329
INFO:tensorflow:loss = 0.0, step = 801 (0.515 sec)
INFO:tensorflow:global_step/sec: 174.856
INFO:tensorflow:loss = 0.0, step = 901 (0.569 sec)
INFO:tensorflow:global_step/sec: 176.354
INFO:tensorflow:loss = 0.0, step = 1001 (0.562 sec)
INFO:tensorflow:global_step/sec: 168.888
INFO:tensorflow:loss = 0.0, step = 1101 (0.592 sec)
INFO:tensorflow:global_step/sec: 171.54
INFO:tensorflow:loss = 0.0, step = 1201 (0.600 sec)
INFO:tensorflow:global_step/sec: 171.716
INFO:tensorflow:loss = 0.0, step = 1301 (0.573 sec)
INFO:tensorflow:global_step/sec: 178.132
INFO:tensorflow:loss = 0.0, step = 1401 (0.558 sec)
INFO:tensorflow:global_step/sec: 180.651
INFO:tensorflow:loss = 0.0, step = 1501 (0.549 sec)
INFO:tensorflow:global_step/sec: 175.073
INFO:tensorflow:loss = 0.0, step = 1601 (0.580 sec)
INFO:tensorflow:global_step/sec: 177.171
INFO:tensorflow:loss = 0.0, step = 1701 (0.556 sec)
INFO:tensorflow:global_step/sec: 173.214
INFO:tensorflow:loss = 0.0, step = 1801 (0.594 sec)
INFO:tensorflow:global_step/sec: 165.829
INFO:tensorflow:loss = 0.0, step = 1901 (0.586 sec)
INFO:tensorflow:global_step/sec: 175.255
INFO:tensorflow:loss = 0.0, step = 2001 (0.571 sec)
INFO:tensorflow:global_step/sec: 171.048
INFO:tensorflow:loss = 0.0, step = 2101 (0.593 sec)
INFO:tensorflow:global_step/sec: 181.424
INFO:tensorflow:loss = 0.0, step = 2201 (0.548 sec)
INFO:tensorflow:global_step/sec: 175.714
INFO:tensorflow:loss = 0.0, step = 2301 (0.569 sec)
INFO:tensorflow:global_step/sec: 166.801
INFO:tensorflow:loss = 0.0, step = 2401 (0.594 sec)
INFO:tensorflow:global_step/sec: 173.364
INFO:tensorflow:loss = 0.0, step = 2501 (0.580 sec)
INFO:tensorflow:global_step/sec: 169.802
INFO:tensorflow:loss = 0.0, step = 2601 (0.587 sec)
INFO:tensorflow:global_step/sec: 175.314
INFO:tensorflow:loss = 0.0, step = 2701 (0.569 sec)
INFO:tensorflow:global_step/sec: 172.503
INFO:tensorflow:loss = 0.0, step = 2801 (0.585 sec)
INFO:tensorflow:global_step/sec: 184.231
INFO:tensorflow:loss = 0.0, step = 2901 (0.545 sec)
INFO:tensorflow:global_step/sec: 184.926
INFO:tensorflow:loss = 0.0, step = 3001 (0.537 sec)
INFO:tensorflow:global_step/sec: 189.303
INFO:tensorflow:loss = 0.0, step = 3101 (0.526 sec)
INFO:tensorflow:global_step/sec: 188.679
INFO:tensorflow:loss = 0.0, step = 3201 (0.536 sec)
INFO:tensorflow:global_step/sec: 184.756
INFO:tensorflow:loss = 0.0, step = 3301 (0.552 sec)
INFO:tensorflow:global_step/sec: 184.09
INFO:tensorflow:loss = 0.0, step = 3401 (0.534 sec)
INFO:tensorflow:global_step/sec: 176.366
INFO:tensorflow:loss = 0.0, step = 3501 (0.559 sec)
INFO:tensorflow:global_step/sec: 178.401
INFO:tensorflow:loss = 0.0, step = 3601 (0.567 sec)
INFO:tensorflow:global_step/sec: 192.295
INFO:tensorflow:loss = 0.0, step = 3701 (0.523 sec)
INFO:tensorflow:global_step/sec: 190.446
INFO:tensorflow:loss = 0.0, step = 3801 (0.526 sec)
INFO:tensorflow:global_step/sec: 181.776
INFO:tensorflow:loss = 0.0, step = 3901 (0.546 sec)
INFO:tensorflow:global_step/sec: 174.088
INFO:tensorflow:loss = 0.0, step = 4001 (0.577 sec)
INFO:tensorflow:global_step/sec: 182.692
INFO:tensorflow:loss = 0.0, step = 4101 (0.546 sec)
INFO:tensorflow:global_step/sec: 189.383
INFO:tensorflow:loss = 0.0, step = 4201 (0.526 sec)
INFO:tensorflow:global_step/sec: 183.433
INFO:tensorflow:loss = 0.0, step = 4301 (0.556 sec)
INFO:tensorflow:global_step/sec: 169.08
INFO:tensorflow:loss = 0.0, step = 4401 (0.576 sec)
INFO:tensorflow:global_step/sec: 170.028
INFO:tensorflow:loss = 0.0, step = 4501 (0.594 sec)
INFO:tensorflow:global_step/sec: 173.793
INFO:tensorflow:loss = 0.0, step = 4601 (0.574 sec)
INFO:tensorflow:global_step/sec: 177.173
INFO:tensorflow:loss = 0.0, step = 4701 (0.561 sec)
INFO:tensorflow:global_step/sec: 172.853
INFO:tensorflow:loss = 0.0, step = 4801 (0.583 sec)
INFO:tensorflow:global_step/sec: 179.073
INFO:tensorflow:loss = 0.0, step = 4901 (0.561 sec)
INFO:tensorflow:Saving checkpoints for 5000 into C:\Users\micha\AppData\Local\Temp\tmpj2usekuf\model.ckpt.
INFO:tensorflow:Loss for final step: 0.0.
Out[127]:

pred_fn = tf.estimator.inputs.pandas_input_fn(x=X_test,batch_size=len(X_test),shuffle=False)

predictions = list(model.predict(input_fn=pred_fn))

final_preds=[]
for pred in predictions:
  final_preds.append(pred['class_ids'][0])

from sklearn.metrics import classification_report

print(classification_report(y_test,final_preds))

             precision    recall  f1-score   support

          1       1.00      1.00      1.00      9769

avg / total       1.00      1.00      1.00      9769

Answer 1

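There is a bug in your label_fix method. The <=50K values in the file are always prefixed with a space (' <=50K'), so the comparison never matches and label_fix always returns 1. With every label equal to 1, the model trivially reaches perfect precision and recall. If you fix the method to handle the leading space, you will get more reasonable precision and recall: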

def label_fix(label):
    if label.strip().strip('.') == '<=50K':
        return 0
    else:
        return 1
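
To see the effect of the fix without loading the file, here is a minimal sketch that runs both versions of the function over a few raw label strings and counts the resulting classes. The sample strings are hypothetical, written in the format described above (leading space, and a trailing period in the test file), and the name label_fix_buggy is only used here to keep the two versions apart.

```python
from collections import Counter


def label_fix_buggy(label):
    # Version from the question: compares the raw string,
    # so ' <=50K' (with a leading space) never matches.
    return 0 if label == '<=50K' else 1


def label_fix(label):
    # Fixed version: drop surrounding whitespace and any period first.
    return 0 if label.strip().strip('.') == '<=50K' else 1


# Hypothetical raw values, not read from the real file.
raw_labels = [' <=50K', ' >50K', ' <=50K', ' <=50K.', ' >50K.']

buggy_counts = Counter(label_fix_buggy(v) for v in raw_labels)
fixed_counts = Counter(label_fix(v) for v in raw_labels)

print('buggy:', dict(buggy_counts))  # only class 1 appears
print('fixed:', dict(fixed_counts))  # both classes appear
```

Running the same kind of count on the real column (for example with value_counts() on input_data['Income']) right after apply(label_fix) would have exposed the problem before training.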

After fitting the model, you can use it to predict income for the adult.test data file like this:

test_data = pd.read_csv('adult.test.csv')
test_data['income'] = test_data['income'].apply(label_fix)
y_test_data = test_data['income']
pred_fn = tf.estimator.inputs.pandas_input_fn(x=test_data,batch_size=len(test_data),shuffle=False)
predictions = list(model.predict(input_fn=pred_fn))

final_preds_test_data=[]
for pred in predictions:
  final_preds_test_data.append(pred['class_ids'][0])

print(classification_report(y_test_data,final_preds_test_data))
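
Before trusting a report like this, it can help to check that both classes actually show up in the true labels. The sketch below is one way to do that; summarize_classes is a hypothetical helper name, and the label lists are made up to mimic the situation in the question, where every label had become 1.

```python
from collections import Counter


def summarize_classes(y_true, y_pred):
    """Count classes in the true labels and the predictions.

    Returns (true_counts, pred_counts, single_class), where single_class
    is True when the true labels contain fewer than two distinct classes.
    """
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    single_class = len(true_counts) < 2
    return true_counts, pred_counts, single_class


# Made-up labels: everything is class 1, as in the question.
y_true_demo = [1, 1, 1, 1]
y_pred_demo = [1, 1, 1, 1]

true_counts, pred_counts, single_class = summarize_classes(y_true_demo, y_pred_demo)
print(dict(true_counts), dict(pred_counts))
if single_class:
    print('Only one class in the labels: check the label preprocessing.')
```

A classification report with a single row, as in the question, is the same warning sign: a perfect score on a one-class target says nothing about the model.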

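Note that I had to add strip('.') to the label_fix method because the income column in the test file is formatted slightly differently: its labels end with a period.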
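
Since the two files format the label differently, a stricter variant can make such mismatches fail loudly instead of silently mapping everything unknown to 1. This is only a sketch: label_fix_strict and LABEL_TO_CLASS are hypothetical names, and it assumes the only two label values are '<=50K' and '>50K' once whitespace and a trailing period are removed.

```python
# Assumed label vocabulary after cleaning (hypothetical constant name).
LABEL_TO_CLASS = {'<=50K': 0, '>50K': 1}


def label_fix_strict(label):
    # Normalize both file formats: leading space, optional trailing period.
    cleaned = label.strip().rstrip('.')
    if cleaned not in LABEL_TO_CLASS:
        # Unknown value: raise instead of guessing a class.
        raise ValueError(f'unexpected income label: {label!r}')
    return LABEL_TO_CLASS[cleaned]


print(label_fix_strict(' <=50K'))   # training-file format
print(label_fix_strict(' >50K.'))   # test-file format
```

With this version, the original bug would have raised an error on the first row only if the cleaning step were missing, which is the point: an unexpected format stops the pipeline rather than producing all-ones labels.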
