My DNN is too accurate??? How do I fix it?

Time: 2018-01-24 10:09:34

Tags: python tensorflow deep-learning dnn-module

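Hi all, I'm new to deep learning. I'm training a DNN on the US Adult Income dataset, and the results look too good to be true: the training loss drops to 0.0 and the classification report shows 1.00 across the board.

Where did I go wrong? A second question: I'd like to test my model on a different dataset. How do I actually do that?

Here is my code: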

import pandas as pd
input_data = pd.read_csv('adult.data.csv')


def label_fix(label):
    if label == '<=50K':
        return 0
    else:
        return 1

input_data['Income'] = input_data['Income'].apply(label_fix)

from sklearn.model_selection import train_test_split

x_data = input_data.drop('Income',axis = 1)
y_labels = input_data['Income']
X_train,X_test,y_train,y_test = train_test_split(x_data,y_labels,test_size= 0.3,random_state=101)


import tensorflow as tf



Age = tf.feature_column.numeric_column('Age')
Job_class = tf.feature_column.categorical_column_with_hash_bucket('Job-Class',hash_bucket_size=1000)
fnlwgt = tf.feature_column.numeric_column('fnlwgt')
Education = tf.feature_column.categorical_column_with_hash_bucket('Education',hash_bucket_size=1000)
Education_num = tf.feature_column.numeric_column('Education-num')
Status = tf.feature_column.categorical_column_with_hash_bucket('Status',hash_bucket_size=1000)
Designation = tf.feature_column.categorical_column_with_hash_bucket('Designation',hash_bucket_size=1000)
Marital = tf.feature_column.categorical_column_with_hash_bucket('Marital',hash_bucket_size=1000)
Colour = tf.feature_column.categorical_column_with_vocabulary_list('Colour',['White', 'Asian-Pac-Islander', 'Amer-Indian-Eskimo', 'Other', 'Black'])
Gender = tf.feature_column.categorical_column_with_vocabulary_list('Gender',['Male','Female'])
Capital_gain = tf.feature_column.numeric_column('capital-gain')
Capital_loss = tf.feature_column.numeric_column('capital-loss')
Hours = tf.feature_column.numeric_column('hours-per-week')
Native_country = tf.feature_column.categorical_column_with_hash_bucket('Native-Country',hash_bucket_size=1000)
Income = tf.feature_column.numeric_column('Income')



feats_cols = [Age,Job_class,fnlwgt,Education,Education_num,Status,Designation,Marital,Colour,Gender,Capital_gain,Capital_loss,Hours,Native_country]

model = tf.estimator.LinearClassifier(feature_columns=feats_cols)

input_func = tf.estimator.inputs.pandas_input_fn(x=X_train,y=y_train,batch_size=100,num_epochs=None,shuffle=True)

model.train(input_fn=input_func,steps = 5000)
  

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into C:\Users\micha\AppData\Local\Temp\tmpj2usekuf\model.ckpt.
INFO:tensorflow:loss = 69.31474, step = 1
INFO:tensorflow:global_step/sec: 149.21
INFO:tensorflow:loss = 0.0, step = 101 (0.676 sec)
INFO:tensorflow:global_step/sec: 189.379
INFO:tensorflow:loss = 0.0, step = 201 (0.528 sec)
INFO:tensorflow:global_step/sec: 179.441
INFO:tensorflow:loss = 0.0, step = 301 (0.551 sec)
INFO:tensorflow:global_step/sec: 170.941
INFO:tensorflow:loss = 0.0, step = 401 (0.585 sec)
INFO:tensorflow:global_step/sec: 176.699
INFO:tensorflow:loss = 0.0, step = 501 (0.574 sec)
INFO:tensorflow:global_step/sec: 196.918
INFO:tensorflow:loss = 0.0, step = 601 (0.505 sec)
INFO:tensorflow:global_step/sec: 186.552
INFO:tensorflow:loss = 0.0, step = 701 (0.536 sec)
INFO:tensorflow:global_step/sec: 195.329
INFO:tensorflow:loss = 0.0, step = 801 (0.515 sec)
INFO:tensorflow:global_step/sec: 174.856
INFO:tensorflow:loss = 0.0, step = 901 (0.569 sec)
INFO:tensorflow:global_step/sec: 176.354
INFO:tensorflow:loss = 0.0, step = 1001 (0.562 sec)
INFO:tensorflow:global_step/sec: 168.888
INFO:tensorflow:loss = 0.0, step = 1101 (0.592 sec)
INFO:tensorflow:global_step/sec: 171.54
INFO:tensorflow:loss = 0.0, step = 1201 (0.600 sec)
INFO:tensorflow:global_step/sec: 171.716
INFO:tensorflow:loss = 0.0, step = 1301 (0.573 sec)
INFO:tensorflow:global_step/sec: 178.132
INFO:tensorflow:loss = 0.0, step = 1401 (0.558 sec)
INFO:tensorflow:global_step/sec: 180.651
INFO:tensorflow:loss = 0.0, step = 1501 (0.549 sec)
INFO:tensorflow:global_step/sec: 175.073
INFO:tensorflow:loss = 0.0, step = 1601 (0.580 sec)
INFO:tensorflow:global_step/sec: 177.171
INFO:tensorflow:loss = 0.0, step = 1701 (0.556 sec)
INFO:tensorflow:global_step/sec: 173.214
INFO:tensorflow:loss = 0.0, step = 1801 (0.594 sec)
INFO:tensorflow:global_step/sec: 165.829
INFO:tensorflow:loss = 0.0, step = 1901 (0.586 sec)
INFO:tensorflow:global_step/sec: 175.255
INFO:tensorflow:loss = 0.0, step = 2001 (0.571 sec)
INFO:tensorflow:global_step/sec: 171.048
INFO:tensorflow:loss = 0.0, step = 2101 (0.593 sec)
INFO:tensorflow:global_step/sec: 181.424
INFO:tensorflow:loss = 0.0, step = 2201 (0.548 sec)
INFO:tensorflow:global_step/sec: 175.714
INFO:tensorflow:loss = 0.0, step = 2301 (0.569 sec)
INFO:tensorflow:global_step/sec: 166.801
INFO:tensorflow:loss = 0.0, step = 2401 (0.594 sec)
INFO:tensorflow:global_step/sec: 173.364
INFO:tensorflow:loss = 0.0, step = 2501 (0.580 sec)
INFO:tensorflow:global_step/sec: 169.802
INFO:tensorflow:loss = 0.0, step = 2601 (0.587 sec)
INFO:tensorflow:global_step/sec: 175.314
INFO:tensorflow:loss = 0.0, step = 2701 (0.569 sec)
INFO:tensorflow:global_step/sec: 172.503
INFO:tensorflow:loss = 0.0, step = 2801 (0.585 sec)
INFO:tensorflow:global_step/sec: 184.231
INFO:tensorflow:loss = 0.0, step = 2901 (0.545 sec)
INFO:tensorflow:global_step/sec: 184.926
INFO:tensorflow:loss = 0.0, step = 3001 (0.537 sec)
INFO:tensorflow:global_step/sec: 189.303
INFO:tensorflow:loss = 0.0, step = 3101 (0.526 sec)
INFO:tensorflow:global_step/sec: 188.679
INFO:tensorflow:loss = 0.0, step = 3201 (0.536 sec)
INFO:tensorflow:global_step/sec: 184.756
INFO:tensorflow:loss = 0.0, step = 3301 (0.552 sec)
INFO:tensorflow:global_step/sec: 184.09
INFO:tensorflow:loss = 0.0, step = 3401 (0.534 sec)
INFO:tensorflow:global_step/sec: 176.366
INFO:tensorflow:loss = 0.0, step = 3501 (0.559 sec)
INFO:tensorflow:global_step/sec: 178.401
INFO:tensorflow:loss = 0.0, step = 3601 (0.567 sec)
INFO:tensorflow:global_step/sec: 192.295
INFO:tensorflow:loss = 0.0, step = 3701 (0.523 sec)
INFO:tensorflow:global_step/sec: 190.446
INFO:tensorflow:loss = 0.0, step = 3801 (0.526 sec)
INFO:tensorflow:global_step/sec: 181.776
INFO:tensorflow:loss = 0.0, step = 3901 (0.546 sec)
INFO:tensorflow:global_step/sec: 174.088
INFO:tensorflow:loss = 0.0, step = 4001 (0.577 sec)
INFO:tensorflow:global_step/sec: 182.692
INFO:tensorflow:loss = 0.0, step = 4101 (0.546 sec)
INFO:tensorflow:global_step/sec: 189.383
INFO:tensorflow:loss = 0.0, step = 4201 (0.526 sec)
INFO:tensorflow:global_step/sec: 183.433
INFO:tensorflow:loss = 0.0, step = 4301 (0.556 sec)
INFO:tensorflow:global_step/sec: 169.08
INFO:tensorflow:loss = 0.0, step = 4401 (0.576 sec)
INFO:tensorflow:global_step/sec: 170.028
INFO:tensorflow:loss = 0.0, step = 4501 (0.594 sec)
INFO:tensorflow:global_step/sec: 173.793
INFO:tensorflow:loss = 0.0, step = 4601 (0.574 sec)
INFO:tensorflow:global_step/sec: 177.173
INFO:tensorflow:loss = 0.0, step = 4701 (0.561 sec)
INFO:tensorflow:global_step/sec: 172.853
INFO:tensorflow:loss = 0.0, step = 4801 (0.583 sec)
INFO:tensorflow:global_step/sec: 179.073
INFO:tensorflow:loss = 0.0, step = 4901 (0.561 sec)
INFO:tensorflow:Saving checkpoints for 5000 into C:\Users\micha\AppData\Local\Temp\tmpj2usekuf\model.ckpt.
INFO:tensorflow:Loss for final step: 0.0.
Out[127]:

pred_fn = tf.estimator.inputs.pandas_input_fn(x=X_test,batch_size=len(X_test),shuffle=False)

predictions = list(model.predict(input_fn=pred_fn))

final_preds=[]
for pred in predictions:
  final_preds.append(pred['class_ids'][0])

from sklearn.metrics import classification_report

print(classification_report(y_test,final_preds))
             precision    recall  f1-score   support

          1       1.00      1.00      1.00      9769

avg / total       1.00      1.00      1.00      9769

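1 Answer:

Answer 0 (score: 0)

There is a bug in your label_fix method. In the data file the <=50K value is always prefixed with a space (' <=50K'), so the comparison label == '<=50K' never matches and label_fix always returns 1. With every row labeled 1, the model only has to predict a single class, which is why you get perfect recall and precision. If you fix the method to handle the leading space, you will get more reasonable precision and recall: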

def label_fix(label):
    if label.strip().strip('.') == '<=50K':
        return 0
    else:
        return 1
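
To see why the original comparison never matches, here is a minimal sketch that runs both versions on a few raw label strings. The sample strings are assumptions about how the labels look in the CSV files (a leading space in the training file, plus a trailing period in the test file), and the names label_fix_original and label_fix_fixed are chosen here only for illustration.

```python
def label_fix_original(label):
    # Exact comparison: fails whenever the raw value carries extra whitespace.
    return 0 if label == '<=50K' else 1


def label_fix_fixed(label):
    # Strip surrounding whitespace first, then a trailing period if present.
    return 0 if label.strip().strip('.') == '<=50K' else 1


# Assumed raw values: leading space (training file), trailing period (test file).
samples = [' <=50K', ' >50K', ' <=50K.', ' >50K.']

original = [label_fix_original(s) for s in samples]
fixed = [label_fix_fixed(s) for s in samples]

print(original)  # [1, 1, 1, 1]  every row collapses into class 1
print(fixed)     # [0, 1, 0, 1]  both classes survive
```

Printing the distinct raw values of the label column before mapping them is a quick way to spot this kind of stray whitespace.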

After fitting the model, you can use it to predict income for the adult.test data file, like this:

test_data = pd.read_csv('adult.test.csv')
test_data['income'] = test_data['income'].apply(label_fix)
y_test_data = test_data['income']
pred_fn = tf.estimator.inputs.pandas_input_fn(x=test_data,batch_size=len(test_data),shuffle=False)
predictions = list(model.predict(input_fn=pred_fn))

final_preds_test_data=[]
for pred in predictions:
  final_preds_test_data.append(pred['class_ids'][0])

print(classification_report(y_test_data,final_preds_test_data))
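
A report that lists only one class in its support column, as in the question, is a hint that the labels collapsed into a single value. Below is a minimal sketch of a check that could run on the labels before training or evaluating. check_label_balance is a hypothetical helper written for this answer, not part of pandas, scikit-learn, or TensorFlow.

```python
from collections import Counter


def check_label_balance(labels):
    """Count each label and raise if fewer than two classes are present."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError(
            'Only one class found in labels: {}'.format(dict(counts)))
    return counts


# Labels as the buggy label_fix would produce them: every row mapped to 1.
try:
    check_label_balance([1, 1, 1, 1])
    single_class_detected = False
except ValueError as err:
    single_class_detected = True
    print(err)

# Labels containing both classes pass and return the per-class counts.
counts = check_label_balance([0, 1, 0, 0])
print(counts[0], counts[1])  # 3 1
```

The helper accepts any iterable of labels, so it should also work on y_labels or y_test_data, since iterating a pandas Series yields its values.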

Note that I had to add strip('.') to the label_fix method, because the income column in the test file is formatted slightly differently: