如何将pd.DataFrame转换为tf.data.Dataset(或使用insted pd.DataFrame)进行DNNClassifier

时间:2018-12-28 03:50:49

标签: python pandas tensorflow neural-network classification

我收到此警告。

更新说明: 要构建输入管道,请使用tf.data模块。

我进行了一些搜索,但是我无法弄清楚tf.data.Dataset背后的逻辑,因此我无法管理将pd.DataFrame转换为tf.data.Dataset。

在代码末尾,我还需要帮助来进行预测,我无法找出正确的方法来将预测(高概率输出)与标签进行比较。

注意:数据没有列名,所以我在列中将a1命名为a784,以便可以将它们分配给feature_columns。

谢谢。

代码如下:

import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf 

from sklearn import metrics
from tensorflow.python.data import Dataset

mnist_df = pd.read_csv("https://download.mlcc.google.com/mledu-datasets/mnist_train_small.csv",header=None)

mnist_df.describe()

mnist_df.columns

hand_df = mnist_df[0]

matrix_df = mnist_df.drop([0],axis=1)

matrix_df.head()

hand_df.head()

#creating cols array and append a1 to a784 in order to name columns
cols=[]
for i in range(785):
    if i!=0:
        a = '{}{}'.format('a',i)
        cols.append(a)

matrix_df.columns = cols

mnist_df = mnist_df.head(10000)

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(matrix_df, hand_df, test_size=0.3, random_state=101)

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

matrix_df = pd.DataFrame(data=scaler.fit_transform(matrix_df),
                         columns=matrix_df.columns,
                         index=matrix_df.index)

#naming columns so I will not get error while assigning feature_columns
for i in range(len(cols)):
    a=i+1
    b='{}{}'.format('a',a)
    cols[i] = tf.feature_column.numeric_column(str(b))

matrix_df.head()

input_func = tf.estimator.inputs.pandas_input_fn(x=X_train,y=y_train,
                                                 batch_size=10,num_epochs=1000,
                                                 shuffle=True)

my_optimizer = tf.train.AdagradOptimizer(learning_rate=0.03)

my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)

model = tf.estimator.DNNClassifier(feature_columns=cols,
                                   hidden_units=[32,64],
                                      n_classes=10,
                                      optimizer=my_optimizer,
                                      config=tf.estimator.RunConfig(keep_checkpoint_max=1))

model.train(input_fn=input_func,steps=1000)

predict_input_func = tf.estimator.inputs.pandas_input_fn(x=X_test,
                                                         batch_size=50,
                                                         num_epochs=1,
                                                         shuffle=False)

pred_gen = model.predict(predict_input_func)

predictions = list(pred_gen)

predictions[0]

0 个答案:

没有答案