How to use TensorFlow on our own dataset

Time: 2018-06-08 19:04:29

Tags: python tensorflow neural-network deep-learning jupyter-notebook

I am a beginner with TensorFlow. I am trying to feed my own dataset through TensorFlow placeholders. Through a lot of online research I found that tf.data.Dataset is the ideal way to feed your own data, but I am curious how to solve the following problem. The glass identification dataset I am using contains 10 independent variables and 1 dependent variable: [https://archive.ics.uci.edu/ml/machine-learning-databases/glass/glass.data][1]
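For context, this is the sort of tf.data.Dataset input pipeline my research turned up. It is just a sketch with stand-in NumPy arrays (the names `features` and `labels` and all the sizes are placeholders I chose, not my real preprocessing), and I have not managed to adapt it to the code below:

    import numpy as np
    import tensorflow as tf

    # Stand-in arrays shaped like the glass data: 213 samples, 10 features
    features = np.random.rand(213, 10).astype(np.float32)
    labels = np.random.randint(0, 7, size=213)

    # Build a dataset that shuffles, batches, and repeats over the arrays
    dataset = tf.data.Dataset.from_tensor_slices((features, labels))
    dataset = dataset.shuffle(buffer_size=213).batch(20).repeat()

    # A one-shot iterator yields one batch per sess.run call
    iterator = dataset.make_one_shot_iterator()
    next_features, next_labels = iterator.get_next()

    with tf.Session() as sess:
        f_batch, l_batch = sess.run([next_features, next_labels])
        print(f_batch.shape, l_batch.shape)  # (20, 10) (20,)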

Here is a glimpse of my data:

                   Id          RI          Na          Mg          Al          Si           K          Ca          Ba          Fe        Type
    count  213.000000  213.000000  213.000000  213.000000  213.000000  213.000000  213.000000  213.000000  213.000000  213.000000  213.000000
    mean   108.000000    1.518353   13.406761    2.676056    1.446526   72.655023    0.499108    8.957934    0.175869    0.057277    2.788732
    std     61.631972    0.003039    0.818371    1.440453    0.499882    0.774052    0.653035    1.426435    0.498245    0.097589    2.105130
    min      2.000000    1.511150   10.730000    0.000000    0.290000   69.810000    0.000000    5.430000    0.000000    0.000000    1.000000
    25%     55.000000    1.516520   12.900000    2.090000    1.190000   72.280000    0.130000    8.240000    0.000000    0.000000    1.000000
    50%    108.000000    1.517680   13.300000    3.480000    1.360000   72.790000    0.560000    8.600000    0.000000    0.000000    2.000000
    75%    161.000000    1.519150   13.830000    3.600000    1.630000   73.090000    0.610000    9.180000    0.000000    0.100000    3.000000
    max    214.000000    1.533930   17.380000    3.980000    3.500000   75.410000    6.210000   16.190000    3.150000    0.510000    7.000000

Here is my code (I have added the imports at the top for completeness):

    import numpy as np
    import tensorflow as tf
    from sklearn.preprocessing import StandardScaler
    from keras.utils import to_categorical

    # `data` is my glass DataFrame (213 rows x 11 columns), loaded beforehand
    r, c = data.shape

    n_inputs = c - 1
    n_hidden1 = int(input('No. of nodes in hidden layer 1 \n'))
    n_hidden2 = int(input('No. of nodes in hidden layer 2 \n'))

    # Shift the labels to start at 0 and one-hot encode them
    y_new = np.array(data['Type'] - 1)
    y_cat = to_categorical(y_new)
    n_outputs = y_cat.shape[1]

    # 1.1) Using placeholders to define the shape of the variables
    X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")  # We don't know the no. of rows yet, but we know the no. of columns, given by n_inputs
    y = tf.placeholder(tf.int32, shape=(None), name="y")

    # 1.2) Using name scopes and creating each layer
    # A dense network with sequential connections is created
    with tf.name_scope("dnn"):
        hidden1 = tf.layers.dense(X, n_hidden1, name="hidden1", activation=tf.nn.relu, reuse=tf.AUTO_REUSE)  # First hidden layer takes X as input
        hidden2 = tf.layers.dense(hidden1, n_hidden2, name="hidden2", activation=tf.nn.relu, reuse=tf.AUTO_REUSE)  # Second hidden layer takes the first hidden layer as input
        logits_out = tf.layers.dense(hidden2, n_outputs, name="outputs", reuse=tf.AUTO_REUSE)  # Output layer

    # 1.3) Defining a loss function with a proper name scope
    with tf.name_scope("loss"):
        xentropy = tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits_out)
        loss = tf.reduce_mean(xentropy, name="loss")

    # 1.4) Choose a good optimizer
    learning_rate = float(input('Please choose a learning rate \n'))

    with tf.name_scope("train"):
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
        training_op = optimizer.minimize(loss)

    # 1.5) Specifying how to evaluate the model
    with tf.name_scope("eval"):
        correct = tf.nn.in_top_k(logits_out, y, 1)
        accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

    # Create a global initializer
    init = tf.global_variables_initializer()

    # Standardize the features
    ss = StandardScaler()
    X = data.drop(['Type'], axis=1)
    ss.fit(X)
    X_scaled = ss.transform(X)

    n_cols = X_scaled.shape[1]

    strt_pos = 0; end_pos = 0

    def next_batch(batch_size, iteration):
        '''Returns a batch of feature and target variables based on index values'''
        if iteration == 0:
            strt_pos = 0
            end_pos = batch_size
        else:
            strt_pos = iteration * batch_size
            end_pos = strt_pos + batch_size
        X_batch = X_scaled[strt_pos:end_pos]
        y_batch = y_cat[strt_pos:end_pos]
        return np.array(X_batch), np.array(y_batch)
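For what it is worth, a quick shape check (just a sanity sketch I added, assuming a batch size of 20 as in the run below) shows that next_batch returns plain NumPy arrays:

    X_chk, y_chk = next_batch(20, 0)
    print(X_chk.shape, X_chk.dtype)  # expected (20, 10): Id column kept, Type dropped
    print(y_chk.shape, y_chk.dtype)  # expected (20, n_outputs): one-hot rows from to_categorical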




    n_epochs = int(input('Choose no. of epochs \n'))  # Choose no. of epochs for training
    batch_size = int(input('Choose max batch size \n'))  # Choose batch size for training
    training_size = float(input('Choose the training size \n'))
    total_training_size = int(r * training_size)

    with tf.Session() as sess:
        init.run()
        for epoch in range(n_epochs):
            for iteration in range(total_training_size // batch_size):
                X_batch, y_batch = next_batch(batch_size, iteration)
                sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
            acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
            acc_val = accuracy.eval(feed_dict={X: X_scaled, y: y_cat})
            print(epoch, "Train accuracy: ", acc_train, "Val accuracy: ", acc_val)

"数据"变量是我的数据框,在选择所有必需值时有213行和11列,我得到以下错误:

    Choose no. of epochs 
    20
    Choose max batch size 
    20
    Choose the training size 
    0.7
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-19-929e743f5882> in <module>()
         17         for iteration in range(total_training_size // batch_size):
         18             X_batch, y_batch = next_batch(batch_size, iteration)
    ---> 19             sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
         20         acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
         21         acc_val = accuracy.eval(feed_dict={X: X_scaled, y: data['Type']})

    C:\Users\Dell\Anaconda3\lib\site-packages\pandas\core\generic.py in __hash__(self)
       1487     def __hash__(self):
       1488         raise TypeError('{0!r} objects are mutable, thus they cannot be'
    -> 1489                         ' hashed'.format(self.__class__.__name__))
       1490 
       1491     def __iter__(self):

    TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed

As you can see, I have implemented my own version of next_batch, and I am not sure whether it returns exactly what TensorFlow expects. I also have no idea what data type "feed_dict" expects. Any help is much appreciated.
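For reference, the smallest self-contained feed_dict example I could piece together looks like this (the names `a` and `doubled` are just illustrative); as far as I can tell, the keys must be the placeholder tensors themselves and the values NumPy arrays whose shapes match the placeholders:

    import numpy as np
    import tensorflow as tf

    a = tf.placeholder(tf.float32, shape=(None, 3), name="a")  # None matches any batch size
    doubled = a * 2

    with tf.Session() as sess:
        out = sess.run(doubled, feed_dict={a: np.ones((4, 3), dtype=np.float32)})
        print(out.shape)  # (4, 3)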

P.S.: Thank you for reading such a long post, but I had no choice other than to provide all this information.

0 Answers:

No answers