I am a beginner with TensorFlow. I am trying to feed my own dataset into a model using TensorFlow placeholders. From a lot of online research I gathered that tf.data.Dataset is the recommended way to feed your own data, but for now I am just curious how to solve the problem below. I am using the glass identification dataset, which contains 10 independent variables and 1 dependent variable: https://archive.ics.uci.edu/ml/machine-learning-databases/glass/glass.data
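For reference, this is roughly the tf.data pipeline I keep seeing recommended (a minimal sketch I put together, assuming TensorFlow 1.x; features and labels below are made-up stand-ins for my real NumPy arrays):

import numpy as np
import tensorflow as tf

features = np.random.rand(213, 10).astype(np.float32)  # hypothetical stand-in features
labels = np.random.randint(0, 7, size=213)             # hypothetical stand-in labels

dataset = tf.data.Dataset.from_tensor_slices((features, labels))
dataset = dataset.shuffle(buffer_size=213).batch(20).repeat()
iterator = dataset.make_one_shot_iterator()
next_features, next_labels = iterator.get_next()

with tf.Session() as sess:
    X_batch, y_batch = sess.run([next_features, next_labels])  # one batch per run() call

For now, though, I am sticking with placeholders and want to understand the error below.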
Here is a glimpse of my data:
Id RI Na Mg Al Si K Ca Ba Fe Type
count 213.000000 213.000000 213.000000 213.000000 213.000000 213.000000 213.000000 213.000000 213.000000 213.000000 213.000000
mean 108.000000 1.518353 13.406761 2.676056 1.446526 72.655023 0.499108 8.957934 0.175869 0.057277 2.788732
std 61.631972 0.003039 0.818371 1.440453 0.499882 0.774052 0.653035 1.426435 0.498245 0.097589 2.105130
min 2.000000 1.511150 10.730000 0.000000 0.290000 69.810000 0.000000 5.430000 0.000000 0.000000 1.000000
25% 55.000000 1.516520 12.900000 2.090000 1.190000 72.280000 0.130000 8.240000 0.000000 0.000000 1.000000
50% 108.000000 1.517680 13.300000 3.480000 1.360000 72.790000 0.560000 8.600000 0.000000 0.000000 2.000000
75% 161.000000 1.519150 13.830000 3.600000 1.630000 73.090000 0.610000 9.180000 0.000000 0.100000 3.000000
max 214.000000 1.533930 17.380000 3.980000 3.500000 75.410000 6.210000 16.190000 3.150000 0.510000 7.000000
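(For context, the summary above is the output of pandas describe(); I load the file roughly like this, with the column names taken from the UCI description and a hypothetical local path:)

import pandas as pd

cols = ['Id', 'RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe', 'Type']
data = pd.read_csv('glass.data', names=cols)  # hypothetical path to the downloaded file
print(data.describe())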
Here is my code:
import numpy as np
import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import StandardScaler

r, c = data.shape  # `data` is the DataFrame described above (213 rows, 11 columns)
n_inputs = c - 1
n_hidden1 = int(input('No. of nodes in hidden layer 1 \n'))
n_hidden2 = int(input('No. of nodes in hidden layer 2 \n'))
y_new = np.array(data['Type']-1)
y_cat = to_categorical(y_new)
n_outputs = y_cat.shape[1]
# 1.1) Use placeholders to define the shapes of the variables
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")  # Number of rows is unknown yet; number of columns is n_inputs
y = tf.placeholder(tf.int32, shape=(None), name="y")
# 1.2) Use name scopes and create each layer
# Densely connected layers, chained sequentially
with tf.name_scope("dnn"):
    hidden1 = tf.layers.dense(X, n_hidden1, name="hidden1", activation=tf.nn.relu, reuse=tf.AUTO_REUSE)  # First hidden layer takes X as input
    hidden2 = tf.layers.dense(hidden1, n_hidden2, name="hidden2", activation=tf.nn.relu, reuse=tf.AUTO_REUSE)  # Second hidden layer takes the first as input
    logits_out = tf.layers.dense(hidden2, n_outputs, name="outputs", reuse=tf.AUTO_REUSE)  # Output layer
# 1.3) Define the loss function under its own name scope
with tf.name_scope("loss"):
    xentropy = tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits_out)
    loss = tf.reduce_mean(xentropy, name="loss")
# 1.4) Choose a good optimizer
learning_rate = float(input('Please choose a learning rate \n'))
with tf.name_scope("train"):
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)
# 1.5) Specify how to evaluate the model
with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits_out, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
# Create a global variables initializer
init = tf.global_variables_initializer()
ss = StandardScaler()
X = data.drop(['Type'], axis=1)
ss.fit(X)
X_scaled = ss.transform(X)
n_cols = X_scaled.shape[1]
strt_pos = 0
end_pos = 0
def next_batch(batch_size, iteration):
    '''Return a batch of feature and target values based on index positions.'''
    if iteration == 0:
        strt_pos = 0
        end_pos = batch_size
    else:
        strt_pos = iteration * batch_size
        end_pos = strt_pos + batch_size
    X_batch = X_scaled[strt_pos:end_pos]
    y_batch = y_cat[strt_pos:end_pos]
    return np.array(X_batch), np.array(y_batch)
n_epochs = int(input('Choose no. of epochs \n')) # Choose no. of epochs for training
batch_size = int(input('Choose max batch size \n')) # Choose batch size for training
training_size = float(input('Choose the training size \n'))
total_training_size = int(r * training_size)
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(total_training_size // batch_size):
            X_batch, y_batch = next_batch(batch_size, iteration)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        acc_val = accuracy.eval(feed_dict={X: X_scaled, y: y_cat})
        print(epoch, "Train accuracy: ", acc_train, "Val accuracy: ", acc_val)
"数据"变量是我的数据框,在选择所有必需值时有213行和11列,我得到以下错误:
Choose no. of epochs
20
Choose max batch size
20
Choose the training size
0.7
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-19-929e743f5882> in <module>()
17 for iteration in range(total_training_size // batch_size):
18 X_batch, y_batch = next_batch(batch_size, iteration)
---> 19 sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
20 acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
21 acc_val = accuracy.eval(feed_dict={X: X_scaled, y: data['Type']})
C:\Users\Dell\Anaconda3\lib\site-packages\pandas\core\generic.py in __hash__(self)
1487 def __hash__(self):
1488 raise TypeError('{0!r} objects are mutable, thus they cannot be'
-> 1489 ' hashed'.format(self.__class__.__name__))
1490
1491 def __iter__(self):
TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed
As you can see, I have implemented my own version of next_batch, and I am not sure whether it returns exactly what TensorFlow expects. I also have no idea what data type "feed_dict" expects. Any help is greatly appreciated.
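From what I have read, feed_dict is supposed to map placeholder tensors to array-like values (NumPy arrays or lists), as in this toy example (all names here are made up by me):

import numpy as np
import tensorflow as tf

X_ph = tf.placeholder(tf.float32, shape=(None, 3), name="X_ph")  # hypothetical placeholder
doubled = X_ph * 2.0

with tf.Session() as sess:
    # Keys are the placeholder tensors themselves; values are array-likes
    out = sess.run(doubled, feed_dict={X_ph: np.ones((2, 3), dtype=np.float32)})
    print(out)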
P.S.: Thanks for reading such a long post, but I had no choice other than to include all of this information.