I want to use TensorFlow to split CSV data into training and test sets, but I can't find a command like np.loadtxt for tensors. I tried doing the split with numpy and then converting to tensors, but I get the following error:
TypeError: object of type 'Tensor' has no len()
Here is my code:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
x = tf.convert_to_tensor(np.loadtxt('data.csv', delimiter=','))
y = tf.convert_to_tensor(np.loadtxt('labels.csv', delimiter=','))
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=42)
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(426, 30, 1)),
    tf.keras.layers.Dense(126, activation=tf.nn.tanh),
    # tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.tanh)
])
model.compile(optimizer='sgd',
              loss='mean_squared_error',
              metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs=5)  # validation_data=[x_test, y_test]
model.evaluate(x_test, y_test)
t_predicted = model.predict(x_test)
out_predicted = np.argmax(t_predicted, axis=1)
conf_matrix = tf.confusion_matrix(y_test, out_predicted)
with tf.Session():
    print('Confusion Matrix: \n\n', tf.Tensor.eval(conf_matrix, feed_dict=None, session=None))
# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
Answer 0 (score: 0)
Wouldn't it be simpler to load the CSV files first, do the split, and then hand the split results to TF? sklearn.model_selection.train_test_split() is not meant to be used with the Tensor objects you get back from tf.convert_to_tensor(). Reversing the order makes your code work in a small test script:
x = np.loadtxt('data.csv', delimiter=',')
y = np.loadtxt('labels.csv', delimiter=',')
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)
x_train = tf.convert_to_tensor(x_train)
x_test = tf.convert_to_tensor(x_test)
y_train = tf.convert_to_tensor(y_train)
y_test = tf.convert_to_tensor(y_test)
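Note that tf.keras models also accept numpy arrays directly in model.fit(), so the tf.convert_to_tensor() calls above are optional in a Keras workflow.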
Answer 1 (score: 0)
Best practice is not to load the entire dataset into a tensor. If the code runs on a GPU and the data is large, the tensor can occupy a lot of GPU memory and trigger an "out of memory" error. The usual approach is to keep the data in numpy and split it there (the train_test_split way shown above). When dealing with large datasets or images, we cannot load everything into memory. The approach commonly followed there is to load one batch of images into a tensor and use it for training/validation. All deep learning frameworks provide mechanisms for loading multiple batches in multiple threads, so that the next training step does not have to wait for the next batch to be loaded.
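A minimal sketch of such a batched input pipeline with tf.data (the file names, buffer size, and batch size here are illustrative assumptions, not from the original answer):

import numpy as np
import tensorflow as tf

# Illustrative assumption: features and labels fit in memory as numpy arrays.
x = np.loadtxt('data.csv', delimiter=',')
y = np.loadtxt('labels.csv', delimiter=',')

# Build a dataset of (features, label) pairs, then shuffle, batch, and
# prefetch so the next batch is prepared while the current one trains.
dataset = tf.data.Dataset.from_tensor_slices((x, y))
dataset = dataset.shuffle(buffer_size=1000).batch(32).prefetch(1)

# The batched dataset can be passed straight to Keras:
# model.fit(dataset, epochs=5)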
If you still want to load the full data into one tensor and split it into training and test tensors, you can use the TensorFlow method tf.split: https://www.tensorflow.org/api_docs/python/tf/split
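As a rough sketch (the 75/25 split below is an assumption mirroring the test_size=0.25 above), tf.split divides a tensor along a given axis into pieces of the requested sizes:

import numpy as np
import tensorflow as tf

data = np.loadtxt('data.csv', delimiter=',')
x = tf.convert_to_tensor(data)

# Split the rows (axis 0) into a 75% training part and a 25% test part.
n_train = int(data.shape[0] * 0.75)
x_train, x_test = tf.split(x, [n_train, data.shape[0] - n_train], axis=0)

Unlike train_test_split, tf.split does not shuffle the rows first, so you may want to shuffle the data (with the same permutation applied to features and labels) before converting it to a tensor.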