I'm trying to run a TensorFlow job on a matrix of shape (624003, 17424), obtained from text with CountVectorizer.
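For context, this is roughly how such a matrix comes out of CountVectorizer; the corpus variable here is hypothetical, and the result is a scipy.sparse matrix, not a dense array:

from sklearn.feature_extraction.text import CountVectorizer

# hypothetical corpus; the real one yields a (624003, 17424) sparse matrix
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(corpus)  # scipy.sparse CSR matrix of token counts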
I keep getting the error tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
However, with a smaller sample of shape (213556, 11605) it works fine; it only fails once I increase the dataset size.
I use the following TensorFlow code:
import tensorflow as tf

batch_size = 1024
X = tf.placeholder(tf.float32, shape=(None, X_train.shape[1]), name="X")
y = tf.placeholder(tf.float32, shape=(None, y_train.shape[1]), name="y")
# set model weights
weights = tf.Variable(tf.random_normal([X_train.shape[1], y_train.shape[1]], stddev=0.5), name="weights")
# construct model: one-layer logistic regression
y_pred = tf.nn.sigmoid(tf.matmul(X, weights))
# minimize error using binary cross entropy
# cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(y_pred), reduction_indices=1))
cost = tf.reduce_mean(-(y*tf.log(y_pred) + (1 - y)*tf.log(1 - y_pred)))
optimizer_01 = tf.train.AdamOptimizer(learning_rate=0.01).minimize(cost)
optimizer_001 = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
# saving model weights
saver = tf.train.Saver({"weights": weights})
# variables initializing
init = tf.global_variables_initializer()
# starting session with the GPU disabled
with tf.Session(config=tf.ConfigProto(device_count={'GPU': 0})) as sess:
    sess.run(init)
In the main block I train on the training data and compute accuracy on the test data. How can I train on all of the training data without running out of memory? For batching I use the following function:
def optimize(session, optimizer, X_train, X_test, y_train, y_test, epoch=1):
    for epoch in range(epoch):
        for batch_i, (start, end) in enumerate(split(0, X_train.shape[0], batch_size)):
            x_batch, y_true_batch = X_train[start:end].toarray(), y_train[start:end]
            feed_dict_train = {X: x_batch, y: y_true_batch}
            session.run(optimizer, feed_dict=feed_dict_train)
    # this feed densifies the entire test matrix in one allocation
    feed_dict_test = {X: X_test.toarray(), y: y_test}
    cost_step_test = session.run(cost, feed_dict=feed_dict_test)
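The split helper isn't shown in the question; presumably it yields (start, end) index pairs over the rows. A minimal sketch under that assumption:

def split(start, stop, step):
    # yield (start, end) index pairs covering [start, stop) in chunks of `step`
    for i in range(start, stop, step):
        yield i, min(i + step, stop)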
Answer 0 (score: 0):
A dense (624003, 17424) float32 tensor is about 40 GB (624003 × 17424 × 4 bytes ≈ 43 GB), so you should not allocate a tensor that large in one piece. You need to give up full-batch processing and use mini-batches throughout: your training loop already batches, but the test-set feed materializes the whole matrix at once.
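As a concrete illustration, here is a minimal sketch of evaluating the test cost in mini-batches against the graph above, instead of feeding X_test.toarray() in one go; the batch size is an assumption:

def evaluate_cost(session, X_test, y_test, batch_size=1024):
    # accumulate cost over mini-batches so only one small dense slice
    # of the sparse test matrix exists in memory at a time
    total, n = 0.0, X_test.shape[0]
    for start in range(0, n, batch_size):
        end = min(start + batch_size, n)
        batch_cost = session.run(cost, feed_dict={X: X_test[start:end].toarray(),
                                                  y: y_test[start:end]})
        total += batch_cost * (end - start)
    return total / n

Weighting each per-batch mean by its row count and dividing by the total row count reproduces the mean you would get from a single full-batch run.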