Question

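I would like to know whether there is a way to keep most of the weights on disk only, and load them into GPU memory only when they are being trained or are about to be trained.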

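I am training embeddings to represent items (in the style of word2vec, GloVe, etc.). I have more than 40 million items, and the embedding size is ideally 256. In each training step, only a small fraction of the parameters is trained/updated.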

My TensorFlow environment crashes when initializing the input embedding matrix and the softmax embedding matrix.
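To make the scale of the problem concrete, here is a rough back-of-the-envelope estimate using the sizes from this question (40 million rows, 256 float32 values per row). It assumes 4 bytes per float32 and that Adagrad keeps one accumulator value per trainable parameter; it ignores gradients, activations, and framework overhead, so the real footprint would be larger.

```python
# Back-of-the-envelope memory estimate for the variables in the question.
vocabulary_size = 40_000_000
embedding_size = 256
bytes_per_float32 = 4


def table_bytes(rows, cols, itemsize=bytes_per_float32):
    """Bytes needed to store a dense rows x cols table of float32 values."""
    return rows * cols * itemsize


embeddings_bytes = table_bytes(vocabulary_size, embedding_size)
softmax_weights_bytes = table_bytes(vocabulary_size, embedding_size)
softmax_biases_bytes = table_bytes(vocabulary_size, 1)

weights_total = embeddings_bytes + softmax_weights_bytes + softmax_biases_bytes

# Assumption: one Adagrad accumulator per trainable parameter.
# softmax_biases is declared trainable=False, so it is excluded here.
adagrad_slots = embeddings_bytes + softmax_weights_bytes

print(f"one embedding table: {embeddings_bytes / 1e9:.2f} GB")
print(f"all variables:       {weights_total / 1e9:.2f} GB")
print(f"with Adagrad slots:  {(weights_total + adagrad_slots) / 1e9:.2f} GB")
```

Each of the two large tables alone comes to about 41 GB, which would explain why initialization fails long before training starts.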

In case it helps, here is the graph:

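import math

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.get_variable)

batch_size = 16384
embedding_size = 256
num_inputs = 5
vocabulary_size = 40000000

num_sampled = 64  # Number of negative examples to sample.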

graph = tf.Graph()

with graph.as_default() , tf.device('/gpu:0'):

    train_dataset = tf.placeholder(tf.int32, shape=[batch_size, num_inputs ])
    train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])

    embeddings = tf.get_variable( 'embeddings', dtype=tf.float32,
        initializer= tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0, dtype=tf.float32) )

    softmax_weights = tf.get_variable( 'softmax_weights', dtype=tf.float32,
        initializer= tf.truncated_normal([vocabulary_size, embedding_size],
                             stddev=1.0 / math.sqrt(embedding_size), dtype=tf.float32 ) )

    softmax_biases = tf.get_variable('softmax_biases', dtype=tf.float32,
        initializer= tf.zeros([vocabulary_size], dtype=tf.float32),  trainable=False )

    embed = tf.nn.embedding_lookup(embeddings, train_dataset) #train data set is

    embed_reshaped = tf.reshape( embed, [batch_size*num_inputs, embedding_size] )

    segments= np.arange(batch_size).repeat(num_inputs)

    averaged_embeds = tf.segment_mean(embed_reshaped, segments, name=None)

    sampled_values = tf.nn.uniform_candidate_sampler(
        true_classes=tf.cast(train_labels, tf.int64), num_true=1,
        num_sampled=num_sampled, unique=True, range_max=vocabulary_size)
    loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(
        weights=softmax_weights, biases=softmax_biases, labels=train_labels,
        inputs=averaged_embeds, num_sampled=num_sampled,
        num_classes=vocabulary_size, sampled_values=sampled_values))

    optimizer = tf.train.AdagradOptimizer(1.0).minimize(loss)

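My TensorFlow environment can only handle about 5 million items. All embeddings are initialized and held in memory, ready to be trained. However, the only embeddings actually trained in a given step are: the softmax embedding of each label in the batch, the input embeddings of all inputs in the batch, and the softmax embeddings of the negative samples drawn for the batch.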

So in each training step, only a very small fraction of the total weights is trained.
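As a rough illustration of how small that fraction is, the following sketch computes upper bounds on the number of rows touched per step, using the hyperparameters from this question. These are upper bounds only: repeated items within a batch would make the real counts smaller.

```python
# Upper bound on the number of embedding rows touched in one training step.
batch_size = 16384
num_inputs = 5
num_sampled = 64
vocabulary_size = 40_000_000

# Rows of the input embedding table: one per input item in the batch.
max_input_rows = batch_size * num_inputs

# Rows of the softmax table: one per label, plus the negative samples,
# which are drawn once and shared by the whole batch.
max_softmax_rows = batch_size + num_sampled

input_fraction = max_input_rows / vocabulary_size
softmax_fraction = max_softmax_rows / vocabulary_size

print(f"input rows per step:   {max_input_rows} ({input_fraction:.4%} of table)")
print(f"softmax rows per step: {max_softmax_rows} ({softmax_fraction:.4%} of table)")
```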

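I would like to know whether there is a way to keep most of the weights on disk only, and load them into GPU memory only when they are being trained or are about to be trained.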

So far I have not seen anything like this in TensorFlow. Can it be done in PyTorch?

If only a subset of the weights is trained at each step, is there a way to load just that subset into memory in TensorFlow or PyTorch?
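To illustrate the kind of behavior being asked for, here is a minimal framework-agnostic sketch using `numpy.memmap`. It is not a TensorFlow or PyTorch feature. The file name, the helper functions `load_rows` and `store_rows`, the toy table size, and the placeholder gradient are all assumptions made for illustration. The table lives in a file on disk, only the rows needed for the current step are copied into RAM, and the updated rows are written back afterwards.

```python
import os
import tempfile

import numpy as np

# Toy sizes so the sketch runs quickly; the question uses 40_000_000 x 256.
vocabulary_size = 10_000
embedding_size = 8

# Hypothetical location for the on-disk table.
path = os.path.join(tempfile.mkdtemp(), "embeddings.f32")

# mode="w+" creates the file; the array is backed by disk, not by RAM.
table = np.memmap(path, dtype=np.float32, mode="w+",
                  shape=(vocabulary_size, embedding_size))

# Initialize in chunks so the full table never has to fit in memory at once.
rng = np.random.default_rng(0)
chunk = 1_000
for start in range(0, vocabulary_size, chunk):
    stop = min(start + chunk, vocabulary_size)
    table[start:stop] = rng.uniform(-1.0, 1.0,
                                    size=(stop - start, embedding_size))
table.flush()


def load_rows(table, ids):
    """Copy only the rows needed for this step into an ordinary array."""
    unique_ids, inverse = np.unique(np.asarray(ids).ravel(),
                                    return_inverse=True)
    rows = np.array(table[unique_ids])  # in-memory copy of just these rows
    return unique_ids, inverse, rows


def store_rows(table, unique_ids, rows):
    """Write the updated rows back to the disk-backed table."""
    table[unique_ids] = rows
    table.flush()


# One toy "training step" on a batch that contains a repeated id.
batch_ids = np.array([3, 7, 3, 42])
unique_ids, inverse, rows = load_rows(table, batch_ids)
before = rows.copy()

grad = np.ones_like(rows)  # placeholder; a real model would compute this
rows -= 0.1 * grad         # plain SGD update on the loaded rows only
store_rows(table, unique_ids, rows)
```

In a real setup the loaded `rows` would be the only part moved to the GPU for the step, and `inverse` maps each position in the batch back to its row in that smaller array. Whether this is fast enough depends on disk throughput, which the sketch does not address.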

0 Answers: