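Google Colab: "Your session crashed for an unknown reason" after running the following TensorFlow code; the log shows "an illegal memory access was encountered"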

Time: 2020-10-25 18:28:22

Tags: tensorflow google-colaboratory tensorflow2.0

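I am writing a simple TensorFlow program on Colab. I am trying to implement Word2Vec from scratch (using the approach described here). neighbors is just a list of tuples such as ("cat", "nice"), and y holds their labels (0/1).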

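The problem is that Colab crashes almost immediately after I run this block, and I don't know how to find out why. (My runtime is on a GPU, and I am aware of this question.)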

The log says "an illegal memory access was encountered". I don't understand what is illegal here!

embeding_tensor = tf.Variable(tf.random.uniform(shape=[len(words_lst), embeding_size]))
context_tensor = tf.Variable(tf.random.uniform(shape=[len(words_lst), embeding_size]))

for idx, neighbor in enumerate(neighbors):
  x, y = data_generator(neighbor)
  y = tf.convert_to_tensor(y, dtype='float32')

  with tf.GradientTape() as t:    
    middle = tf.gather(embeding_tensor, word2index[x[0][0]])
    neighbor_choices = tf.gather(context_tensor, [word2index[i[1]] for i in x])

    scores = tf.tensordot(neighbor_choices, middle, 1)
    prediction = tf.nn.sigmoid(scores)
    loss = y - prediction

    g_embed, g_context = t.gradient(loss, [embeding_tensor, context_tensor])

    tf.compat.v1.scatter_sub(embeding_tensor, [word2index[x[0][0]]], g_embed)
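
One possible explanation, offered as an assumption rather than a confirmed diagnosis: in eager mode the gradient with respect to a variable read through tf.gather comes back as a tf.IndexedSlices, which converts to a dense tensor with the full table shape. Passing that to tf.compat.v1.scatter_sub together with a single index gives updates whose shape does not match the indices, and that mismatch may be what ends in an out-of-bounds write on the GPU. The sketch below sidesteps scatter_sub by densifying each gradient and applying it with assign_sub. It also swaps in a sigmoid cross-entropy loss reduced to a scalar, moves the gradient call outside the tape context, and updates both tables. The toy vocabulary, learning_rate, and train_step are hypothetical stand-ins for the asker's data, not part of the original code.

```python
import tensorflow as tf

tf.random.set_seed(0)

# Hypothetical toy data standing in for words_lst, word2index and data_generator.
words_lst = ["cat", "nice", "dog", "blue"]
word2index = {word: i for i, word in enumerate(words_lst)}
embeding_size = 8
learning_rate = 0.1  # assumed value, not from the original post

embeding_tensor = tf.Variable(tf.random.uniform([len(words_lst), embeding_size]))
context_tensor = tf.Variable(tf.random.uniform([len(words_lst), embeding_size]))


def train_step(x, y):
    """One gradient step for (center, context) pairs that share a center word."""
    y = tf.convert_to_tensor(y, dtype=tf.float32)
    center_id = word2index[x[0][0]]
    context_ids = [word2index[pair[1]] for pair in x]

    with tf.GradientTape() as tape:
        middle = tf.gather(embeding_tensor, center_id)             # [embeding_size]
        neighbor_choices = tf.gather(context_tensor, context_ids)  # [n, embeding_size]
        scores = tf.tensordot(neighbor_choices, middle, 1)         # [n]
        # Scalar loss; a substitute for the original `y - prediction`.
        loss = tf.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=scores)
        )

    # Ask for gradients after leaving the tape context.
    variables = [embeding_tensor, context_tensor]
    grads = tape.gradient(loss, variables)
    for var, grad in zip(variables, grads):
        # grad may be a tf.IndexedSlices; densify it so its shape matches var.
        var.assign_sub(learning_rate * tf.convert_to_tensor(grad))
    return loss


x = [("cat", "nice"), ("cat", "blue")]
y = [1.0, 0.0]
first_loss = float(train_step(x, y))
for _ in range(50):
    last_loss = float(train_step(x, y))
print(first_loss, last_loss)
```

Densifying the gradient is wasteful for a large vocabulary; it is used here only because it keeps the shapes easy to check. A sparse update through the variable's own scatter_sub method with a tf.IndexedSlices argument would be the natural next step once the shapes are confirmed.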

Here is the log:

Oct 25, 2020, 9:44:09 PM    WARNING WARNING:root:kernel 8c6ec4a8-cabd-4d91-a214-1c3a91f36571 restarted
Oct 25, 2020, 9:44:09 PM    INFO    KernelRestarter: restarting kernel (1/5), keep random ports
Oct 25, 2020, 9:44:06 PM    WARNING 2020-10-25 18:14:06.799062: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:220] Unexpected Event status: 1
Oct 25, 2020, 9:44:06 PM    WARNING 2020-10-25 18:14:06.799003: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Oct 25, 2020, 9:44:06 PM    WARNING 2020-10-25 18:14:06.383027: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
Oct 25, 2020, 9:44:05 PM    WARNING 2020-10-25 18:14:05.542284: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 13962 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
Oct 25, 2020, 9:44:05 PM    WARNING 2020-10-25 18:14:05.542217: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
Oct 25, 2020, 9:44:05 PM    WARNING 2020-10-25 18:14:05.541345: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Oct 25, 2020, 9:44:05 PM    WARNING 2020-10-25 18:14:05.540333: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Oct 25, 2020, 9:44:05 PM    WARNING 2020-10-25 18:14:05.539944: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
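
The log above is listed newest first, and Colab labels every line WARNING regardless of TensorFlow's own severity letter (I, W, E, F). As a reading aid, here is a small sketch that pulls out the embedded TensorFlow timestamp and severity, sorts chronologically, and keeps only the E and F lines. The regular expression and the tf_errors helper are illustrative assumptions about this log layout, not a Colab or TensorFlow API.

```python
import re

# Matches the TensorFlow part of a line: "<timestamp>: <severity> <file:line>] <message>"
TF_LINE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): ([IWEF]) (\S+)\] (.*)"
)


def tf_errors(log_lines):
    """Return (timestamp, severity, source, message) for E/F lines, oldest first."""
    records = []
    for line in log_lines:
        match = TF_LINE.search(line)
        if match:
            records.append(match.groups())
    records.sort(key=lambda record: record[0])  # timestamps sort lexicographically
    return [record for record in records if record[1] in ("E", "F")]


# Three lines copied from the log above, in their original newest-first order.
sample = [
    "Oct 25, 2020, 9:44:06 PM    WARNING 2020-10-25 18:14:06.799062: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:220] Unexpected Event status: 1",
    "Oct 25, 2020, 9:44:06 PM    WARNING 2020-10-25 18:14:06.799003: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered",
    "Oct 25, 2020, 9:44:06 PM    WARNING 2020-10-25 18:14:06.383027: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10",
]

errors = tf_errors(sample)
for timestamp, severity, source, message in errors:
    print(timestamp, severity, source, message)
```

Read this way, the CUDA_ERROR_ILLEGAL_ADDRESS line (E) comes first and the fatal "Unexpected Event status" line (F) follows it, just before the kernel restart.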

0 answers:

No answers