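I'm writing some simple TensorFlow code on Colab. I'm trying to write Word2Vec from scratch (using the approach described here). neighbors is just a list of tuples like ("cat", "nice"), and y holds their labels, 0 or 1.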
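The problem is that Colab crashes almost immediately after I run this block, and I don't know how to find out why. (My runtime is on a GPU, and I'm aware of this question.)
Its log says "an illegal memory access was encountered". I don't understand what is illegal here!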
embeding_tensor = tf.Variable(tf.random.uniform(shape=[len(words_lst), embeding_size]))
context_tensor = tf.Variable(tf.random.uniform(shape=[len(words_lst), embeding_size]))
for idx, neighbor in enumerate(neighbors):
    x, y = data_generator(neighbor)
    y = tf.convert_to_tensor(y, dtype='float32')
    with tf.GradientTape() as t:
        middle = tf.gather(embeding_tensor, word2index[x[0][0]])
        neighbor_choices = tf.gather(context_tensor, [word2index[i[1]] for i in x])
        scores = tf.tensordot(neighbor_choices, middle, 1)
        prediction = tf.nn.sigmoid(scores)
        loss = y - prediction
    g_embed, g_context = t.gradient(loss, [embeding_tensor, context_tensor])
    tf.compat.v1.scatter_sub(embeding_tensor, [word2index[x[0][0]]], g_embed)
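For reference, the update I am aiming for can be written out by hand in NumPy. This is only a minimal sketch under assumptions: the names sgns_step and lr are hypothetical, and it uses the standard logistic (cross-entropy) gradient prediction - labels, which is not what my TensorFlow code does (that uses y - prediction as the loss). Its main purpose is to make the shape of every intermediate explicit.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sgns_step(embedding, context, center_idx, neighbor_idx, labels, lr=0.1):
    """One in-place update for a centre word and its k candidate neighbours.

    embedding, context: float arrays of shape (vocab_size, embedding_size)
    center_idx: int; neighbor_idx: k ints; labels: k values, each 0 or 1
    (hypothetical helper, not part of the original code)
    """
    neighbor_idx = np.asarray(neighbor_idx)
    labels = np.asarray(labels, dtype=embedding.dtype)
    vocab_size = embedding.shape[0]

    # Fail early with a readable message if any index is outside the vocabulary.
    if not 0 <= center_idx < vocab_size:
        raise IndexError("center index outside the vocabulary")
    if neighbor_idx.min() < 0 or neighbor_idx.max() >= vocab_size:
        raise IndexError("neighbour index outside the vocabulary")

    middle = embedding[center_idx].copy()    # (embedding_size,)
    choices = context[neighbor_idx]          # (k, embedding_size)
    prediction = sigmoid(choices @ middle)   # (k,)
    error = prediction - labels              # (k,) gradient of log loss w.r.t. scores

    grad_middle = error @ choices            # (embedding_size,)
    grad_choices = np.outer(error, middle)   # (k, embedding_size)

    # Only one row of `embedding` and k rows of `context` are touched.
    embedding[center_idx] -= lr * grad_middle
    # subtract.at accumulates correctly if the same neighbour index repeats.
    np.subtract.at(context, neighbor_idx, lr * grad_choices)
    return prediction
```

The point of the sketch is that the update for the centre word is a single row of shape (embedding_size,), not a full (vocab_size, embedding_size) matrix.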
Here is the log:
Oct 25, 2020, 9:44:09 PM WARNING WARNING:root:kernel 8c6ec4a8-cabd-4d91-a214-1c3a91f36571 restarted
Oct 25, 2020, 9:44:09 PM INFO KernelRestarter: restarting kernel (1/5), keep random ports
Oct 25, 2020, 9:44:06 PM WARNING 2020-10-25 18:14:06.799062: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:220] Unexpected Event status: 1
Oct 25, 2020, 9:44:06 PM WARNING 2020-10-25 18:14:06.799003: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Oct 25, 2020, 9:44:06 PM WARNING 2020-10-25 18:14:06.383027: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
Oct 25, 2020, 9:44:05 PM WARNING 2020-10-25 18:14:05.542284: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 13962 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
Oct 25, 2020, 9:44:05 PM WARNING 2020-10-25 18:14:05.542217: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
Oct 25, 2020, 9:44:05 PM WARNING 2020-10-25 18:14:05.541345: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Oct 25, 2020, 9:44:05 PM WARNING 2020-10-25 18:14:05.540333: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Oct 25, 2020, 9:44:05 PM WARNING 2020-10-25 18:14:05.539944: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
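One check that does not need the GPU at all is whether the shapes passed to the scatter_sub call agree with its documented requirement, updates.shape = indices.shape + ref.shape[1:]. The sketch below is a plain-Python illustration of that rule; expected_updates_shape and check_scatter_shapes are hypothetical helper names, and the example sizes (1000 words, 64 dimensions) are made up. Whether a mismatch here is actually what triggers the illegal memory access is an assumption I have not confirmed.

```python
def expected_updates_shape(ref_shape, indices_shape):
    """Shape `updates` must have when scattering into the first axis of `ref`."""
    return tuple(indices_shape) + tuple(ref_shape[1:])


def check_scatter_shapes(ref_shape, indices_shape, updates_shape):
    """Raise a readable error if `updates` does not match the documented rule."""
    expected = expected_updates_shape(ref_shape, indices_shape)
    if tuple(updates_shape) != expected:
        raise ValueError(
            f"updates has shape {tuple(updates_shape)}, expected {expected}"
        )
    return expected


# Hypothetical sizes: 1000 words, 64-dimensional embeddings, one index.
# A single-row update is what the rule asks for:
check_scatter_shapes((1000, 64), (1,), (1, 64))
```

If g_embed ends up densified to the variable's full shape, i.e. (len(words_lst), embeding_size), then passing it together with a single index would fail this check, since the rule expects (1, embeding_size).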