从预训练加载 Bert 模型时 Python 崩溃

时间:2021-04-16 22:40:27

标签: python tensorflow bert-language-model huggingface-transformers

我来这里是因为我遇到了一个最模糊的问题。当我开始创建我的模型时,我遇到了我的 gpu 使用高峰,然后我的 python 代码崩溃。这仅在我尝试仅使用“from_pretrained”中的任何模型时才会发生,Tensorflow 和 PyTourch 本身都没有问题(这种行为仅适​​用于转换器)

例如: 运行这行代码时出现问题,就在我脚本的开头;

model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased')

我收到以下消息,这些消息非常标准,但正如您在底部看到的那样,代码只是停止了。

<
2021-04-16 16:16:35.330093: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-16 16:16:38.495667: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-04-16 16:16:38.519178: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1760] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 computeCapability: 6.1
coreClock: 1.6705GHz coreCount: 10 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 178.99GiB/s
2021-04-16 16:16:38.519500: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-16 16:16:38.528695: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-04-16 16:16:38.528923: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-04-16 16:16:38.533582: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-04-16 16:16:38.535368: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-04-16 16:16:38.540093: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-04-16 16:16:38.543728: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-04-16 16:16:38.544662: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-04-16 16:16:38.544888: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1898] Adding visible gpu devices: 0
2021-04-16 16:16:38.545436: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-04-16 16:16:38.546588: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1760] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 computeCapability: 6.1
coreClock: 1.6705GHz coreCount: 10 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 178.99GiB/s
2021-04-16 16:16:38.547283: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1898] Adding visible gpu devices: 0
2021-04-16 16:16:39.115250: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1300] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-16 16:16:39.115490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306] 0
2021-04-16 16:16:39.115592: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1319] 0: N
2021-04-16 16:16:39.115856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1446] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4634 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0, compute capability: 6.1)
2021-04-16 16:16:39.419407: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-04-16 16:16:39.709427: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll

Process finished with exit code -1073741819 (0xC0000005)

有没有其他人看过这个?有什么我在这里想念的吗? 感谢您的帮助。

这是我的系统的详细信息。

变形金刚版本:最新 平台:Windows Python版本:3.7 PyTorch 版本(GPU?):最新 Tensorflow 版本(GPU?):最新 在脚本中使用 GPU?:是的,GeForce GTX 1060 计算能力:6.1 在脚本中使用分布式或并行设置?:否 我遇到此错误的模型:

与此问题相关的模型: 阿尔伯特,伯特,xlm:

--更新: 我进一步将问题缩小到所有 TF[ModelName] 预训练模型。

0 个答案:

没有答案