I'm having a problem with tensorflow-gpu: Allocator (GPU_0_bfc) ran out of memory trying to allocate with freed_by_count=0

Time: 2020-09-24 16:42:49

Tags: tensorflow keras gpu

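System information

- Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
- OS platform and distribution: Windows 10 Home with 8GB RAM, NVIDIA MX250 2GB graphics
- TensorFlow installed from (source or binary): source, "pip install tensorflow-gpu"
- TensorFlow version: 2.2.0
- Python version: 3.6
- CUDA/cuDNN version: CUDA 10.2 and cuDNN 8.0.3.33, which is for CUDA 10.1
- GPU model and memory: NVIDIA MX250 with 2GB graphics memory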

Problem

My tensorflow-gpu was working fine. After I reinstalled Anaconda and then installed tensorflow-gpu again, I get this error every time I try to train any model on tensorflow-gpu:


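Relying on driver to perform ptx compilation. Modify $PATH to customize ptxas location. This message will be only logged once.
2020-09-24 21:35:23.361799: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.20GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-09-24 21:35:23.363348: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.02GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-09-24 21:35:23.627463: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.11GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.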
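The messages above are warnings from TensorFlow's GPU allocator, not hard failures. One commonly tried setting on small GPUs is on-demand memory growth. The sketch below is only an illustration and is not confirmed to remove these warnings on this machine. The helper name `configure_gpu_memory_growth` is hypothetical; the TensorFlow calls inside it are `tf.config.list_physical_devices` and `tf.config.experimental.set_memory_growth`.

```python
import os

# Ask TensorFlow to grow GPU memory on demand instead of reserving it up front.
# This environment variable has to be set before TensorFlow initializes the GPU.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"


def configure_gpu_memory_growth():
    """Hypothetical helper: enable memory growth on every visible GPU.

    TensorFlow is imported lazily so that the environment variable above is
    already in place. Call this before building any model.
    """
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

Calling `configure_gpu_memory_growth()` at the top of the training script, before the model is created, would be the assumed usage. If a single allocation is larger than the card can hold, memory growth alone will not help and a smaller batch size or input size is the more likely remedy.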


Simple code

from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten,Conv2D,MaxPool2D,AvgPool2D,Dropout
from tensorflow.keras.models import Model,Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from glob import glob
import matplotlib.pyplot as plt

train_path="E:/AI-Application-Implementation/trained_model/Classification/Cifar-10/data/train"
test_path="E:/AI-Application-Implementation/trained_model/Classification/Cifar-10/data/test"

folders=glob("E:/All Data Set/CIFAR10/train/*")

datagen=ImageDataGenerator(rotation_range=0.5,
                           brightness_range=[0.2,0.5],
                           zoom_range=[0.1,0.8],
                           horizontal_flip=True,
                           validation_split=0.2,
                           rescale=1./255)

train=datagen.flow_from_directory(directory=train_path,
                                  target_size=(256,256),
                                  # color_mode="grayscale",
                                  shuffle=True,
                                  class_mode='categorical',
                                  subset='training')

test=datagen.flow_from_directory(directory=train_path,
                                 target_size=(256,256),
                                 # color_mode="grayscale",
                                 shuffle=True,
                                 class_mode='categorical',
                                 subset='validation')

model=Sequential()

model.add(Conv2D(filters=32,kernel_size=(3,3),input_shape=(256,256,3),activation='relu'))
model.add(Conv2D(filters=32,kernel_size=(3,3),activation='relu'))
model.add(MaxPool2D(pool_size=(3,3)))
model.add(Conv2D(filters=64,kernel_size=(3,3),activation='relu'))
model.add(Conv2D(filters=64,kernel_size=(3,3),activation='relu'))
model.add(MaxPool2D(pool_size=(3,3)))
model.add(Conv2D(filters=128,kernel_size=(3,3),activation='relu'))
model.add(Conv2D(filters=128,kernel_size=(3,3),activation='relu'))
model.add(Conv2D(filters=128,kernel_size=(3,3),activation='relu'))
model.add(MaxPool2D(pool_size=(3,3)))
model.add(AvgPool2D(pool_size=(6,6)))
model.add(Flatten())
model.add(Dense(units=64,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(units=10,activation='softmax'))
model.summary()
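# class_mode='categorical' yields one-hot labels, so categorical_crossentropy is used here
# (the sparse variant expects integer class indices).
model.compile(loss='categorical_crossentropy',optimizer="adam",metrics=['accuracy'])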

history=model.fit(train,validation_data=test,epochs=5,steps_per_epoch=len(train),validation_steps=len(test))
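To put the allocation sizes in context, here is a rough, pure-Python estimate of the forward activation memory of the model above. It assumes float32 activations, 'valid' padding, pooling strides equal to the pool size (the Keras defaults), and a batch size of 32 (the `flow_from_directory` default, since the code does not set one). It ignores weights, gradients, and cuDNN workspace, so it is a lower bound, not a measurement. The names `LAYERS` and `activation_elements` are made up for this sketch.

```python
# (kind, kernel_or_pool_size, filters) for each layer up to the last pooling layer.
LAYERS = [
    ("conv", 3, 32), ("conv", 3, 32), ("pool", 3, None),
    ("conv", 3, 64), ("conv", 3, 64), ("pool", 3, None),
    ("conv", 3, 128), ("conv", 3, 128), ("conv", 3, 128), ("pool", 3, None),
    ("pool", 6, None),  # the AvgPool2D layer
]


def activation_elements(input_size=256, layers=LAYERS):
    """Return (output shapes, total activation elements) for one sample."""
    size, channels = input_size, 3
    shapes, total = [], 0
    for kind, k, filters in layers:
        if kind == "conv":
            size = size - k + 1          # 'valid' convolution, stride 1
            channels = filters
        else:
            size = (size - k) // k + 1   # 'valid' pooling, stride == pool size
        shapes.append((size, size, channels))
        total += size * size * channels
    return shapes, total


shapes, per_sample = activation_elements()
batch_size = 32        # assumed default batch size
bytes_per_float = 4    # float32
estimated_gib = per_sample * batch_size * bytes_per_float / 2**30
print(shapes[0], shapes[-1], round(estimated_gib, 2))
```

Under these assumptions the forward activations alone come to roughly 0.64 GiB per batch, most of it from the first two 32-filter convolutions on 256x256 inputs. Once gradients and workspace are added, that is already tight for a 2GB card.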

Other info / logs (include any logs or source code that would be helpful):

2020-09-24 21:35:09.493907: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
Found 40000 images belonging to 10 classes.
2020-09-24 21:35:14.913210: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2020-09-24 21:35:15.668676: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:06:00.0 name: GeForce MX250 computeCapability: 6.1
coreClock: 1.582GHz coreCount: 3 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 44.76GiB/s
2020-09-24 21:35:15.673588: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-09-24 21:35:15.729057: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-09-24 21:35:15.750802: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-09-24 21:35:15.756718: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-09-24 21:35:15.777083: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-09-24 21:35:15.781830: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-09-24 21:35:15.795484: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-09-24 21:35:15.795889: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-24 21:35:15.797938: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-09-24 21:35:15.809116: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x29073047360 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-24 21:35:15.810096: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-09-24 21:35:15.811012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:06:00.0 name: GeForce MX250 computeCapability: 6.1
coreClock: 1.582GHz coreCount: 3 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 44.76GiB/s
2020-09-24 21:35:15.812247: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-09-24 21:35:15.812911: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-09-24 21:35:15.813512: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-09-24 21:35:15.814104: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-09-24 21:35:15.814716: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-09-24 21:35:15.815307: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-09-24 21:35:15.815903: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-09-24 21:35:15.816520: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
Found 10000 images belonging to 10 classes.
Model: "sequential"

0 answers:

No answers yet.