I am training my network in Caffe. It seems to have enough memory to test the network, but not enough to train it:
I0411 16:41:04.669823 6823 solver.cpp:331] Iteration 0, Testing net (#0)
I0411 16:43:31.625444 6823 solver.cpp:398] Test net output #0: intermediate_loss = nan (* 1 = nan loss)
I0411 16:43:31.625897 6823 solver.cpp:398] Test net output #1: loss = nan (* 1 = nan loss)
F0411 16:43:33.259964 6823 syncedmem.cpp:71] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
@ 0x2ae6f6039b2d google::LogMessage::Fail()
@ 0x2ae6f603b995 google::LogMessage::SendToLog()
@ 0x2ae6f60396a9 google::LogMessage::Flush()
@ 0x2ae6f603c42e google::LogMessageFatal::~LogMessageFatal()
@ 0x2ae6f4e7b0c8 caffe::SyncedMemory::mutable_gpu_data()
@ 0x2ae6f4ce8143 caffe::Blob<>::mutable_gpu_diff()
@ 0x2ae6f4ec3bc6 caffe::EltwiseLayer<>::Backward_gpu()
@ 0x2ae6f4e38dab caffe::Net<>::BackwardFromTo()
@ 0x2ae6f4e38def caffe::Net<>::Backward()
@ 0x2ae6f4e5d763 caffe::Solver<>::Step()
@ 0x2ae6f4e5e2da caffe::Solver<>::Solve()
@ 0x2ae6f4e516c4 caffe::NCCL<>::Run()
@ 0x40ebda train()
@ 0x40b983 main
@ 0x2ae70b5fab35 __libc_start_main
@ 0x40c42d (unknown)
In my solver.prototxt I have test_net: ... and train_net: ... . I suspect Caffe uses more memory when it both tests and trains. How can I prevent this from happening?