How to fix a strange tensor shape during training and ResourceExhaustedError: OOM when allocating tensor
I'm trying to run object detection using this GitHub repo, which uses a simple 7-layer Single Shot MultiBox Detector. I ran it on Google Colab with these packages: keras==2.2.4 & tensorflow-gpu==1.13.1
Eventually I hit the error below during training. The other thing I want to point out is that the tensor that caused the crash has shape [2,1232,1640,48], where...
- 2 is the batch size
- 1232 is half the image width (strange)
- 1640 is half the image height (strange)
- not sure where the 48 comes from
Epoch 1/5
---------------------------------------------------------------------------
ResourceExhaustedError Traceback (most recent call last)
<ipython-input-28-3fbd9e60a593> in <module>()
     19
     20     max_queue_size=1,
---> 21     workers=0)
7 frames
/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py in wrapper(*args,**kwargs)
89 warnings.warn('Update your `' + object_name + '` call to the ' +
90 'Keras 2 API: ' + signature,stacklevel=2)
---> 91 return func(*args,**kwargs)
92 wrapper._original_function = func
93 return wrapper
/usr/local/lib/python3.6/dist-packages/keras/engine/training.py in fit_generator(self,generator,steps_per_epoch,epochs,verbose,callbacks,validation_data,validation_steps,class_weight,max_queue_size,workers,use_multiprocessing,shuffle,initial_epoch)
   1416                 use_multiprocessing=use_multiprocessing,
   1417                 shuffle=shuffle,
-> 1418                 initial_epoch=initial_epoch)
1419
1420 @interfaces.legacy_generator_methods_support
/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py in fit_generator(model,initial_epoch)
    215                 outs = model.train_on_batch(x, y,
    216                                             sample_weight=sample_weight,
--> 217                                             class_weight=class_weight)
218
219 outs = to_list(outs)
/usr/local/lib/python3.6/dist-packages/keras/engine/training.py in train_on_batch(self,x,sample_weight,class_weight)
1215 ins = x + y + sample_weights
1216 self._make_train_function()
-> 1217 outputs = self.train_function(ins)
1218 return unpack_singleton(outputs)
1219
/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py in __call__(self,inputs)
2713 return self._legacy_call(inputs)
2714
-> 2715 return self._call(inputs)
2716 else:
2717 if py_any(is_tensor(x) for x in inputs):
/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py in _call(self,inputs)
2673 fetched = self._callable_fn(*array_vals,run_metadata=self.run_metadata)
2674 else:
-> 2675 fetched = self._callable_fn(*array_vals)
2676 return fetched[:len(self.outputs)]
2677
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in __call__(self,*args,**kwargs)
1437 ret = tf_session.TF_SessionRunCallable(
   1438             self._session._session, self._handle, args, status,
-> 1439             run_metadata_ptr)
1440 if run_metadata:
1441 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py in __exit__(self,type_arg,value_arg,traceback_arg)
    526             None, None,
    527             compat.as_text(c_api.TF_Message(self.status.status)),
--> 528             c_api.TF_GetCode(self.status.status))
529 # Delete the underlying status object from memory otherwise it stays alive
530 # as there is a reference to status from this from the traceback due to
ResourceExhaustedError: OOM when allocating tensor with shape[2,1232,1640,48] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node training/Adam/gradients/zeros_22-0-1-TransposeNCHWToNHWC-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[{{node loss/add_14}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Please explain what is going on and how to fix it. I can also share more relevant details about the model structure if that helps track down the error.
Solution
If you have data of shape
(2, 1232, 1640, 3)
then after it passes through a convolutional layer with 48 filters and "SAME"
padding, it will have shape
(2, 1232, 1640, 48)
and there is no room for that tensor on your GPU.
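To see why this blows up, it helps to count bytes: a single float32 activation tensor of the shape in the error message already takes most of a gigabyte, and training keeps many such tensors alive at once (activations for every layer plus their gradients). A quick back-of-the-envelope check:

```python
# Rough memory footprint of ONE float32 activation tensor of shape
# (batch, height, width, channels) = (2, 1232, 1640, 48).
batch, height, width, channels = 2, 1232, 1640, 48
bytes_per_float32 = 4

tensor_bytes = batch * height * width * channels * bytes_per_float32
print(f"{tensor_bytes / 1024**3:.2f} GiB")  # ≈ 0.72 GiB for a single tensor
```

Multiply that by the number of 48-filter layers, plus the gradient buffers Adam needs for backprop, and a typical Colab GPU (12–16 GB at the time) runs out of memory quickly.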
I looked through the repo, and there is a stack of layers with 48 filters:
conv2 = Conv2D(48, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv2')(pool1)
conv2 = BatchNormalization(axis=3, momentum=0.99, name='bn2')(conv2)
conv2 = ELU(name='elu2')(conv2)
pool2 = MaxPooling2D(pool_size=(2, 2), name='pool2')(conv2)
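The standard mitigations (not specific to this repo) are to shrink the input resolution or the batch size so the activations fit in GPU memory. Here is a dependency-free sketch of a 2x spatial downscale by striding; a real pipeline would use proper resizing such as cv2.resize, and the `downscale` helper below is purely illustrative:

```python
import numpy as np

def downscale(batch):
    """Naive 2x spatial downsample by keeping every other row/column.
    Halving height and width cuts every conv layer's activation
    memory by roughly 4x."""
    return batch[:, ::2, ::2, :]

# A dummy batch with the same spatial size as the images in the question.
images = np.zeros((2, 1232, 1640, 3), dtype=np.float32)
small = downscale(images)
print(small.shape)  # (2, 616, 820, 3)
```

Dropping the batch size from 2 to 1 (in the generator passed to fit_generator) gives another 2x saving, at the cost of noisier gradient estimates.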