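How to fix a TensorFlow error when running segmentation
I am using a Jetson Xavier NX to run segmentation built with [Segmentation][1]. These are the versions of the libraries I am using: tensorflow 1.15.4, keras 2.1.5, python 3.6.9.
However, when I run my program, I get the following error: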
2021-06-14 20:30:53.671609: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES Failed at random_op.cc:76 : Resource exhausted: OOM when allocating tensor with shape[3,3,128,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
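To put the failed allocation in perspective, the sketch below computes how much memory a tensor of shape [3,3,128,128] would need. The helper name `tensor_bytes` is hypothetical, and 4 bytes per element is an assumption that `type float` in the message means float32.

```python
from functools import reduce
from operator import mul


def tensor_bytes(shape, bytes_per_element=4):
    """Return the memory needed for a dense tensor of the given shape.

    bytes_per_element=4 assumes float32 (an assumption, see the lead-in).
    """
    return reduce(mul, shape, 1) * bytes_per_element


# Shape taken from the OOM message above.
size = tensor_bytes([3, 3, 128, 128])
print(f"{size} bytes = {size / 1024:.0f} KiB")
```

Under that assumption the request is only about half a megabyte, which suggests the GPU allocator had almost no free memory left at that point, rather than this particular tensor being unusually large.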
Here is my code:
#!/usr/bin/env python3
# coding: utf-8
import mrcnn
#print(mrcnn)
import os
import sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt
# Root directory of the project
ROOT_DIR = os.path.abspath("../")
print(ROOT_DIR)
# Import Mask RCNN
sys.path.append(ROOT_DIR) # To find local version of the library
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize
# Import COCO config
sys.path.append(os.path.join(ROOT_DIR,"/Mask_RCNN/samples/coco/")) # To find local version
import coco
#get_ipython().run_line_magic('matplotlib','inline')
# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR,"logs")
# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR,"mask_rcnn_coco.h5")
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)
# Directory of images to run detection on
IMAGE_DIR = os.path.join(ROOT_DIR,"images")
class InferenceConfig(coco.CocoConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
config = InferenceConfig()
config.display()
# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference",model_dir=MODEL_DIR,config=config)
# Load weights trained on MS-COCO
model.load_weights(COCO_MODEL_PATH,by_name=True)
# COCO Class names
# Index of the class in the list is its ID. For example,to get ID of
# the teddy bear class,use: class_names.index('teddy bear')
class_names = ['BG','person','bicycle','car','motorcycle','airplane','bus','train','truck','boat','traffic light','fire hydrant','stop sign','parking meter','bench','bird','cat','dog','horse','sheep','cow','elephant','bear','zebra','giraffe','backpack','umbrella','handbag','tie','suitcase','frisbee','skis','snowboard','sports ball','kite','baseball bat','baseball glove','skateboard','surfboard','tennis racket','bottle','wine glass','cup','fork','knife','spoon','bowl','banana','apple','sandwich','orange','broccoli','carrot','hot dog','pizza','donut','cake','chair','couch','potted plant','bed','dining table','toilet','tv','laptop','mouse','remote','keyboard','cell phone','microwave','oven','toaster','sink','refrigerator','book','clock','vase','scissors','teddy bear','hair drier','toothbrush']
import cv2
# Load a random image from the images folder
file_names = next(os.walk(IMAGE_DIR))[2]
image = skimage.io.imread("/sample_images/sample3.jpg")
# Run detection
results = model.detect([image],verbose=1)
# Visualize results
r = results[0]
visualize.display_instances(image,r['rois'],r['masks'],r['class_ids'],class_names,r['scores'])
cv2.imwrite("hi.jpg",image)
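I ran the same program on AWS EC2. The only difference is the TensorFlow version there (I used 1.8.0 GPU), and it ran fine. Is the TensorFlow version causing the error?
Edit: I added this at the beginning of the code, as suggested in some GitHub issues: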
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
I still get the warnings below, and the program does not exit:
Stats:
Limit: 2107527168
InUse: 436697856
MaxInUse: 683784192
NumAllocs: 1722
MaxAllocSize: 170917888
2021-06-15 12:58:33.338214: W tensorflow/core/common_runtime/bfc_allocator.cc:427] ***********************xx**_**************__________________________________________________________
2021-06-15 12:58:33.680104: W tensorflow/core/common_runtime/bfc_allocator.cc:305] Garbage collection: deallocate free memory regions (i.e.,allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently,you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature.
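One rough way to read the allocator statistics above is to convert the byte counts into MiB and into a share of the limit. The sketch below is only an illustration: `parse_bfc_stats` is a hypothetical helper, and the text it parses is copied from the log above, not obtained from any TensorFlow API.

```python
import re

# Stats lines copied from the log above.
LOG = """\
Limit: 2107527168
InUse: 436697856
MaxInUse: 683784192
NumAllocs: 1722
MaxAllocSize: 170917888
"""


def parse_bfc_stats(text):
    """Parse 'Name: integer' lines from an allocator stats dump into a dict."""
    pattern = re.compile(r"^\s*(\w+):\s*(\d+)\s*$", re.MULTILINE)
    return {m.group(1): int(m.group(2)) for m in pattern.finditer(text)}


stats = parse_bfc_stats(LOG)
mib = 1024 * 1024

# Report each byte counter in MiB and as a share of the allocator limit.
for name in ("Limit", "InUse", "MaxInUse", "MaxAllocSize"):
    share = stats[name] / stats["Limit"]
    print(f"{name:>12}: {stats[name] / mib:8.1f} MiB ({share:.0%} of limit)")
```

For these numbers the allocator limit is about 2 GB, with roughly a fifth of it in use when the dump was written. On Jetson boards the GPU shares physical memory with the CPU, so the limit TensorFlow sees depends on what else is running; a system tool such as `tegrastats` may help to watch overall memory use while the program runs.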
How can I check whether my GPU is being allocated correctly?

[1]: https://github.com/matterport/Mask_RCNN