How to fix the TensorFlow TPU error: Shuffle Buffer Filled?
I've been trying to train a computer vision model on a TPU with TensorFlow, but I keep hitting an error when I submit the notebook in Kaggle's environment.
It's really strange, because when I run the notebook manually it works fine; on submission it gets
stuck at the following message:
2021-01-08 00:28:59.042056: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:221] Shuffle Buffer Filled.
What I've tried
- Lowering buffer_size
- Changing the order of the loading functions
- Adjusting the batch size
If anyone knows what's going on, please let me know!
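For what it's worth, the "Shuffle Buffer Filled" line from shuffle_dataset_op.cc is an INFO-level log, not an error by itself: `dataset.shuffle(buffer_size)` holds `buffer_size` elements in memory and logs once the buffer is full. A pure-Python sketch of that mechanism (the function name here is illustrative, not TensorFlow API):

```python
import random

def shuffled(items, buffer_size, rng=random.Random(0)):
    """Yield items in approximately shuffled order while holding at most
    buffer_size elements in memory -- the same idea as
    tf.data.Dataset.shuffle(buffer_size)."""
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:  # "Shuffle Buffer Filled."
            i = rng.randrange(len(buffer))
            yield buffer.pop(i)
    # Drain whatever remains once the input is exhausted.
    while buffer:
        i = rng.randrange(len(buffer))
        yield buffer.pop(i)

out = list(shuffled(range(10), buffer_size=4))
```

The larger the buffer, the more elements must be produced upstream before the first output appears, which is why the pipeline can look "stuck" right after that log line.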
Data pipeline
AUTOTUNE = tf.data.experimental.AUTOTUNE
GCS_PATH = KaggleDatasets().get_gcs_path('cassava-leaf-disease-tfrecords-center-512x512')
BATCH_SIZE = 16 * strategy.num_replicas_in_sync
IMAGE_SIZE = [512, 512]
TARGET_SIZE = 512
CLASSES = ['0', '1', '2', '3', '4']
EPOCHS = 20
def decode_image(image_data):
    image = tf.image.decode_jpeg(image_data, channels=3)  # decode JPEG-encoded bytes to a uint8 tensor
    image = tf.cast(image, tf.float32) / 255.0            # cast to float so we can normalize it
    image = tf.image.resize(image, [*IMAGE_SIZE])         # added this back to see if it does anything
    image = tf.reshape(image, [*IMAGE_SIZE, 3])           # reshape to the proper static shape
    return image
def read_tfrecord(example, labeled=True):
    if labeled:
        TFREC_FORMAT = {
            'image': tf.io.FixedLenFeature([], tf.string),
            'target': tf.io.FixedLenFeature([], tf.int64),
        }
    else:
        TFREC_FORMAT = {
            'image': tf.io.FixedLenFeature([], tf.string),
            'image_name': tf.io.FixedLenFeature([], tf.string),
        }
    example = tf.io.parse_single_example(example, TFREC_FORMAT)
    image = decode_image(example['image'])
    if labeled:
        label_or_name = tf.cast(example['target'], tf.int32)
    else:
        label_or_name = example['image_name']
    return image, label_or_name
def load_dataset(filenames, labeled=True, ordered=False):
    ignore_order = tf.data.Options()
    if not ordered:
        ignore_order.experimental_deterministic = False
    dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=AUTOTUNE)
    dataset = dataset.with_options(ignore_order)
    dataset = dataset.map(lambda x: read_tfrecord(x, labeled=labeled), num_parallel_calls=AUTOTUNE)
    return dataset
def get_training_dataset():
    dataset = load_dataset(TRAINING_FILENAMES, labeled=True)
    dataset = dataset.map(transform, num_parallel_calls=AUTOTUNE)
    dataset = dataset.repeat()       # the training dataset must repeat for several epochs
    dataset = dataset.shuffle(2048)  # set higher than input?
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(AUTOTUNE)  # prefetch the next batch while training (AUTOTUNE picks the buffer size)
    return dataset
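One thing worth noting in the pipeline above: shuffle(2048) runs after map, so the buffer holds fully decoded images rather than raw JPEG bytes. Back-of-the-envelope arithmetic (assuming float32, i.e. 4 bytes per value, and the 512x512x3 shape from decode_image):

```python
# Size of one decoded image sitting in the shuffle buffer.
bytes_per_image = 512 * 512 * 3 * 4    # 512x512 pixels, 3 channels, 4 bytes each
buffer_bytes = 2048 * bytes_per_image  # total memory the full shuffle buffer needs
buffer_gib = buffer_bytes / 2**30      # comes out to 6 GiB exactly
```

Filling 6 GiB of decoded images from GCS takes a while, which matches the long pause around the "Shuffle Buffer Filled" log. Shuffling the filenames, or shuffling before map (on the compact serialized records), would keep the buffer far smaller.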
Fitting the model:
history = model.fit(
    x=get_training_dataset(),
    epochs=EPOCHS,
    steps_per_epoch=STEPS_PER_EPOCH,
    validation_steps=VALID_STEPS,
    validation_data=get_validation_dataset(),
    callbacks=[lr_callback, model_save, my_early_stopper],
    verbose=1,
)
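Since get_training_dataset() calls repeat(), the dataset is infinite and model.fit has no natural epoch boundary, so STEPS_PER_EPOCH must be computed explicitly. A sketch of the usual computation (the image count below is a hypothetical placeholder, and 8 replicas assumes a standard TPU v3-8; substitute your real counts):

```python
NUM_TRAINING_IMAGES = 20000               # hypothetical; count your actual training records
BATCH_SIZE = 16 * 8                       # 16 per replica x 8 TPU cores = 128, as in the pipeline above
STEPS_PER_EPOCH = NUM_TRAINING_IMAGES // BATCH_SIZE
```

If steps_per_epoch is omitted (or set too high) with a repeated dataset, fit will keep pulling batches indefinitely, which can also look like a hang right after the shuffle buffer fills.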