
Concurrent caching iterator error when using tf.data.cache(file)

How do I resolve the concurrent caching iterator error that occurs when using tf.data.cache(file)?

Using tf.data.cache(file) followed by model.fit() produces the error below, and I am not sure why. There is no lockfile in the directory:

tensorflow.python.framework.errors_impl.AlreadyExistsError: There appears to be a concurrent caching iterator running - cache lockfile already exists ('/tmp/cache/mydataset-train_0.lockfile'). If you are sure no other running TF computations are using this cache prefix, delete the lockfile and re-initialize the iterator. Lockfile contents: Created at: 1601972246
     [[node IteratorGetNext (defined at /Users/lzuwei/workspace/train_model.py:132) ]] [Op:__inference_train_function_2847]

Function call stack:
train_function
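
As the error message itself suggests, if you are certain no other TF computation is using the cache prefix, the stale lockfile can be deleted before re-initializing the iterator. A minimal sketch of that cleanup (the /tmp/cache/mydataset-train prefix comes from the pipeline below; only run this when no other process holds the cache):

```python
import glob
import os

# Remove stale .lockfile entries for this cache prefix, as the error
# message recommends. Only safe when no other TF computation is running
# against the same cache files.
for lockfile in glob.glob("/tmp/cache/mydataset-train*.lockfile"):
    os.remove(lockfile)
```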

Here is what my data pipeline looks like. files_list contains 15 files in {f1record} format, and num_parallel_reads is set to 15:

ds = tf.data.TFRecordDataset(filenames=files_list, compression_type='GZIP',
                             num_parallel_reads=num_parallel_reads) \
        .map(map_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
        .cache("/tmp/cache/mydataset-train") \
        .shuffle(buffer_size=10 * batch_size) \
        .batch(batch_size) \
        .prefetch(tf.data.experimental.AUTOTUNE)

model_merged = modelMHA_tfa() # returns a tf.keras.models.Model

model_merged.fit(ds, epochs=10)

def map_fn(data_record):
    features = tf.io.parse_single_example(data_record, fc_dataset_schema)
    # dd = tf.cast(features['a'], dtype=tf.float32)
    X = tf.stack([
        tf.cast(features['b'], dtype=tf.float32),
        tf.cast(features['c'], dtype=tf.float32),
        tf.cast(features['d'], dtype=tf.float32),
        tf.cast(features['e'], dtype=tf.float32),
        tf.cast(features['f'], dtype=tf.float32),
        tf.cast(features['g'], dtype=tf.float32),
    ], axis=0)
    Y = tf.stack([
        features['h']
    ], axis=0)
    return X, Y
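
The feature spec fc_dataset_schema is not shown in the question. For tf.io.parse_single_example it would be a dict of feature descriptions along these lines (the feature names come from map_fn, but the dtypes and shapes here are assumptions):

```python
import tensorflow as tf

# Hypothetical schema for the features referenced in map_fn: scalar
# features with assumed dtypes. 'b'..'g' are cast to float32 in map_fn,
# so int64 storage is plausible; 'h' is used directly as the label Y.
fc_dataset_schema = {
    'a': tf.io.FixedLenFeature([], tf.float32),
    'b': tf.io.FixedLenFeature([], tf.int64),
    'c': tf.io.FixedLenFeature([], tf.int64),
    'd': tf.io.FixedLenFeature([], tf.int64),
    'e': tf.io.FixedLenFeature([], tf.int64),
    'f': tf.io.FixedLenFeature([], tf.int64),
    'g': tf.io.FixedLenFeature([], tf.int64),
    'h': tf.io.FixedLenFeature([], tf.float32),
}
```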

Any hints or suggestions would be greatly appreciated!

Solution

The problem was caused by creating an iterator over the dataset before calling model.fit():

ds_iter = iter(ds)
x, y = next(ds_iter)

After removing this code, the problem was resolved.
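
If a sample really is needed before training, one workaround is to make sure the peeking iterator is destroyed before model.fit() creates its own, since the lockfile is held only while a caching iterator is alive. A minimal sketch with a toy dataset (the cache path and dataset are stand-ins for the pipeline above, and this assumes the lockfile is released when the iterator object is destroyed):

```python
import tensorflow as tf

# Toy stand-in for the cached TFRecord pipeline.
ds = tf.data.Dataset.range(4).cache("/tmp/demo-cache")

ds_iter = iter(ds)
x = next(ds_iter)   # starts a caching iteration, which creates the lockfile

# Dropping the only reference destroys the iterator, discarding the
# partial cache and releasing the lockfile, so a later iteration
# (e.g. the one model.fit() creates) can proceed.
del ds_iter

for x in ds:        # safe: only one caching iterator exists at a time
    pass
```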
