Time series data with multiple features, from TFRecords to a Keras LSTM model


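I am trying to predict one of about 80 classes from TFRecords files with about 50 features. Some features are int64, others are float. One TFRecords file holds about 1000 data points, and each data point has 50 features and one label, so one file is (50 features + 1 label) × 1000 data points. Within a given TFRecords file, every data point has the same label. I'm not sure this is the best layout, but I think it gives me the most flexibility.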

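For context: I collected running data, and I want to predict from the data who is running. I want to train an LSTM model. Because one TFRecords file contains the data of one run by one person, the label is the same for every data point in it.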
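To make the layout above concrete, here is a minimal sketch of how one such file could be written: one `tf.train.Example` per data point (one time step), written in time order, with the runner id repeated as the label of every record. The feature names `feature0` … `feature49` and the choice of which features are int64 (`INT_FEATURES`) are hypothetical; adapt them to the real schema.

```python
import tensorflow as tf

NUM_FEATURES = 50  # about 50 features per data point, as described above
# Hypothetical: which features are stored as int64 is an assumption.
INT_FEATURES = {"feature7"}

def make_example(values, label):
    """Build one tf.train.Example for a single data point (one time step of a run)."""
    feature = {}
    for i, v in enumerate(values):
        name = f"feature{i}"  # hypothetical naming scheme
        if name in INT_FEATURES:
            feature[name] = tf.train.Feature(int64_list=tf.train.Int64List(value=[int(v)]))
        else:
            feature[name] = tf.train.Feature(float_list=tf.train.FloatList(value=[float(v)]))
    # Every data point in the file repeats the same runner id as its label.
    feature["label"] = tf.train.Feature(int64_list=tf.train.Int64List(value=[int(label)]))
    return tf.train.Example(features=tf.train.Features(feature=feature))

def write_run(path, run_data, label):
    """Write one run, an array-like of shape [num_steps, NUM_FEATURES], in time order."""
    with tf.io.TFRecordWriter(path) as writer:
        for row in run_data:
            writer.write(make_example(row, label).SerializeToString())
```

For example, `write_run("runner_07.tfrecord", np.random.rand(1000, NUM_FEATURES), label=7)` (hypothetical file name and random data) produces a file with 1000 records. Keeping the records in time order matters here, because the LSTM needs consecutive time steps.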

Unfortunately, the official TensorFlow documentation on input pipelines is mostly about images.

This is how I read the data:

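    import os

    import tensorflow as tf

    # Some dtypes were lost when this was posted: tf.float32 is assumed for those
    # features, and tf.int64 for the label (an integer class id).
    feature_description = {
        'feature4': tf.io.FixedLenFeature((), tf.float32),
        'feature5': tf.io.FixedLenFeature((), tf.float32),  # dtype assumed
        'feature6': tf.io.FixedLenFeature((), tf.float32),  # dtype assumed
        'feature7': tf.io.FixedLenFeature((), tf.int64),
        'feature8': tf.io.FixedLenFeature((), tf.float32),  # dtype assumed
        'feature9': tf.io.FixedLenFeature((), tf.float32),  # dtype assumed
        # more features here
        'label': tf.io.FixedLenFeature((), tf.int64),  # dtype assumed
    }

    def _parse_function(example_proto):
        # Parse one serialized `tf.train.Example` proto using the dictionary above.
        # parse_single_example fits here because map() runs before batching.
        return tf.io.parse_single_example(example_proto, feature_description)

    # `filenames` is already a list of paths, so it is passed directly (not wrapped in another list).
    tfrecord_dataset = tf.data.TFRecordDataset(filenames=filenames)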
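One way to separate the features from the label already at parse time is to return a `(features, label)` pair, which is the form `tf.keras.Model.fit` accepts from a `tf.data.Dataset`. The sketch below assumes a small hypothetical subset of the schema (`FLOAT_FEATURES`, `INT_FEATURES`); it casts the int64 features to float32 and stacks everything into one vector per time step in a fixed column order.

```python
import tensorflow as tf

# Hypothetical subset of the schema; list every real feature name here.
FLOAT_FEATURES = ["feature4", "feature5", "feature6"]
INT_FEATURES = ["feature7"]
FEATURE_ORDER = FLOAT_FEATURES + INT_FEATURES  # fixed column order of the feature vector

feature_description = {name: tf.io.FixedLenFeature((), tf.float32) for name in FLOAT_FEATURES}
feature_description.update({name: tf.io.FixedLenFeature((), tf.int64) for name in INT_FEATURES})
feature_description["label"] = tf.io.FixedLenFeature((), tf.int64)

def parse_to_xy(serialized):
    """Parse one serialized Example into (feature_vector, label) for Keras."""
    parsed = tf.io.parse_single_example(serialized, feature_description)
    label = parsed.pop("label")
    # Cast the int64 features to float32 so all features fit in one dense vector.
    x = tf.stack([tf.cast(parsed[name], tf.float32) for name in FEATURE_ORDER])
    return x, label
```

Used as `tfrecord_dataset.map(parse_to_xy, num_parallel_calls=tf.data.AUTOTUNE)`, each element becomes a float32 vector of shape `[len(FEATURE_ORDER)]` plus a scalar int64 label.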

I have a lot of data for each subject, so I have to read the data in batches. I think something like this might work:

    parsed_dataset = tfrecord_dataset.map(_parse_function,num_parallel_calls=8)
    final_dataset = parsed_dataset.shuffle(buffer_size=len(filenames)).batch(10)

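At this point I'm stuck. I'm not sure what the best way is to separate all the features from the label, or how best to connect the dataset to a Keras model. My plan is not to shuffle the data within a TFRecords file, but to shuffle the input files themselves.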
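A possible pipeline for that plan, sketched under assumptions: shuffle the list of file names (not the records), read each file in time order, cut it into consecutive non-overlapping windows of `WINDOW` steps, and give each window the file's label. `interleave` then mixes windows from several runs into each batch. `WINDOW`, the LSTM size, and the `cycle_length` are hypothetical choices, labels are assumed to be integers in `[0, 80)`, and `parse_fn` is any function that maps one serialized record to a `(feature_vector, label)` pair with a statically known vector length of `NUM_FEATURES`.

```python
import tensorflow as tf

NUM_FEATURES = 50   # about 50 features per time step
NUM_CLASSES = 80    # about 80 runners; labels assumed to be ints in [0, 80)
WINDOW = 100        # hypothetical sequence length fed to the LSTM

def file_to_windows(path, parse_fn):
    """Turn one TFRecords file (one run) into (window, label) pairs, keeping time order."""
    ds = tf.data.TFRecordDataset(path).map(parse_fn)
    # Consecutive, non-overlapping windows of WINDOW steps; an incomplete tail is dropped.
    ds = ds.batch(WINDOW, drop_remainder=True)
    # Every label in a file is the same, so one label per window is enough.
    return ds.map(lambda x, y: (x, y[0]))

def make_dataset(filenames, parse_fn, batch_size=10):
    """Shuffle the files (not the records inside them), then mix windows from several runs."""
    files = tf.data.Dataset.from_tensor_slices(filenames)
    files = files.shuffle(buffer_size=len(filenames))  # reshuffled every epoch by default
    ds = files.interleave(
        lambda path: file_to_windows(path, parse_fn),
        cycle_length=4,
        num_parallel_calls=tf.data.AUTOTUNE,
    )
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)

def build_model():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(WINDOW, NUM_FEATURES)),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    # Integer class ids, so the sparse variant of cross-entropy is used.
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model
```

Training would then look like `build_model().fit(make_dataset(train_files, parse_fn), epochs=10)` with a hypothetical list `train_files`. Cutting each run into windows instead of feeding the whole ~1000-step run as a single sequence turns one file into several training samples; the trade-off is that the model only sees `WINDOW` steps of context at a time.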

This here helped me a bit up to a certain point

Any ideas?