actor_network output spec does not match action spec: TensorSpec(...) vs. BoundedTensorSpec(...)

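I'm trying to create an actor policy with tf_agents: a neural network that maps observations (the state space) to actions (the action space). Here is my implementation, heavily inspired by their tutorial (https://www.tensorflow.org/agents/tutorials/3_policies_tutorial):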

```python
import tensorflow as tf
from tf_agents.networks import network
from tf_agents.policies import actor_policy
from tf_agents.specs import tensor_spec
from tf_agents.trajectories import time_step as ts

input_tensor_spec = tensor_spec.TensorSpec((5,), tf.float32)
time_step_spec = ts.time_step_spec(input_tensor_spec)
action_spec = tensor_spec.BoundedTensorSpec((), tf.int32, minimum=0, maximum=9)


class ActionNet(network.Network):

    def __init__(self, input_tensor_spec, output_tensor_spec):
        super(ActionNet, self).__init__(
            input_tensor_spec=input_tensor_spec, state_spec=(), name='ActionNet')
        self._output_tensor_spec = output_tensor_spec
        self._sub_layers = [
            tf.keras.layers.Dense(100, activation=tf.nn.relu),
            # One sigmoid unit per action element, so each output lies in (0, 1).
            tf.keras.layers.Dense(
                output_tensor_spec.shape.num_elements(), activation=tf.nn.sigmoid),
        ]

    def call(self, observations, step_type, network_state):
        del step_type  # Unused.

        output = tf.cast(observations, dtype=tf.float32)
        for layer in self._sub_layers:
            output = layer(output)
        # Reshape to [batch_size] + action shape.
        actions = tf.reshape(output, [-1] + self._output_tensor_spec.shape.as_list())

        # Scale the sigmoid output from (0, 1) to (0, 9), then round.
        actions *= 9
        print(actions)  # Debug print of the scaled actions.
        actions = tf.math.round(actions)

        return actions, network_state


action_net = ActionNet(input_tensor_spec, action_spec)

my_actor_policy = actor_policy.ActorPolicy(
    time_step_spec=time_step_spec, action_spec=action_spec, actor_network=action_net)
```

I get the following error:

```
ValueError: actor_network output spec does not match action spec:
TensorSpec(shape=(),dtype=tf.float32,name=None)
vs.
BoundedTensorSpec(shape=(),dtype=tf.int32,name=None,minimum=array(0),maximum=array(9))
  In call to configurable 'ActorPolicy' (<class 'tf_agents.policies.actor_policy.ActorPolicy'>)
```

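This basically says that the output of my neural network is not a bounded tensor. How do I turn the network's output into a bounded tensor? In my case I want the output to lie between 0 and 9, so I simply multiply the sigmoid output by 9 and round it. This doesn't work, because the type is still an unbounded tensor.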
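Comparing the two specs in the error message, both have shape `()` but their dtypes differ: `tf.float32` for the network output vs. `tf.int32` for the action spec. `tf.math.round` returns a tensor with the same float dtype as its input, so rounding alone never produces integers. A minimal sketch, assuming the spec check compares dtype and shape rather than the bounds, casts the rounded values to `tf.int32`. The helper `scale_to_actions` and the sample inputs are hypothetical, and the snippet uses plain `tf.TensorSpec` (TensorFlow only, no tf_agents) to mimic the comparison.

```python
import tensorflow as tf

# Mirrors the question's action spec: scalar shape (), dtype tf.int32, range [0, 9].
ACTION_DTYPE = tf.int32
ACTION_MAX = 9


def scale_to_actions(sigmoid_output: tf.Tensor) -> tf.Tensor:
    """Hypothetical helper: maps sigmoid outputs in [0, 1] to integer actions in [0, 9]."""
    scaled = sigmoid_output * ACTION_MAX   # float32 values in [0, 9]
    rounded = tf.math.round(scaled)        # still float32 (rounds half to even)
    # Cast so the dtype matches the int32 action spec.
    return tf.cast(rounded, ACTION_DTYPE)


sigmoid_output = tf.constant([0.0, 0.04, 0.55, 0.97, 1.0])
actions = scale_to_actions(sigmoid_output)
print(actions.dtype, actions.numpy())

# Same shape, different dtype: this mirrors the mismatch in the error message.
float_spec = tf.TensorSpec(shape=(), dtype=tf.float32)
int_spec = tf.TensorSpec(shape=(), dtype=tf.int32)
cast_spec = tf.TensorSpec(shape=(), dtype=actions.dtype)
print(float_spec.is_compatible_with(int_spec))  # False
print(cast_spec.is_compatible_with(int_spec))   # True
```

Inside `ActionNet.call`, the equivalent change would be something like `actions = tf.cast(tf.math.round(actions), self._output_tensor_spec.dtype)`; whether that alone satisfies `ActorPolicy` in a given tf_agents version is worth confirming.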

Thanks a lot.