FailedPreconditionError when using the DDPG RL algorithm in Python with keras and keras-rl2

How do I resolve a FailedPreconditionError when using the DDPG RL algorithm in Python with keras and keras-rl2?

I am training a DDPG agent on a custom environment written with OpenAI Gym, and I get an error while training the model.

When I searched online for a solution, I found that some people who ran into a similar problem were able to resolve it by initializing their variables.

For example, by using:
tf.global_variables_initializer()

But I am using TensorFlow 2.5.0, which no longer has this method, so there should be another way to resolve the error. I have not been able to find a solution.
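
In TF 2.x that initializer only exists under the tf.compat.v1 namespace. A minimal sketch of the TF1-style workaround, assuming the model really is running through the TF1-compatible Keras backend session, would be the following (it is not what eventually fixed the problem for me, see the workaround at the end):

import tensorflow as tf

# Fetch the graph-mode session used by the Keras backend and run the TF1-style
# global variable initializer through the compat API
sess = tf.compat.v1.keras.backend.get_session()
sess.run(tf.compat.v1.global_variables_initializer())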

Here are the libraries and versions I am using:

tensorflow: 2.5.0
gym:        0.18.3
numpy:      1.19.5
keras:      2.4.3
keras-rl2:  1.0.5          (the DDPG agent comes from this library)

Error / stack trace:

Training for 1000 steps ...
Interval 1 (0 steps performed)
   17/10000 [..............................] - ETA: 1:04 - reward: 256251545.0121
C:\Users\vchou\anaconda3\envs\AdSpendProblem\lib\site-packages\keras\engine\training.py:2401: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
  100/10000 [..............................] - ETA: 1:03 - reward: 272267266.5754
C:\Users\vchou\anaconda3\envs\AdSpendProblem\lib\site-packages\tensorflow\python\keras\engine\training.py:2426: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
---------------------------------------------------------------------------
FailedPreconditionError                   Traceback (most recent call last)
<ipython-input-17-0938aa6056e8> in <module>
      1 # Training
----> 2 ddpgAgent.fit(env, 1000, verbose=1, nb_max_episode_steps = 100)

~\anaconda3\envs\AdSpendProblem\lib\site-packages\rl\core.py in fit(self, env, nb_steps, action_repetition, callbacks, verbose, visualize, nb_max_start_steps, start_step_policy, log_interval, nb_max_episode_steps)
    191                     # Force a terminal state.
    192                     done = True
--> 193                 metrics = self.backward(reward, terminal=done)
    194                 episode_reward += reward
    195 

~\anaconda3\envs\AdSpendProblem\lib\site-packages\rl\agents\ddpg.py in backward(self, reward, terminal)
    279                     state0_batch_with_action = [state0_batch]
    280                 state0_batch_with_action.insert(self.critic_action_input_idx, action_batch)
--> 281                 metrics = self.critic.train_on_batch(state0_batch_with_action, targets)
    282                 if self.processor is not None:
    283                     metrics += self.processor.metrics

~\anaconda3\envs\AdSpendProblem\lib\site-packages\keras\engine\training_v1.py in train_on_batch(self, x, y, sample_weight, class_weight, reset_metrics)
   1075       self._update_sample_weight_modes(sample_weights=sample_weights)
   1076       self._make_train_function()
-> 1077       outputs = self.train_function(ins)  # pylint: disable=not-callable
   1078 
   1079     if reset_metrics:

~\anaconda3\envs\AdSpendProblem\lib\site-packages\keras\backend.py in __call__(self, inputs)
   4017       self._make_callable(feed_arrays, feed_symbols, symbol_vals, session)
   4018 
-> 4019     fetched = self._callable_fn(*array_vals,
   4020                                 run_metadata=self.run_metadata)
   4021     self._call_fetch_callbacks(fetched[-len(self._fetches):])

~\anaconda3\envs\AdSpendProblem\lib\site-packages\tensorflow\python\client\session.py in __call__(self, *args, **kwargs)
   1478       try:
   1479         run_metadata_ptr = tf_session.TF_NewBuffer() if run_metadata else None
-> 1480         ret = tf_session.TF_SessionRunCallable(self._session._session,
   1481                                                self._handle, args,
   1482                                                run_metadata_ptr)

FailedPreconditionError: Could not find variable dense_5_1/kernel. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status=Not found: Resource localhost/dense_5_1/kernel/class tensorflow::Var does not exist.
     [[{{node ReadVariableOp_21}}]]

The actor and critic networks are as follows:

ACTOR NETWORK
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten (Flatten)            (None,10)                0         
_________________________________________________________________
dense (Dense)                (None,32)                352       
_________________________________________________________________
activation (Activation)      (None,32)                0         
_________________________________________________________________
dense_1 (Dense)              (None,32)                1056      
_________________________________________________________________
activation_1 (Activation)    (None,32)                0         
_________________________________________________________________
dense_2 (Dense)              (None,32)                1056      
_________________________________________________________________
activation_2 (Activation)    (None,32)                0         
_________________________________________________________________
dense_3 (Dense)              (None,10)                330       
_________________________________________________________________
activation_3 (Activation)    (None,10)                0         
=================================================================
Total params: 2,794
Trainable params: 2,794
Non-trainable params: 0
_________________________________________________________________
None
CRITIC NETWORK
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
observation_input (InputLayer)  [(None,1,10)]      0                                            
__________________________________________________________________________________________________
action_input (InputLayer)       [(None,10)]         0                                            
__________________________________________________________________________________________________
flatten_1 (Flatten)             (None,10)           0           observation_input[0][0]          
__________________________________________________________________________________________________
concatenate (Concatenate)       (None,20)           0           action_input[0][0]               
                                                                 flatten_1[0][0]                  
__________________________________________________________________________________________________
dense_4 (Dense)                 (None,32)           672         concatenate[0][0]                
__________________________________________________________________________________________________
activation_4 (Activation)       (None,32)           0           dense_4[0][0]                    
__________________________________________________________________________________________________
dense_5 (Dense)                 (None,32)           1056        activation_4[0][0]               
__________________________________________________________________________________________________
activation_5 (Activation)       (None,32)           0           dense_5[0][0]                    
__________________________________________________________________________________________________
dense_6 (Dense)                 (None,32)           1056        activation_5[0][0]               
__________________________________________________________________________________________________
activation_6 (Activation)       (None,32)           0           dense_6[0][0]                    
__________________________________________________________________________________________________
dense_7 (Dense)                 (None,1)            33          activation_6[0][0]               
__________________________________________________________________________________________________
activation_7 (Activation)       (None,1)            0           dense_7[0][0]                    
==================================================================================================
Total params: 2,817
Trainable params: 2,817
Non-trainable params: 0
__________________________________________________________________________________________________
None
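
For reference, here is a model definition that is consistent with the two summaries above. This is a reconstruction rather than the original code: the layer sizes match the printed parameter counts, but the activation functions are assumptions, since the summaries do not show them.

from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Activation, Flatten, Input, Concatenate

nb_actions = 10
obs_shape = (1, 10)   # window_length of 1, 10 observation features

# Actor: observation -> action vector (activations are assumed)
actor = Sequential([
    Flatten(input_shape=obs_shape),
    Dense(32), Activation('relu'),
    Dense(32), Activation('relu'),
    Dense(32), Activation('relu'),
    Dense(nb_actions), Activation('linear'),
])

# Critic: (action, observation) -> scalar Q-value (activations are assumed)
action_input = Input(shape=(nb_actions,), name='action_input')
observation_input = Input(shape=obs_shape, name='observation_input')
flattened_observation = Flatten()(observation_input)
x = Concatenate()([action_input, flattened_observation])
x = Dense(32)(x)
x = Activation('relu')(x)
x = Dense(32)(x)
x = Activation('relu')(x)
x = Dense(32)(x)
x = Activation('relu')(x)
x = Dense(1)(x)
x = Activation('linear')(x)
critic = Model(inputs=[action_input, observation_input], outputs=x)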

Here is the code for the DDPG agent:

# Create DDPG agent
ddpgAgent = DDPGAgent(
    nb_actions=nb_actions,
    actor=actor,
    critic=critic,
    critic_action_input=action_input,
    memory=memory,
    nb_steps_warmup_critic=100,
    nb_steps_warmup_actor=100,
    random_process=random_process,
    gamma=0.99,
    target_model_update=1e-3
)

ddpgAgent.compile(Adam(learning_rate=0.001, clipnorm=1.0), metrics=['mae'])
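
The memory and random_process objects passed to the agent are not shown in the post; a typical keras-rl2 setup looks like the sketch below (the limit, theta, and sigma values are illustrative assumptions, not necessarily the values used here):

from rl.memory import SequentialMemory
from rl.random import OrnsteinUhlenbeckProcess

# Replay buffer and Ornstein-Uhlenbeck exploration noise for the DDPG agent
memory = SequentialMemory(limit=50000, window_length=1)
random_process = OrnsteinUhlenbeckProcess(size=nb_actions, theta=0.15, mu=0.0, sigma=0.3)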

Solution

I was eventually able to resolve this error by replacing the imports from keras with imports from tensorflow.keras, although I do not know why the standalone keras package does not work.
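
Concretely, the fix amounts to building the actor and critic with the Keras that ships inside TensorFlow rather than with the standalone keras package. A sketch of the import change (the exact set of imports depends on the rest of the script):

# Before: standalone keras imports (these triggered the FailedPreconditionError
# together with keras-rl2 on TF 2.5)
# from keras.models import Sequential, Model
# from keras.layers import Dense, Activation, Flatten, Input, Concatenate
# from keras.optimizers import Adam

# After: import everything from the Keras bundled with TensorFlow
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Activation, Flatten, Input, Concatenate
from tensorflow.keras.optimizers import Adam

keras-rl2 itself builds its training functions on tensorflow.keras, so mixing in the standalone keras package apparently leaves the critic's weights in a different graph/session than the one the training function runs in, which is consistent with the "Could not find variable dense_5_1/kernel" message.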
