How to fix strange results from the Unity ML-Agents Python API
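I'm using the 3DBall example environment, but I'm getting some very strange results and I don't know why they happen. So far my code is just a for-range loop that looks at the rewards and fills the required inputs with random values. However, when I do this, a negative reward is never shown, and at random there are no decision steps. That makes sense, but shouldn't it keep simulating until there is a decision step? Any help would be greatly appreciated, since apart from the documentation there is hardly any help available.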
import random
import numpy
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment()
env.reset()
behavior_names = env.behavior_specs
for i in range(50):
    arr = []
    behavior_names = env.behavior_specs
    for i in behavior_names:
        print(i)
    # get_steps returns a (decision_steps, terminal_steps) tuple, so index 0 is the decision steps
    DecisionSteps = env.get_steps("3DBall?team=0")
    print(DecisionSteps[0].reward, len(DecisionSteps[0].reward))
    print(DecisionSteps[0].action_mask)  # for some reason it returns action mask as false when DecisionSteps[0].reward is empty and is None when not
    for i in range(len(DecisionSteps[0])):
        arr.append([])
        for b in range(2):
            arr[-1].append(random.uniform(-10, 10))
    if len(DecisionSteps[0]) != 0:
        env.set_actions("3DBall?team=0", numpy.array(arr))
        env.step()
    else:
        env.step()
env.close()
Solution
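I think your problem is that when an agent's episode terminates and needs to be reset, the agent does not return a decision_step but a terminal_step. This happens when the agent drops the ball, and the reward of -1.0 is returned in the terminal_step, which is why you never see a negative reward in the decision steps. I took your code and made some changes, and it now runs fine (although you will probably want to change it so that the whole environment is not reset every time one of the agents drops its ball).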
import numpy as np
from mlagents_envs.environment import UnityEnvironment

# -----------------
# Close an env that might not have been closed before
# (useful when re-running this in a notebook or interactive session).
try:
    env.close()
except Exception:
    pass
# -----------------

# file_name=None connects to a running Unity Editor instead of a built executable.
env = UnityEnvironment(file_name=None)
env.reset()

for i in range(1000):
    behavior_names = env.behavior_specs
    # Go through all existing behaviors
    for behavior_name in behavior_names:
        # get_steps returns a (decision_steps, terminal_steps) tuple
        decision_steps, terminal_steps = env.get_steps(behavior_name)
        for agent_id_terminated in terminal_steps:
            print("Agent " + behavior_name + " has terminated, resetting environment.")
            # This is probably not the desired behaviour, as the other agents are still active.
            env.reset()
        # One random 2-dimensional action per agent that requested a decision
        actions = []
        for agent_id_decisions in decision_steps:
            actions.append(np.random.uniform(-1, 1, 2))
        # print(decision_steps[0].reward)
        # print(decision_steps[0].action_mask)
        if len(actions) > 0:
            env.set_actions(behavior_name, np.array(actions))
    try:
        env.step()
    except Exception:
        print("Something happened when taking a step in the environment.")
        print("The communicator has probably terminated, stopping simulation early.")
        break
env.close()
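If you would rather not reset the whole environment whenever a single agent drops its ball, one option is to read the final reward from terminal_steps and keep stepping. The following is only a minimal sketch under some assumptions: run_random_policy is a hypothetical helper name, action_size=2 matches the two continuous actions used above, and set_actions is assumed to accept a plain NumPy array as in the code above (I believe newer mlagents_envs releases expect an ActionTuple instead, so adjust that call for your version). It relies only on the agent_id and reward arrays of the decision and terminal steps.

```python
import numpy as np


def run_random_policy(env, behavior_name, n_steps, action_size=2, seed=0):
    """Step `env` with random actions without calling env.reset() mid-run.

    Returns a list with the total reward of every episode that finished.
    """
    rng = np.random.default_rng(seed)
    running = {}   # agent_id -> reward accumulated in the current episode
    finished = []  # total rewards of completed episodes

    for _ in range(n_steps):
        decision_steps, terminal_steps = env.get_steps(behavior_name)

        # Agents whose episode just ended. The final reward (for example
        # -1.0 for a dropped ball) is reported here, not in decision_steps.
        for agent_id, reward in zip(terminal_steps.agent_id, terminal_steps.reward):
            finished.append(running.pop(int(agent_id), 0.0) + float(reward))

        # Agents that are waiting for an action in this step.
        for agent_id, reward in zip(decision_steps.agent_id, decision_steps.reward):
            key = int(agent_id)
            running[key] = running.get(key, 0.0) + float(reward)

        n_agents = len(decision_steps.agent_id)
        if n_agents > 0:
            # One row of random continuous actions per requesting agent.
            actions = rng.uniform(-1.0, 1.0, size=(n_agents, action_size))
            env.set_actions(behavior_name, actions)
        env.step()

    return finished
```

With the environment from the code above, this could be called as run_random_policy(env, "3DBall?team=0", 1000) before env.close(). Terminal steps are handled before decision steps because an agent that has just terminated may also request a decision for its next episode in the same step.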