How to define the action space in a custom gym environment that receives 3 scalars and a matrix each turn?


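For a personal project, I need to define a custom gym environment that runs a particular board game. On each turn of the game, the environment receives the state of the board as a matrix of ones and zeros, along with an action described as a tuple:

(int, int, small matrix)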

From reading online, I understand that a gym environment should look like this:

import gym


class CustomEnv(gym.Env):
  """Custom Environment that follows gym interface"""
  metadata = {'render.modes': ['human']}

  def __init__(self, arg1, arg2):
    super(CustomEnv, self).__init__()
    # Placeholders: these two spaces are what this question is about
    self.action_space = None
    self.observation_space = None

  def step(self, action):
    ...

  def reset(self):
    ...

  def render(self, mode='human', close=False):
    ...

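Now, it seems to me that the action input here doesn't fall neatly into either the "discrete" or the "continuous" category. How should I implement the action part of the __init__ function and of the step function?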

Solution

Defining the action space in the __init__ function is straightforward with gym's Tuple space:

from gym import spaces

space = spaces.Tuple((
    spaces.Discrete(5),
    spaces.Discrete(4),
    spaces.Box(low=0, high=1, shape=(2, 2))))

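A Discrete(n) space represents the integers 0 to n-1, while a Box space represents a multi-dimensional array whose entries lie between low and high. You can print a sample from the space to see what it looks like: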

print(space.sample())
# e.g. (3, 1, array([[0.20318432, 0.26787955], [0.5323673, 0.6564413]], dtype=float32))
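Besides sample(), every space also provides contains(), which checks whether a value has the expected structure, ranges, and shape. This is a useful sanity check on actions passed to step. A short sketch, assuming the same space as above; it tries gymnasium (the maintained successor of gym, with the same spaces API) first and falls back to gym. The Box is given an explicit float32 dtype and the test matrices are float32 as well, because some gym and gymnasium versions reject arrays whose dtype cannot be safely cast to the Box's dtype.

```python
import numpy as np

try:
    from gymnasium import spaces  # maintained successor of gym; same spaces API
except ImportError:
    from gym import spaces

space = spaces.Tuple((
    spaces.Discrete(5),  # integers 0..4
    spaces.Discrete(4),  # integers 0..3
    spaces.Box(low=0, high=1, shape=(2, 2), dtype=np.float32),  # 2x2 floats in [0, 1]
))

# A hand-built action with the same structure as space.sample()
matrix = np.array([[0.0, 1.0], [0.5, 0.25]], dtype=np.float32)

print(space.contains((3, 1, matrix)))  # True
print(space.contains((5, 1, matrix)))  # False: Discrete(5) stops at 4
print(space.contains((3, 1, np.full((2, 2), 2.0, dtype=np.float32))))  # False: above high=1
print(space.contains((3, 1, np.zeros((3, 3), dtype=np.float32))))  # False: wrong shape
```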

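For the step function, you simply interact with your environment according to the input action. The action has the same format as the sample: a tuple of two integers and a 2x2 array, so it can be unpacked directly into its three parts at the start of step.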
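Putting the pieces together, here is a minimal sketch of a complete environment. Everything game-specific is a hypothetical assumption, not part of the original question: the 6x5 board, the 2x2 piece, the rule that step rounds the piece to ones and zeros and ORs it onto the board with its top-left corner at the two integers, the reward, and the end condition. The board size is chosen so that the action space comes out as Discrete(5), Discrete(4) and a 2x2 Box, as in the answer. The sketch follows the classic gym API used in the question (reset returns only the observation, step returns a 4-tuple); gym 0.26+ and gymnasium instead return (observation, info) from reset and a 5-tuple from step.

```python
import numpy as np

try:
    import gymnasium as gym  # maintained successor of gym; same spaces API
    from gymnasium import spaces
except ImportError:
    import gym
    from gym import spaces

BOARD_SHAPE = (6, 5)  # hypothetical board size
PIECE_SHAPE = (2, 2)  # hypothetical size of the "small matrix" in each action


class BoardGameEnv(gym.Env):
    """Toy board game (hypothetical rules): each action stamps a 0/1 pattern onto the board."""

    metadata = {"render.modes": ["human"]}

    def __init__(self):
        super().__init__()
        # Number of top-left positions at which the piece still fits on the board
        n_rows = BOARD_SHAPE[0] - PIECE_SHAPE[0] + 1  # 5
        n_cols = BOARD_SHAPE[1] - PIECE_SHAPE[1] + 1  # 4
        self.action_space = spaces.Tuple((
            spaces.Discrete(n_rows),
            spaces.Discrete(n_cols),
            spaces.Box(low=0, high=1, shape=PIECE_SHAPE, dtype=np.float32),
        ))
        # The board state is a matrix of ones and zeros
        self.observation_space = spaces.MultiBinary(BOARD_SHAPE)
        self.board = np.zeros(BOARD_SHAPE, dtype=np.int8)

    def reset(self):
        self.board[:] = 0
        return self.board.copy()

    def step(self, action):
        # Reject anything that does not match the declared action space
        if not self.action_space.contains(action):
            raise ValueError(f"invalid action: {action!r}")
        row, col, piece = action  # same structure as action_space.sample()

        # Hypothetical rule: round the piece to 0/1 and OR it onto the board
        mask = (np.asarray(piece) >= 0.5).astype(np.int8)
        region = self.board[row:row + PIECE_SHAPE[0], col:col + PIECE_SHAPE[1]]
        newly_filled = int(np.sum(mask & (1 - region)))
        region |= mask  # region is a view, so this updates self.board in place

        reward = float(newly_filled)  # hypothetical reward: cells newly set to 1
        done = bool(self.board.all())  # hypothetical end condition: board is full
        return self.board.copy(), reward, done, {}

    def render(self, mode="human"):
        for board_row in self.board:
            print("".join("#" if cell else "." for cell in board_row))
```

Because the action space is declared up front, random agents can call env.action_space.sample() and pass the result straight to step, and the contains() check catches malformed actions from any other source.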