Tic-tac-toe AI plays really badly! - Possible error in the minimax algorithm (CS50 AI)


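I'm currently taking CS50's introduction to AI course, and I need to complete several functions to get a tic-tac-toe game running. When I play against it, though, the AI plays terribly and usually picks the top-left square, and I'm fairly sure this has to do with my minimax function. Some debugging shows that the variables foo and bar (which are meant to hold min_value(result(s, a)) so the maximizing player can pick the highest value, and the corresponding max_value for the minimizing player) never change and keep their original values of negative infinity and infinity. I don't understand why this happens. The code is below, and any help would be great!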

def minimax(board):
    """
    Returns the optimal action for the current player on the board.
    """
    #Checking if game is over
    if terminal(board):
        return None
    else:
        #Check whose turn it is
        turn = player(board)
        board_actions = actions(board)
        if turn == 'X':
            action_score_max = -math.inf
            return_value_min = board_actions[0]
            #return_value_max 
            for a in board_actions:
                foo = min_value(result(board,a))
                if foo > action_score_max:
                    action_score_max = foo
                    return_value_max = a
            
            return return_value_max

        else:
            action_score_min = math.inf
            return_value_min = board_actions[0]
            for a in board_actions:
                bar = max_value(result(board,a))
                if bar < action_score_min:
                    action_score_min = bar
                    return_value_min = a
            
            return return_value_min




def max_value(board):

    """
    Helper function for minimax (pick max value value of all routes)
    """

    v = -math.inf

    for action in actions(board):
        v = max(v,min_value(result(board,action)))
    
    return v



def min_value(board):

    """
    Helper function for minimax (pick min value value of all routes)
    """

    v = math.inf

    for action in actions(board):
        v = min(v,max_value(result(board,action)))

    return v

Solution

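As the docstring of the minimax function implies, its job is to return the best move for the current player. For that you have two helper functions, max_value and min_value, and those are where you should implement the logic that finds and returns the best move. In your version neither helper checks terminal(board), so the recursion only stops once no actions remain. At that point the loop body never runs, and the helper returns its initial -math.inf or math.inf instead of utility(board). Those infinities then propagate back up through every max and min, which would explain why foo and bar never hold a real score.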

You can do it like this:

def minimax(board):
    """
    Returns the optimal action for the current player on the board.
    """
    if terminal(board):
        return None
        
    if player(board) == O:
        move = min_value(board)[1]
    else:
        move = max_value(board)[1]
    return move

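def max_value(board):
    """
    Returns [best score, move that achieves it] for the maximizing player.
    """
    # Base case: a finished game has a known utility and no move to make
    if terminal(board):
        return [utility(board),None]
    v = float('-inf')
    best_move = None
    for action in actions(board):
        # Score this action by the minimizing opponent's best reply
        hypothetical_value = min_value(result(board,action))[0]
        if hypothetical_value > v:
            v = hypothetical_value
            best_move = action
    return [v,best_move]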


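def min_value(board):
    """
    Returns [best score, move that achieves it] for the minimizing player.
    """
    # Base case: a finished game has a known utility and no move to make
    if terminal(board):
        return [utility(board),None]
    v = float('inf')
    best_move = None
    for action in actions(board):
        # Score this action by the maximizing opponent's best reply
        hypothetical_value = max_value(result(board,action))[0]
        if hypothetical_value < v:
            v = hypothetical_value
            best_move = action
    return [v,best_move]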
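
For comparison, here is a minimal self-contained sketch that keeps the structure from the question (the helpers return only a score, and minimax picks the move) and adds just the terminal check. The board helpers (player, actions, result, winner, terminal, utility) are simplified stand-ins written for this example so it runs on its own. They are assumptions, not the CS50 project's actual implementations.

```python
import math

X, O, EMPTY = "X", "O", None

# --- Simplified stand-ins for the course's helper functions (assumptions) ---

def player(board):
    """X moves first, so X is to move whenever both marks appear equally often."""
    flat = [cell for row in board for cell in row]
    return X if flat.count(X) == flat.count(O) else O

def actions(board):
    """All empty cells as (row, col) tuples, in row-major order."""
    return [(i, j) for i in range(3) for j in range(3) if board[i][j] is EMPTY]

def result(board, action):
    """Copy of the board with the current player's mark placed at `action`."""
    i, j = action
    new_board = [row[:] for row in board]
    new_board[i][j] = player(board)
    return new_board

def winner(board):
    """X or O if someone has three in a row, otherwise None."""
    lines = [[(i, 0), (i, 1), (i, 2)] for i in range(3)]           # rows
    lines += [[(0, j), (1, j), (2, j)] for j in range(3)]          # columns
    lines += [[(0, 0), (1, 1), (2, 2)], [(0, 2), (1, 1), (2, 0)]]  # diagonals
    for line in lines:
        marks = {board[i][j] for i, j in line}
        if len(marks) == 1 and EMPTY not in marks:
            return marks.pop()
    return None

def terminal(board):
    """The game is over once someone has won or no empty cell is left."""
    return winner(board) is not None or not actions(board)

def utility(board):
    """1 if X won, -1 if O won, 0 otherwise."""
    return {X: 1, O: -1, None: 0}[winner(board)]

# --- Minimax with the base case the question's helpers were missing ---

def max_value(board):
    if terminal(board):          # without this check, a full board returns -inf
        return utility(board)
    v = -math.inf
    for action in actions(board):
        v = max(v, min_value(result(board, action)))
    return v

def min_value(board):
    if terminal(board):          # without this check, a full board returns +inf
        return utility(board)
    v = math.inf
    for action in actions(board):
        v = min(v, max_value(result(board, action)))
    return v

def minimax(board):
    if terminal(board):
        return None
    if player(board) == X:
        # X maximizes: pick the action whose resulting position scores highest
        return max(actions(board), key=lambda a: min_value(result(board, a)))
    # O minimizes: pick the action whose resulting position scores lowest
    return min(actions(board), key=lambda a: max_value(result(board, a)))

x_to_move = [[X, X, EMPTY],
             [O, O, EMPTY],
             [EMPTY, EMPTY, EMPTY]]
o_to_move = [[X, X, EMPTY],
             [O, O, EMPTY],
             [X, EMPTY, EMPTY]]
print(minimax(x_to_move))  # X completes the top row
print(minimax(o_to_move))  # O completes the middle row
```

Either layout works: returning [value, move] from the helpers, as in the answer above, avoids the extra loop in minimax, while returning only the score keeps the helpers closer to the textbook pseudocode. What matters in both is that the recursion bottoms out in utility(board).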