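TensorFlow: does autodiff relieve us from implementing back-prop?

Question

When using TensorFlow, for example to implement a custom neural network layer, what is the standard practice for implementing back-propagation? Do we no longer need to work out the differentiation formulas ourselves?

Background

With NumPy, when creating a layer such as matmul, the back-propagation gradient is first derived analytically and then coded accordingly.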

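import numpy as np

def forward(self, X):
    """Y = X @ W.T; X is cached for the backward pass."""
    self._X = X
    np.matmul(self._X, self.W.T, out=self._Y)
    return self._Y

def backward(self, dY):
    """dY = dL/dY is a jacobian where L is loss and Y is matmul output"""
    self._dY = dY
    # dL/dX = dY @ W, derived by hand from Y = X @ W.T
    return np.matmul(self._dY, self.W, out=self._dX)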

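In TensorFlow, autodiff seems to take care of the Jacobian calculation. Does that mean we don't have to derive the gradient formulas by hand and can let the TensorFlow tape handle it?

Computing gradients

To differentiate automatically, TensorFlow needs to remember what operations happen in what order during the forward pass. Then, during the backward pass, TensorFlow traverses this list of operations in reverse order to compute gradients.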

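Solution

Basically, TensorFlow is a symbolic math library based on dataflow and differentiable programming. We don't have to derive or code the differentiation formulas by hand; all of that math is done automatically. You quoted the official documentation on gradient computation correctly. However, if you want to know how it is done by hand with NumPy, I'd suggest the excellent course Neural Networks and Deep Learning, especially week 4, or other sources here.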
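To make the claim concrete, here is a minimal sketch that compares the hand-derived matmul gradients from the question with what tf.GradientTape returns. The shapes, the random data and the variable names are assumptions chosen only for illustration.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3)).astype(np.float32)   # 4 samples, 3 features
W = rng.normal(size=(2, 3)).astype(np.float32)   # 2 output units
dY = rng.normal(size=(4, 2)).astype(np.float32)  # upstream gradient dL/dY

# Hand-derived backward pass for Y = X @ W.T
dX_manual = dY @ W      # dL/dX
dW_manual = dY.T @ X    # dL/dW

# The same quantities from autodiff: only the forward pass is written
X_tf = tf.constant(X)
W_tf = tf.Variable(W)
with tf.GradientTape() as tape:
    tape.watch(X_tf)  # constants are not tracked unless watched
    Y = tf.matmul(X_tf, W_tf, transpose_b=True)

# output_gradients plays the role of dL/dY flowing in from later layers
dX_auto, dW_auto = tape.gradient(
    Y, [X_tf, W_tf], output_gradients=tf.constant(dY)
)

dX_match = np.allclose(dX_manual, dX_auto.numpy(), atol=1e-4)
dW_match = np.allclose(dW_manual, dW_auto.numpy(), atol=1e-4)
print(dX_match, dW_match)
```

If the two agree up to floating-point tolerance, that suggests the tape reproduces what was previously derived on paper.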

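FYI, in TF 2 we can write custom training from scratch by overriding the train_step method of the tf.keras.Model class, and there we can use the tf.GradientTape API for automatic differentiation; that is, computing the gradient of a computation with respect to some inputs. The same official page has more information on this. There is also a well-written article on tf.GradientTape that is a must-read. For example, with this API we can easily compute a gradient as follows: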

import tensorflow as tf 

# some input 
x = tf.Variable(3.0,trainable=True)

with tf.GradientTape() as tape:
    # some output 
    y = x**3 + x**2 + x + 5

# compute gradient of y wrt x 
print(tape.gradient(y, x).numpy())
# 34.0

We can also compute higher-order derivatives, for example:

x = tf.Variable(3.0,trainable=True)

with tf.GradientTape() as tape1:

    with tf.GradientTape() as tape2:
        y = x**3 + x**2 + x + 5
    # first derivative 
    order_1 = tape2.gradient(y,x)

# second derivative 
order_2 = tape1.gradient(order_1,x)

print(order_2.numpy()) 
# 20.0

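Now, in custom model training in tf.keras, we first run the forward pass and compute the loss, then compute the gradients of the loss with respect to the model's trainable variables. After that, we update the model's weights using these gradients. Below is a code snippet; the end-to-end details are in Writing a training loop from scratch.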

# Open a GradientTape to record the operations run
# during the forward pass,which enables auto-differentiation.
with tf.GradientTape() as tape:

    # Run the forward pass of the layer.
    # The operations that the layer applies
    # to its inputs are going to be recorded
    # on the GradientTape.
    logits = model(x_batch_train,training=True)  # Logits for this minibatch

    # Compute the loss value for this minibatch.
    loss_value = loss_fn(y_batch_train,logits)

# Use the gradient tape to automatically retrieve
# the gradients of the loss with respect to the trainable variables.
grads = tape.gradient(loss_value,model.trainable_weights)

# Run one step of gradient descent by updating
# the value of the variables to minimize the loss.
optimizer.apply_gradients(zip(grads,model.trainable_weights))
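
The snippet above assumes that model, loss_fn, optimizer and the batches already exist. A self-contained sketch of the same loop might look like the following; the toy data (y = 3x + 1), the one-unit model, the learning rate and the epoch count are all assumptions made for illustration, not part of the original guide.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy regression data: y = 3x + 1
x_train = np.linspace(-1.0, 1.0, 64, dtype=np.float32).reshape(-1, 1)
y_train = 3.0 * x_train + 1.0

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(16)

# Loss on the full data set before any update
initial_loss = float(loss_fn(y_train, model(x_train, training=False)))

for epoch in range(20):
    for x_batch_train, y_batch_train in dataset:
        with tf.GradientTape() as tape:
            # Forward pass is recorded on the tape
            preds = model(x_batch_train, training=True)
            loss_value = loss_fn(y_batch_train, preds)
        # Backward pass: gradients of the loss w.r.t. the weights
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

# Loss on the full data set after training
final_loss = float(loss_fn(y_train, model(x_train, training=False)))
print(initial_loss, final_loss)
```

No backward formula appears anywhere in this loop; the only derivative-related call is tape.gradient.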

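Correct, you only need to define the forward pass and TensorFlow generates the appropriate backward pass. From the TF2 autodiff guide:

TensorFlow provides the tf.GradientTape API for automatic differentiation; that is, computing the gradient of a computation with respect to some inputs, usually tf.Variables. TensorFlow "records" relevant operations executed inside the context of a tf.GradientTape onto a "tape". TensorFlow then uses that tape to compute the gradients of a "recorded" computation using reverse mode differentiation.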

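For this purpose, TensorFlow is given the forward pass (or a loss) and a set of tf.Variable objects with respect to which the derivatives are computed. This process only works for the specific set of operations defined by TensorFlow itself. To create a custom NN layer, you need to define its forward pass using those operations (all of them are either part of TF or are converted to it by some converter).*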
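
As a rough sketch of what "define only the forward pass" can look like for a custom layer, the example below subclasses tf.keras.layers.Layer and uses nothing but built-in TF ops in call. The class name MyDense, the layer sizes and the dummy loss are hypothetical and chosen only for illustration.

```python
import tensorflow as tf

class MyDense(tf.keras.layers.Layer):  # hypothetical layer name
    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # Weights are created lazily once the input size is known
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="glorot_uniform",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="zeros", trainable=True
        )

    def call(self, inputs):
        # Forward pass only; no backward method is written anywhere
        return tf.nn.relu(tf.matmul(inputs, self.w) + self.b)

layer = MyDense(4)
x = tf.random.normal((8, 3))

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(layer(x) ** 2)  # dummy scalar loss

# One gradient per trainable weight, in the order they were created
grads = tape.gradient(loss, layer.trainable_weights)
print([g.shape for g in grads])
```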

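Since you seem to come from a NumPy background, you can write your custom forward pass in NumPy style and translate it to TensorFlow with the tf_numpy API (tf.experimental.numpy), after which TF creates the backward pass for you. tf.numpy_function can also wrap NumPy code, but gradients cannot be taken through it, so on its own it is not suitable for a layer that has to be trained.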
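
A small sketch of the NumPy-style route, assuming a TF version that ships tf.experimental.numpy. The function name forward, the shapes and the tanh nonlinearity are assumptions for illustration; the analytic gradient is included only to check the result.

```python
import numpy as np
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

def forward(x, w):
    # NumPy-style code, but every call is a differentiable TF op
    return tnp.sum(tnp.tanh(tnp.matmul(x, tnp.transpose(w))))

rng = np.random.default_rng(0)
x_np = rng.normal(size=(5, 3)).astype(np.float32)
w_np = rng.normal(size=(2, 3)).astype(np.float32)

x = tnp.asarray(x_np)
w = tnp.asarray(w_np)

with tf.GradientTape() as tape:
    tape.watch(w)  # w is a plain tensor here, so it must be watched
    loss = forward(x, w)

dw_auto = tape.gradient(loss, w)

# Hand-derived check: dL/dW = (1 - tanh(X W^T)^2)^T X
y_np = x_np @ w_np.T
dw_manual = (1.0 - np.tanh(y_np) ** 2).T @ x_np

dw_match = np.allclose(dw_manual, np.asarray(dw_auto), atol=1e-4)
print(dw_match)
```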

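(*) Note that some operations, such as control-flow statements, are not differentiable by themselves and are therefore invisible to gradient-based optimizers. There are some caveats about these.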
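
The control-flow caveat can be illustrated with a sketch modeled on the example in the TensorFlow autodiff guide (the variable names and the factor 3.0 are assumptions). The tape only records the branch that actually ran, and the condition itself contributes no gradient.

```python
import tensorflow as tf

x = tf.constant(1.0)
v0 = tf.Variable(2.0)
v1 = tf.Variable(2.0)

with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    # Plain Python `if`: only the taken branch is recorded on the tape
    if x > 0.0:
        result = 3.0 * v0
    else:
        result = v1 ** 2

dv0, dv1 = tape.gradient(result, [v0, v1])
dx = tape.gradient(result, x)

print(dv0)  # gradient through the branch that ran
print(dv1)  # None: v1 was never used
print(dx)   # None: the condition is not differentiable
```

Here x decides which branch runs, yet its gradient is None, which is what "invisible to gradient-based optimizers" means in practice.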
