What is the difference between the two Bahdanau attention implementations in the official TensorFlow tutorials?

I am reading about machine translation and writing code for it, and I got stuck on two different tutorials.

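The first is an implementation of the Caption Generation using Visual Attention paper. It uses image features of shape [64, 2048], so each image is treated like a sentence of 64 words, where each word has an embedding of length 2048. I fully understand this implementation. Here is its Bahdanau additive-style attention code: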

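import tensorflow as tf

class BahdanauAttention(tf.keras.Model):
  def __init__(self, units):
    super(BahdanauAttention, self).__init__()
    self.W1 = tf.keras.layers.Dense(units)
    self.W2 = tf.keras.layers.Dense(units)
    self.V = tf.keras.layers.Dense(1)

  def call(self, features, hidden):
    # features: (batch, 64, feature_dim); hidden: (batch, hidden_size)
    # Add a length-1 axis so hidden broadcasts over the 64 locations.
    hidden_with_time_axis = tf.expand_dims(hidden, 1)
    # (batch, 64, units)
    attention_hidden_layer = tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time_axis))

    # One unnormalized score per location: (batch, 64, 1)
    score = self.V(attention_hidden_layer)

    # Normalize over the 64 locations.
    attention_weights = tf.nn.softmax(score, axis=1)

    # Weighted sum of the features: (batch, feature_dim)
    context_vector = attention_weights * features
    context_vector = tf.reduce_sum(context_vector, axis=1)

    return context_vector, attention_weights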
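
To make the shapes concrete, here is a minimal sketch of how I read this first version, written as a step-by-step trace rather than as a class. The batch size of 2, hidden size of 512 and `units = 256` are illustrative values I picked, not values taken from the tutorial, and the inputs are random tensors.

```python
import tensorflow as tf

# Illustrative sizes (assumptions, not from the tutorial).
batch, locations, feat_dim, hidden_dim, units = 2, 64, 2048, 512, 256

features = tf.random.normal((batch, locations, feat_dim))  # image features
hidden = tf.random.normal((batch, hidden_dim))             # decoder state

W1 = tf.keras.layers.Dense(units)
W2 = tf.keras.layers.Dense(units)
V = tf.keras.layers.Dense(1)

# (batch, 1, hidden_dim) so it broadcasts against the 64 locations.
hidden_with_time_axis = tf.expand_dims(hidden, 1)

# Additive score: V(tanh(W1 f + W2 h)), shape (batch, 64, 1).
score = V(tf.nn.tanh(W1(features) + W2(hidden_with_time_axis)))

# Softmax over the location axis, then a weighted sum of the raw features.
attention_weights = tf.nn.softmax(score, axis=1)
context_vector = tf.reduce_sum(attention_weights * features, axis=1)

print(attention_weights.shape)  # one weight per location
print(context_vector.shape)     # one vector per example
```

As far as I can tell, this version handles a single decoder step per call and has no mask, because all 64 image locations are always valid.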

But when I moved on to the Neural Machine Language Translation task, I found the code there more complicated, and I cannot understand what is happening in it:

class BahdanauAttention(tf.keras.layers.Layer):
  def __init__(self, units):
    super().__init__()
    self.W1 = tf.keras.layers.Dense(units, use_bias=False)
    self.W2 = tf.keras.layers.Dense(units, use_bias=False)

    self.attention = tf.keras.layers.AdditiveAttention()

  def call(self, query, value, mask):
    # Project the query and the value into the same `units`-sized space.
    w1_query = self.W1(query)
    w2_key = self.W2(value)

    # Every query position is valid; `mask` marks the non-padding value positions.
    query_mask = tf.ones(tf.shape(query)[:-1], dtype=bool)
    value_mask = mask

    # inputs is [query, value]; with no third element, the key defaults to the value.
    context_vector, attention_weights = self.attention(
        inputs=[w1_query, w2_key],
        mask=[query_mask, value_mask],
        return_attention_scores=True,
    )
    return context_vector, attention_weights
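
As a sanity check on my reading of this second version, below is a rough sketch comparing `tf.keras.layers.AdditiveAttention` against the additive score computed by hand. It rests on several assumptions: I pass `use_scale=False` so there is no learned scale vector (the tutorial uses the default), the random tensors stand in for the outputs of `W1` and `W2`, and the sizes and the padding mask are made up for illustration.

```python
import tensorflow as tf

# Illustrative sizes (assumptions): 3 query steps, 5 source positions.
batch, t_q, t_v, units = 2, 3, 5, 8
w1_query = tf.random.normal((batch, t_q, units))  # stands in for W1(query)
w2_key = tf.random.normal((batch, t_v, units))    # stands in for W2(value)

# Hypothetical padding mask: last two source positions of example 1 are padding.
value_mask = tf.constant([[True, True, True, True, True],
                          [True, True, True, False, False]])
query_mask = tf.ones((batch, t_q), dtype=bool)

# use_scale=False removes the learned scale so the manual version matches.
attention = tf.keras.layers.AdditiveAttention(use_scale=False)
context_vector, attention_weights = attention(
    [w1_query, w2_key],
    mask=[query_mask, value_mask],
    return_attention_scores=True,
)

# Manual additive score for every (query step, source position) pair.
scores = tf.reduce_sum(
    tf.tanh(w1_query[:, :, None, :] + w2_key[:, None, :, :]), axis=-1)
scores = tf.where(value_mask[:, None, :], scores, -1e9)  # hide padding
manual_weights = tf.nn.softmax(scores, axis=-1)
# The weighted sum is over the projected values, since key == value here.
manual_context = tf.matmul(manual_weights, w2_key)

print(attention_weights.shape)  # (batch, t_q, t_v)
print(context_vector.shape)     # (batch, t_q, units)
```

If this sketch is right, the second version scores all query steps against all source positions in one call, ignores padded positions through the mask, and returns a weighted sum of the projected values rather than of the raw inputs.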

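My questions are:

What is the difference between the two?
Why can't the attention code from the caption-generation tutorial be used in the second tutorial, and vice versa?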