How to add trainable weights to a custom Keras layer when the batch dimension is None
I am trying to implement, in Keras, the custom attention layer described in the paper "3-D Convolutional Recurrent Neural Networks with Attention Model for Speech Emotion Recognition". The code is available in a GitHub repo:
```python
import tensorflow as tf


def attention(inputs, attention_size, time_major=False, return_alphas=False):
    """
    Attention mechanism layer which reduces RNN/Bi-RNN outputs with Attention vector.

    The idea was proposed in the article by Z. Yang et al., "Hierarchical Attention Networks
    for Document Classification", 2016: http://www.aclweb.org/anthology/N16-1174.
    Variables notation is also inherited from the article.

    Args:
        inputs: The Attention inputs.
            Matches outputs of RNN/Bi-RNN layer (not final state):
                In case of RNN, this must be RNN outputs `Tensor`:
                    If time_major == False (default), this must be a tensor of shape:
                        `[batch_size, max_time, cell.output_size]`.
                    If time_major == True, this must be a tensor of shape:
                        `[max_time, batch_size, cell.output_size]`.
                In case of Bidirectional RNN, this must be a tuple (outputs_fw, outputs_bw)
                containing the forward and the backward RNN outputs `Tensor`.
                    If time_major == False (default), outputs_fw is a `Tensor` shaped:
                        `[batch_size, max_time, cell_fw.output_size]`
                        and outputs_bw is a `Tensor` shaped:
                        `[batch_size, max_time, cell_bw.output_size]`.
                    If time_major == True, outputs_fw is a `Tensor` shaped:
                        `[max_time, batch_size, cell_fw.output_size]`
                        and outputs_bw is a `Tensor` shaped:
                        `[max_time, batch_size, cell_bw.output_size]`.
        attention_size: Linear size of the Attention weights.
        time_major: The shape format of the `inputs` Tensors.
            If true, these `Tensors` must be shaped `[max_time, batch_size, depth]`.
            If false, these `Tensors` must be shaped `[batch_size, max_time, depth]`.
            Using `time_major = True` is a bit more efficient because it avoids
            transposes at the beginning and end of the RNN calculation. However,
            most TensorFlow data is batch-major, so by default this function
            accepts input and emits output in batch-major form.
        return_alphas: Whether to return attention coefficients variable along with layer's output.
            Used for visualization purpose.

    Returns:
        The Attention output `Tensor`.
        In case of RNN, this will be a `Tensor` shaped:
            `[batch_size, cell.output_size]`.
        In case of Bidirectional RNN, this will be a `Tensor` shaped:
            `[batch_size, cell_fw.output_size + cell_bw.output_size]`.
    """
    if isinstance(inputs, tuple):
        # In case of Bi-RNN, concatenate the forward and the backward RNN outputs.
        inputs = tf.concat(inputs, 2)

    if time_major:
        # (T,B,D) => (B,T,D)
        inputs = tf.transpose(inputs, [1, 0, 2])

    hidden_size = inputs.shape[2]  # D value - hidden size of the RNN layer

    # Trainable parameters
    W_omega = tf.Variable(tf.random.normal([hidden_size, attention_size], stddev=0.1))
    b_omega = tf.Variable(tf.random.normal([attention_size], stddev=0.1))
    u_omega = tf.Variable(tf.random.normal([attention_size], stddev=0.1))

    # Applying fully connected layer with non-linear activation to each of the B*T timestamps;
    # the shape of `v` is (B,T,D)*(D,A)=(B,T,A), where A=attention_size
    # v = tf.tanh(tf.tensordot(inputs, W_omega, axes=1) + b_omega)
    v = tf.sigmoid(tf.tensordot(inputs, W_omega, axes=1) + b_omega)

    # For each of the timestamps its vector of size A from `v` is reduced with `u` vector
    vu = tf.tensordot(v, u_omega, axes=1)  # (B,T) shape
    alphas = tf.nn.softmax(vu)  # (B,T) shape also

    # Output of (Bi-)RNN is reduced with attention vector; the result has (B,D) shape
    output = tf.reduce_sum(input_tensor=inputs * tf.expand_dims(alphas, -1), axis=1)

    if not return_alphas:
        return output
    else:
        return output, alphas
```
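For reference, the forward pass of `attention` on batch-major inputs $x \in \mathbb{R}^{B \times T \times D}$ (with $A$ = `attention_size`) can be written per example $b$ and time step $t$ as below, using the sigmoid $\sigma$ from the active line of the code (the commented-out line uses $\tanh$ instead). None of the parameters $W_\omega \in \mathbb{R}^{D \times A}$, $b_\omega \in \mathbb{R}^{A}$ or $u_\omega \in \mathbb{R}^{A}$ carries a batch index; the same values are shared by every example in the batch.

$$
v_{b,t} = \sigma\left(x_{b,t} W_\omega + b_\omega\right) \in \mathbb{R}^{A},
\qquad
\alpha_{b,t} = \frac{\exp\left(v_{b,t}^{\top} u_\omega\right)}{\sum_{t'=1}^{T} \exp\left(v_{b,t'}^{\top} u_\omega\right)},
\qquad
o_b = \sum_{t=1}^{T} \alpha_{b,t}\, x_{b,t} \in \mathbb{R}^{D}
$$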
I am trying to implement it as a custom layer:
```python
import tensorflow as tf
import keras.backend as K
from keras.engine.topology import Layer


class att_Layer1D(Layer):
    def __init__(self, attention_dim, **kwargs):
        self.attention_dim = attention_dim
        super(att_Layer1D, self).__init__(**kwargs)

    def build(self, input_shape):
        print(len(input_shape))
        # Create a trainable weight variable for this layer.
        assert len(input_shape) >= 3
        input_dim = input_shape[1:]
        print(input_shape)
        self.kernel1 = self.add_weight(shape=(input_dim[1], self.attention_dim),
                                       name='kernel1',
                                       initializer='uniform',
                                       trainable=True)
        print(self.kernel1)
        # b_omega and u_omega include input_shape[0], the batch dimension,
        # which is None when build() runs; this is where it fails.
        self.b_omega = self.add_weight(shape=(input_shape[0], self.attention_dim),
                                       name='kernel2',
                                       initializer='uniform',
                                       trainable=True)
        # print(self.b_omega)
        self.u_omega = self.add_weight(shape=(input_shape[0], self.attention_dim),
                                       name='kernel3',
                                       initializer='uniform',
                                       trainable=True)
        # print(self.u_omega)
        # print(self.kernel1)
        super(att_Layer1D, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        print(x.shape)
        input_shape = K.int_shape(x)
        v = K.sigmoid(K.dot(x, self.kernel1) + self.b_omega)
        vu = K.dot(v, self.u_omega)
        alphas = K.softmax(vu)
        # Weight each time step by its score and sum over time: (B,T,D) -> (B,D)
        output = tf.reduce_sum(x * K.expand_dims(alphas, -1), axis=1)
        return output

    def compute_output_shape(self, input_shape):
        # return (input_shape[0], self.output_dim[0], self.output_dim[1])
        return (input_shape[0], input_shape[2])

    def get_config(self):
        config = {
            'attention_dim': self.attention_dim,
        }
        base_config = super(att_Layer1D, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
```
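However, I run into a problem when adding the weights `u_omega` and `b_omega`: a weight cannot be initialized with an unknown (`None`) batch dimension.
Is there any workaround? Any help would be appreciated.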
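A minimal sketch of one possible workaround, assuming TensorFlow 2.x with `tf.keras`: in the original `attention` function the trainable parameters have shapes `[hidden_size, attention_size]`, `[attention_size]` and `[attention_size]`, so none of them needs the batch dimension, and broadcasting applies `b_omega` and `u_omega` to every example. The class name `AttentionPooling` and its `return_alphas` constructor argument are hypothetical, chosen for illustration; this is not the paper authors' implementation.

```python
import tensorflow as tf


class AttentionPooling(tf.keras.layers.Layer):
    """Hypothetical Keras port of the `attention` function (sigmoid variant).

    All weights depend only on the feature size D and on attention_dim A,
    never on the batch size, so an unknown (None) batch dimension is fine.
    """

    def __init__(self, attention_dim, return_alphas=False, **kwargs):
        super().__init__(**kwargs)
        self.attention_dim = attention_dim
        self.return_alphas = return_alphas

    def build(self, input_shape):
        # input_shape is (batch, time, features); only the last axis is read.
        hidden_size = int(input_shape[-1])
        # A separate initializer instance per weight, matching stddev=0.1 above.
        self.W_omega = self.add_weight(
            name="W_omega", shape=(hidden_size, self.attention_dim),
            initializer=tf.keras.initializers.RandomNormal(stddev=0.1), trainable=True)
        self.b_omega = self.add_weight(
            name="b_omega", shape=(self.attention_dim,),
            initializer=tf.keras.initializers.RandomNormal(stddev=0.1), trainable=True)
        self.u_omega = self.add_weight(
            name="u_omega", shape=(self.attention_dim,),
            initializer=tf.keras.initializers.RandomNormal(stddev=0.1), trainable=True)
        super().build(input_shape)

    def call(self, x):
        # (B, T, D) x (D, A) -> (B, T, A); b_omega broadcasts over B and T.
        v = tf.sigmoid(tf.tensordot(x, self.W_omega, axes=1) + self.b_omega)
        # (B, T, A) x (A,) -> (B, T) scores.
        vu = tf.tensordot(v, self.u_omega, axes=1)
        # Softmax over the time axis.
        alphas = tf.nn.softmax(vu, axis=-1)
        # Attention-weighted sum over time: (B, T, D) -> (B, D).
        output = tf.reduce_sum(x * tf.expand_dims(alphas, -1), axis=1)
        if self.return_alphas:
            return output, alphas
        return output

    def get_config(self):
        config = super().get_config()
        config.update({"attention_dim": self.attention_dim,
                       "return_alphas": self.return_alphas})
        return config
```

Because `build` only reads `input_shape[-1]`, the same layer can be called on batches of any size and any number of time steps, for example after `tf.keras.Input(shape=(None, 16))`. It keeps `tf.tensordot` from the original function so that the 1-D `u_omega` is contracted against the last axis of `v`, giving `(B, T)` scores.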