
Why is the gradient None when computing a heatmap for MobileNet?

I am adding an attention layer to a MobileNet model, as shown below.

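import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import BatchNormalization, Dense, Reshape
from tensorflow.keras.models import Model

mobile = tf.keras.applications.mobilenet.MobileNet(weights='imagenet')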

x = mobile.layers[-6].input

if True:
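    # Flatten the 7x7 spatial grid into a sequence of 49 tokens for attention.
    x = Reshape([7*7, 1024])(x)
    # MultiHeadsAttModel is a custom attention layer whose definition is not shown here.
    att = MultiHeadsAttModel(l=7*7, d=1024, dv=64, dout=1024, nv=16)
    x = att([x, x, x])
    x = Reshape([7, 7, 1024])(x)
    x = BatchNormalization()(x)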

x = mobile.get_layer('global_average_pooling2d')(x)
x = mobile.get_layer('reshape_1')(x)
x = mobile.get_layer('dropout')(x)
x = mobile.get_layer('conv_preds')(x)
x = mobile.get_layer('reshape_2')(x)
output = Dense(units=50,activation='softmax')(x)

model = Model(inputs=mobile.input,outputs=output)

for layer in model.layers[:-23]:
    layer.trainable = False
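One thing that may be worth checking in the construction above: layers fetched with `mobile.get_layer(...)` are being called a second time, and as far as I can tell `layer.output` in tf.keras keeps referring to the first call. Below is a minimal sketch of that behaviour on a toy model, not on MobileNet itself. The layer names `shared` and `extra`, the sizes, and the all-ones weights are assumptions chosen only to make the two paths easy to tell apart.

```python
import numpy as np
from tensorflow.keras import layers, Model

# Toy stand-in for the pretrained network: `shared` plays the role of a layer
# such as 'conv_preds' that already has one call inside the original model.
inp = layers.Input(shape=(4,))
shared = layers.Dense(3, kernel_initializer="ones", use_bias=False, name="shared")
first_call = shared(inp)  # call made when the original model was built

# Rebuilt graph: insert an extra layer, then reuse `shared` (its second call).
extra = layers.Dense(4, kernel_initializer="ones", use_bias=False, name="extra")
second_call = shared(extra(inp))

x = np.ones((1, 4), dtype="float32")

# `shared.output` points at the first call, so this probe follows the ORIGINAL path.
probe_via_attr = Model(inp, shared.output)
# Keeping the tensor returned by the second call follows the NEW path.
probe_via_tensor = Model(inp, second_call)

out_attr = probe_via_attr(x).numpy()
out_tensor = probe_via_tensor(x).numpy()
print(out_attr)    # every entry is 4.0 (input -> shared)
print(out_tensor)  # every entry is 16.0 (input -> extra -> shared)
```

If the same thing happens here, `model.get_layer('conv_preds').output` would be a tensor from the original MobileNet graph rather than from the branch that passes through the attention layer.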


with tf.GradientTape(persistent=True) as gtape:
    last_conv_layer = model.get_layer('conv_preds')
    iterate = tf.keras.models.Model([model.inputs],[model.output,last_conv_layer.output])
    model_out,last_conv_layer = iterate(img_tensor)
    class_out = model_out[:,np.argmax(model_out[0])]
    grads = gtape.gradient(class_out, last_conv_layer)  # grads comes back as None


WARNING:tensorflow:Calling GradientTape.gradient on a persistent tape inside its context is significantly less efficient than calling it outside the context (it causes the gradient ops to be recorded on the tape, leading to increased CPU and memory usage). Only call GradientTape.gradient inside the context if you actually want to trace the gradient in order to compute higher order derivatives.
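A possible explanation, sketched on a toy model rather than on the real network: `tape.gradient` returns None when the target does not depend on the source, which would be the case if the tensor taken from `last_conv_layer.output` belongs to the original graph and not to the rebuilt one. The sketch below also calls `gradient` after the `with` block, which is what the warning above recommends. All layer names and sizes here are hypothetical, and `class_gradient` is a helper introduced only for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(4,))
shared = layers.Dense(3, name="shared")
first_call = shared(inp)                                   # original graph
second_call = shared(layers.Dense(4, name="extra")(inp))   # rebuilt graph
output = layers.Dense(2, activation="softmax", name="head")(second_call)
model = Model(inp, output)

# Build the helper models once, outside the tape.
stale = Model(inp, [model.output, shared.output])  # first-call tensor (not used by `output`)
fresh = Model(inp, [model.output, second_call])    # tensor that actually feeds `output`

def class_gradient(helper, x):
    """Gradient of the top class score w.r.t. the helper's second output."""
    with tf.GradientTape() as tape:
        preds, feat = helper(x)
        top = int(tf.argmax(preds[0]))
        score = preds[:, top]
    # Called after the `with` block, so no persistent tape is needed.
    return tape.gradient(score, feat)

x = tf.ones((1, 4))
grads_stale = class_gradient(stale, x)
grads_fresh = class_gradient(fresh, x)
print(grads_stale)        # None: the score does not depend on the first-call tensor
print(grads_fresh.shape)  # (1, 3)
```

Under these assumptions, the analogous change in the question's code would be to keep the tensor returned by `mobile.get_layer('conv_preds')(x)` in a variable and pass that tensor as the second output of `iterate`, instead of reading `last_conv_layer.output`.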

