Why doesn't my teacher forcing inference function work?

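I am working on a translation model. For this I am trying to implement an encoder-decoder model with an attention mechanism, and I want to train it with teacher forcing.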
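To be concrete about what teacher forcing means here: during training the decoder is fed the ground-truth target sequence shifted right by one position, and it is trained to predict the unshifted sequence. The sketch below builds one such pair in plain Python. The token ids (`PAD_ID`, `START_ID`, `END_ID`) and the helper `make_teacher_forcing_pair` are hypothetical names chosen for illustration; they are not taken from the model code, and the real ids depend on the tokenizer.

```python
# Hypothetical special-token ids; the real values depend on the tokenizer.
# PAD_ID = 0 matches the mask_zero=True convention of a Keras Embedding layer.
PAD_ID, START_ID, END_ID = 0, 1, 2


def make_teacher_forcing_pair(target_ids, max_len):
    """Build (decoder_input, decoder_target) for one target sentence.

    decoder_input  = [START] + tokens
    decoder_target = tokens + [END]
    Both are right-padded with PAD_ID up to max_len.
    """
    decoder_input = [START_ID] + list(target_ids)
    decoder_target = list(target_ids) + [END_ID]
    if len(decoder_input) > max_len:
        raise ValueError("max_len is too small for this sentence")
    pad = [PAD_ID] * (max_len - len(decoder_input))
    return decoder_input + pad, decoder_target + pad


dec_in, dec_out = make_teacher_forcing_pair([5, 9, 7], max_len=6)
print(dec_in)   # [1, 5, 9, 7, 0, 0]
print(dec_out)  # [5, 9, 7, 2, 0, 0]
```

At every position the decoder sees the correct previous token rather than its own prediction, which is why a separate step-by-step procedure is needed at inference time.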

I did that, and here is my model:

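from tensorflow.keras.layers import (Input, Embedding, LSTM, Activation, Dense,
                                     TimeDistributed, dot, concatenate)
from tensorflow.keras.models import Model

# Encoder: frozen pretrained embeddings, then an LSTM that returns the full
# output sequence (needed for attention) plus its final states.
encoder_input = Input(shape=(X_enc.shape[1],))
encoder_emb = Embedding(vocab_size_en, 300, weights=[embedding_matrix], trainable=False, mask_zero=True)
encoder = encoder_emb(encoder_input)
encoder_lstm = LSTM(1024, return_state=True, return_sequences=True)
encoding_lstm, state_h, state_c = encoder_lstm(encoder)

encoder_states = [state_h, state_c]

# Decoder: receives the shifted target sequence and starts from the encoder states.
decoder_input = Input(shape=(X_dec.shape[1],), name='decoder_input')
# The embedding size was missing in the original line; 300 is assumed here.
decoder_emb = Embedding(vocab_size_fr, 300, mask_zero=True)
decoder = decoder_emb(decoder_input)
# return_state=True is required because three values are unpacked below.
decoder_lstm = LSTM(1024, return_sequences=True, return_state=True, name='decoder_lstm')
decoding_lstm, _, _ = decoder_lstm(decoder, initial_state=encoder_states)

# Dot-product attention: score every decoder step against every encoder step.
attention = dot([decoding_lstm, encoding_lstm], axes=[2, 2])
attention_layer = Activation('softmax')
attention = attention_layer(attention)

# Weighted sum of encoder outputs (this line was garbled in the original;
# the second input and the axes are reconstructed and therefore an assumption).
context = dot([attention, encoding_lstm], axes=[2, 1])

decoder_combined_context = concatenate([context, decoding_lstm])
decoder_dense1 = TimeDistributed(Dense(512, activation="tanh"))
dense1 = decoder_dense1(decoder_combined_context)
decoder_dense2 = TimeDistributed(Dense(vocab_size_fr, activation="softmax"))
output = decoder_dense2(dense1)

model = Model(inputs=[encoder_input, decoder_input], outputs=[output])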
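The attention part of this model depends on which axes the two dot products contract, so it helps to track the shapes by hand. The sketch below is a plain-Python shape calculator for a batched dot product. `dot_output_shape` is a hypothetical helper written for this post, not a Keras function, and the sequence lengths are illustrative values rather than the real data dimensions.

```python
def dot_output_shape(shape_a, shape_b, axes):
    """Shape of a batched dot product contracting shape_a[axes[0]] with shape_b[axes[1]].

    Axis 0 is treated as the batch axis. Hypothetical helper, for shape checks only.
    """
    ax_a, ax_b = axes
    if not (0 < ax_a < len(shape_a)) or not (0 < ax_b < len(shape_b)):
        raise ValueError(f"axes {axes} out of range for shapes {shape_a} and {shape_b}")
    if shape_a[ax_a] != shape_b[ax_b]:
        raise ValueError(f"cannot contract {shape_a[ax_a]} with {shape_b[ax_b]}")
    # Keep every axis of a except the contracted one, then every axis of b
    # except its batch axis and its contracted axis.
    rest_a = [d for i, d in enumerate(shape_a) if i != ax_a]
    rest_b = [d for i, d in enumerate(shape_b) if i not in (0, ax_b)]
    return tuple(rest_a + rest_b)


B, T_ENC, T_DEC, UNITS = "batch", 60, 75, 1024  # illustrative sizes

decoder_seq = (B, T_DEC, UNITS)  # decoder LSTM outputs, one vector per target step
encoder_seq = (B, T_ENC, UNITS)  # encoder LSTM outputs, one vector per source step

scores = dot_output_shape(decoder_seq, encoder_seq, axes=(2, 2))
context = dot_output_shape(scores, encoder_seq, axes=(2, 1))
print(scores)   # ('batch', 75, 60)
print(context)  # ('batch', 75, 1024)
```

Under these assumptions both inputs to the score computation have to be 3-D sequences: calling the helper with a 2-D encoder tensor such as `(B, UNITS)` and `axes=(2, 2)` raises a `ValueError`, because that tensor has no axis 2.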

However, to make predictions I need an inference function, which means separating my encoder from my decoder. Here is my attempt:

# Encoder side, reusing the trained layers.
encoder = encoder_emb(encoder_input)
encoding_lstm, state_h, state_c = encoder_lstm(encoder)

# Placeholders for what the decoder receives at inference time.
encoding_results = Input(shape=(1024,))
decoder_state_input_h = Input(shape=(1024,))
decoder_state_input_c = Input(shape=(1024,))

decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

# Decoder side, reusing the trained layers with the states passed in as inputs.
decoder_embedded = decoder_emb(decoder_input)
decoder_outputs, state_h, state_c = decoder_lstm(decoder_embedded, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]

attention = dot([decoder_outputs, encoding_results], axes=[2, 2])
attention = Activation('softmax')(attention)

context = dot([attention, decoder_outputs], axes=[2, 1])

decoder_outputs = decoder_dense1(decoder_combined_context)
decoder_outputs = decoder_dense2(decoder_outputs)

encoder_model = Model(encoder_input, [encoding_lstm, state_c])
decoder_model = Model([decoder_input, encoding_results] + decoder_states_inputs, [decoder_outputs] + decoder_states)
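For context, the two inference models are meant to be driven by a decoding loop that feeds each predicted token back into the decoder. The sketch below shows that loop in plain Python with a toy stand-in for the decoder. `greedy_decode`, `step_fn`, `toy_step`, and the token ids are hypothetical names for illustration; in a real setup `step_fn` would presumably wrap a single-token call to the decoder model together with the encoder outputs, which is an assumption and is not shown here.

```python
from typing import Callable, List, Tuple

START_ID, END_ID = 1, 2  # hypothetical special-token ids


def greedy_decode(
    step_fn: Callable[[int, object], Tuple[List[float], object]],
    initial_state: object,
    max_len: int,
) -> List[int]:
    """Feed the previous prediction back in until END_ID or max_len is reached."""
    token, state, result = START_ID, initial_state, []
    for _ in range(max_len):
        # One decoder step: previous token + state -> next-token probabilities + new state.
        probs, state = step_fn(token, state)
        token = max(range(len(probs)), key=probs.__getitem__)  # argmax
        if token == END_ID:
            break
        result.append(token)
    return result


def toy_step(prev_token: int, state: int):
    """Toy stand-in for the decoder: emits tokens 3, 4, then END_ID."""
    script = [3, 4, END_ID]
    probs = [0.0] * 5
    probs[script[state]] = 1.0
    return probs, state + 1


print(greedy_decode(toy_step, initial_state=0, max_len=10))  # [3, 4]
```

Unlike training, no ground-truth target is available here, so the state returned by one step has to be passed into the next step explicitly.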

Something seems to be wrong, because it returns:

Dimensions must be equal,but are 4096 and 1024 for '{{node mul/mul}} = Mul[T=DT_FLOAT](Sigmoid_1,init_c)' with input shapes: [?,75,4096],[?,1024].

Can anyone give me some ideas to help me understand this problem?