Why doesn't my teacher-forcing inference function work?

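I am working on a translation model. For this, I am trying to implement an encoder-decoder model with an attention mechanism, and I want to train it with teacher forcing.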

I got that part working. Here is my model:

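from tensorflow.keras.layers import (Input, Embedding, LSTM, Dense, Activation,
                                     TimeDistributed, dot, concatenate)
from tensorflow.keras.models import Model

# Encoder: frozen pretrained embeddings, then an LSTM that returns the full
# output sequence (needed for attention) plus its final hidden and cell states.
encoder_input = Input(shape=(X_enc.shape[1],))
encoder_emb = Embedding(vocab_size_en, 300, weights=[embedding_matrix], trainable=False, mask_zero=True)
encoder = encoder_emb(encoder_input)
encoder_lstm = LSTM(1024, return_state=True, return_sequences=True)
encoding_lstm, state_h, state_c = encoder_lstm(encoder)

encoder_states = [state_h, state_c]

# Decoder (teacher forcing): receives the target sequence as input and starts
# from the encoder's final states.
decoder_input = Input(shape=(X_dec.shape[1],), name='decoder_input')
# The embedding size is missing in the source; 300 is an assumption that mirrors the encoder.
decoder_emb = Embedding(vocab_size_fr, 300, mask_zero=True)
decoder = decoder_emb(decoder_input)
# return_state=True is required because three values are unpacked below.
decoder_lstm = LSTM(1024, return_sequences=True, return_state=True, name='decoder_lstm')
decoding_lstm, _, _ = decoder_lstm(decoder, initial_state=[state_h, state_c])

# Dot-product attention: score every decoder step against every encoder step.
attention = dot([decoding_lstm, encoding_lstm], axes=[2, 2])
attention_layer = Activation('softmax')
attention = attention_layer(attention)

# Reconstructed line: the second operand and the axes were lost in the source.
context = dot([attention, encoding_lstm], axes=[2, 1])

decoder_combined_context = concatenate([context, decoding_lstm])
decoder_dense1 = TimeDistributed(Dense(512, activation="tanh"))
dense1 = decoder_dense1(decoder_combined_context)
decoder_dense2 = TimeDistributed(Dense(vocab_size_fr, activation="softmax"))
output = decoder_dense2(dense1)

model = Model(inputs=[encoder_input, decoder_input], outputs=[output])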
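
To make the tensor shapes in the attention block easier to follow, here is a small NumPy sketch of the same dot-product attention computation. It is only an illustration: the function names and the toy sizes (batch 2, 5 decoder steps, 7 encoder steps, 8 units) are assumptions and do not come from the model above.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_attention(dec_out, enc_out):
    # dec_out: (batch, T_dec, units), enc_out: (batch, T_enc, units)
    # Scores compare every decoder step with every encoder step.
    scores = np.einsum("btu,bsu->bts", dec_out, enc_out)   # (batch, T_dec, T_enc)
    weights = softmax(scores, axis=-1)                     # normalise over encoder steps
    # Context is the attention-weighted sum of encoder outputs.
    context = np.einsum("bts,bsu->btu", weights, enc_out)  # (batch, T_dec, units)
    return weights, context

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
dec_out = rng.normal(size=(2, 5, 8))
enc_out = rng.normal(size=(2, 7, 8))

weights, context = dot_attention(dec_out, enc_out)
combined = np.concatenate([context, dec_out], axis=-1)     # (batch, T_dec, 2 * units)
print(weights.shape, context.shape, combined.shape)
```

Note that both operands are three-dimensional here: the encoder outputs keep their time axis, which is what the softmax normalises over.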

However, to make predictions I have to implement an inference function, which means separating the encoder from the decoder. Here is my attempt:

# Inference encoder: reuses the trained embedding and LSTM layers.
encoder = encoder_emb(encoder_input)
encoding_lstm, state_h, state_c = encoder_lstm(encoder)


# Inference decoder inputs: encoder results plus the previous LSTM states.
encoding_results = Input(shape=(1024,))
decoder_state_input_h = Input(shape=(1024,))
decoder_state_input_c = Input(shape=(1024,))

decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

decoder_embedded = decoder_emb(decoder_input)
decoder_outputs, state_h, state_c = decoder_lstm(decoder_embedded, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]

attention = dot([decoder_outputs, encoding_results], axes=[2, 2])
attention = Activation('softmax')(attention)

# The axes argument is missing in the source.
context = dot([attention, decoder_outputs])

decoder_outputs = decoder_dense1(decoder_combined_context)
decoder_outputs = decoder_dense2(decoder_outputs)

encoder_model = Model(encoder_input, [encoding_lstm, state_h, state_c])
decoder_model = Model([decoder_input, encoding_results] + decoder_states_inputs, [decoder_outputs] + decoder_states)

Something seems to be wrong with this, because it gives me the following error:

Dimensions must be equal, but are 4096 and 1024 for '{{node mul/mul}} = Mul[T=DT_FLOAT](Sigmoid_1, init_c)' with input shapes: [?,75,4096], [?,1024].

Can anyone give me some ideas to help me understand this problem?
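
For context, a separated encoder and decoder are usually driven token by token at prediction time, with each predicted token fed back in as the next decoder input. The following is a minimal sketch of such a greedy loop, not the code from this post: `encode` and `decode_step` are hypothetical callables standing in for wrappers around the two models, and the toy versions below exist only so the loop can run on its own.

```python
import numpy as np

def greedy_decode(encode, decode_step, source_ids, start_id, end_id, max_len=50):
    # encode: hypothetical callable returning (encoder_outputs, h, c)
    # decode_step: hypothetical callable mapping (token, encoder_outputs, h, c)
    #              to (probabilities over the target vocabulary, new_h, new_c)
    enc_out, h, c = encode(source_ids)
    token = start_id
    result = []
    for _ in range(max_len):
        probs, h, c = decode_step(token, enc_out, h, c)
        token = int(np.argmax(probs))   # greedy choice: most probable token
        if token == end_id:
            break
        result.append(token)            # this token is the next step's input
    return result

# Toy stand-ins (assumptions for illustration, not real models).
def toy_encode(source_ids):
    enc_out = np.zeros((len(source_ids), 4))
    return enc_out, np.zeros(4), np.zeros(4)

def toy_decode_step(token, enc_out, h, c):
    vocab_size = 6
    probs = np.zeros(vocab_size)
    probs[(token + 1) % vocab_size] = 1.0   # always predicts the next token id
    return probs, h, c

print(greedy_decode(toy_encode, toy_decode_step, [3, 4], start_id=1, end_id=5))
# prints [2, 3, 4]
```

In a real setup, the states returned by one step are passed back into the next step, which is why the decoder model needs them as both inputs and outputs.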
