Autoencoder with missing values in Keras (Python)


I am trying to build a simple linear autoencoder model for the following dataset:

```text
permno         10001     10002     10012  ...     93434  93435     93436
date                                      ...                           
2005-01-31 -0.040580 -0.132466 -0.320388  ...       NaN    NaN       NaN
2005-02-28 -0.045166 -0.033255 -0.285714  ...       NaN    NaN       NaN
2005-03-31  0.124822 -0.013081 -0.080000  ...       NaN    NaN       NaN
2005-04-29 -0.074684 -0.066700 -0.195652  ...       NaN    NaN       NaN
2005-05-31  0.219030  0.046586 -0.243243  ...       NaN    NaN       NaN
             ...       ...       ...  ...       ...    ...       ...
2019-08-30       NaN       NaN       NaN  ... -0.104730    NaN -0.066222
2019-09-30       NaN       NaN       NaN  ... -0.101887    NaN  0.067639
2019-10-31       NaN       NaN       NaN  ... -0.050420    NaN  0.307427
2019-11-29       NaN       NaN       NaN  ...  0.000000    NaN  0.047695
2019-12-31       NaN       NaN       NaN  ... -0.070796    NaN  0.267897
```
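
The NaN pattern in `df.values` can be turned into a 0/1 observation mask. The mask measures how unbalanced the panel is, and it is also what any masked loss needs. Below is a minimal NumPy sketch on a hypothetical 3 × 3 toy panel. The helper name `observation_mask` and the choice of 0.0 as the fill value are illustrative assumptions, not part of the question.

```python
import numpy as np

def observation_mask(values):
    """Return (mask, filled) for a 2-D array that may contain NaN.

    mask[i, j] is 1.0 where values[i, j] is observed and 0.0 where it is NaN;
    filled is a copy of values with every NaN replaced by 0.0 (assumed fill).
    """
    values = np.asarray(values, dtype=float)
    mask = (~np.isnan(values)).astype(float)
    filled = np.nan_to_num(values, nan=0.0)
    return mask, filled

# Hypothetical toy panel: rows are dates, columns are permnos.
toy = np.array([
    [0.10, -0.20, np.nan],
    [0.05, np.nan, np.nan],
    [np.nan, -0.10, 0.30],
])
mask, filled = observation_mask(toy)
print(mask.mean())        # overall share of observed entries
print(mask.mean(axis=0))  # share of observed dates per column
```

On the real data the call would be `observation_mask(df.values)`.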

I defined a function that builds the autoencoder:

```python
from keras.layers import Input, Dense
from keras import regularizers, models, optimizers

def lin_ae(object, neurons, learning_rate=1e-4, regularization=1e-5, epochs=1000):
    # `object` is the 2-D training array (rows = dates, columns = permnos).
    # The name shadows Python's builtin but is kept so existing calls still work.
    input_layer = Input(shape=(object.shape[1],))

    # Linear encoder: compress the cross-section into `neurons` units.
    encoder = Dense(neurons, activation='linear',
                    kernel_regularizer=regularizers.l2(regularization))(input_layer)
    # Linear decoder (default activation is linear): one output per column.
    decoder = Dense(object.shape[1],
                    kernel_regularizer=regularizers.l2(regularization))(encoder)
    autoencoder = models.Model(input_layer, decoder)
    # `optimizers.adam(lr=...)` only works in older Keras releases;
    # current Keras spells it `Adam(learning_rate=...)`.
    autoencoder.compile(optimizer=optimizers.Adam(learning_rate=learning_rate),
                        loss='mean_squared_error')

    # Fit the model: input and target are the same matrix.
    autoencoder.fit(object, object, epochs=epochs, batch_size=4, shuffle=True)
    return autoencoder
```
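
Both `Dense` layers in `lin_ae` are affine maps, so every unit of the code is a weighted sum over all input columns. The minimal NumPy sketch below reproduces that forward pass and shows why a single missing return is enough to break a whole row. Random weights stand in for trained ones, and the function name `linear_ae_forward` is hypothetical.

```python
import numpy as np

def linear_ae_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Forward pass of a two-layer linear autoencoder (Dense -> Dense, no activation)."""
    code = x @ w_enc + b_enc       # shape (batch, neurons)
    return code @ w_dec + b_dec    # shape (batch, n_features)

rng = np.random.default_rng(0)
n_features, neurons = 4, 2
w_enc = rng.normal(size=(n_features, neurons))
b_enc = np.zeros(neurons)
w_dec = rng.normal(size=(neurons, n_features))
b_dec = np.zeros(n_features)

x = rng.normal(size=(3, n_features))
x[0, 2] = np.nan  # a single missing value in the first row

recon = linear_ae_forward(x, w_enc, b_enc, w_dec, b_dec)
print(np.isnan(recon).any(axis=1))  # which rows of the reconstruction contain NaN
```

The NaN in row 0 enters every code unit, so every reconstructed value in that row becomes NaN. It then reaches the loss of any batch containing that row.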

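As you can see, my DataFrame is a highly unbalanced panel with many missing (NaN) entries. When I run the function defined above as `lin_ae(object=df.values, neurons=5, regularization=0, epochs=1000)`, the loss comes out as NaN: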

```text
...
Epoch 103/1000
180/180 [==============================] - 0s 756us/step - loss: nan
Epoch 104/1000
180/180 [==============================] - 0s 756us/step - loss: nan
...
```
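
The `nan` loss comes from how `mean_squared_error` is computed. It averages the squared error over every target entry, and one NaN in that average makes the result NaN. Once a NaN gradient reaches the optimizer, the weights become NaN too, so every later epoch also reports `loss: nan`. The minimal NumPy comparison below sets the plain MSE against a masked MSE that averages only over observed targets. Both function names are hypothetical.

```python
import numpy as np

def mse(y_true, y_pred):
    # Plain mean squared error over every entry, like loss='mean_squared_error'.
    return np.mean((y_true - y_pred) ** 2)

def masked_mse(y_true, y_pred):
    # Mean squared error over the observed (non-NaN) targets only.
    observed = ~np.isnan(y_true)
    diff = np.where(observed, y_true - y_pred, 0.0)  # zero out missing entries
    return np.sum(diff ** 2) / max(observed.sum(), 1)

y_true = np.array([[1.0, np.nan], [3.0, 4.0]])
y_pred = np.array([[0.0, 10.0], [3.0, 2.0]])
print(mse(y_true, y_pred))         # nan
print(masked_mse(y_true, y_pred))  # (1 + 0 + 4) / 3 observed entries
```

The same masking can be written as a custom Keras loss using TensorFlow ops such as `tf.math.is_nan` and `tf.where`. It is safest to replace the NaN targets before taking the difference. The inputs must still be NaN-free, because the forward pass mixes all columns.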

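So what should I do? I could of course drop the observations with missing values, but I want to keep both the time dimension and the permno (column) dimension large. How can I tell Keras to simply "ignore" the missing values, instead of having to drop entire rows or columns of the DataFrame to get rid of the NaNs?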
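
Plain `Dense` layers have no built-in option to skip individual NaN entries. Keras's `Masking` layer masks whole timesteps in sequence models, which does not fit this case. A common workaround is to change what goes in and what gets scored: fill the missing inputs with a constant, and compute the reconstruction loss only over observed entries. The sketch below uses NumPy rather than Keras so it stays self-contained. It trains the same two-layer linear architecture as `lin_ae` with full-batch gradient descent and a masked MSE. The function name `train_masked_linear_ae`, the zero fill, the learning rate and the synthetic panel are illustrative assumptions, not taken from the question.

```python
import numpy as np

def train_masked_linear_ae(values, neurons, lr=0.05, epochs=2000, seed=0):
    """Full-batch gradient descent on a linear autoencoder with a masked MSE.

    NaN entries are fed to the encoder as 0.0 (assumed fill value) and are
    excluded from the loss, so they contribute nothing to the error term.
    """
    values = np.asarray(values, dtype=float)
    mask = (~np.isnan(values)).astype(float)  # 1.0 = observed, 0.0 = missing
    x = np.nan_to_num(values, nan=0.0)        # zero-filled input
    n_obs = mask.sum()
    n_features = x.shape[1]

    rng = np.random.default_rng(seed)
    w1 = 0.1 * rng.normal(size=(n_features, neurons))  # encoder weights
    b1 = np.zeros(neurons)
    w2 = 0.1 * rng.normal(size=(neurons, n_features))  # decoder weights
    b2 = np.zeros(n_features)

    losses = []
    for _ in range(epochs):
        h = x @ w1 + b1           # linear code
        r = h @ w2 + b2           # reconstruction
        err = mask * (r - x)      # errors on observed entries only
        losses.append(np.sum(err ** 2) / n_obs)

        g = 2.0 * err / n_obs     # dLoss/dr
        grad_w2, grad_b2 = h.T @ g, g.sum(axis=0)
        gh = g @ w2.T             # dLoss/dh (uses w2 before the update)
        grad_w1, grad_b1 = x.T @ gh, gh.sum(axis=0)

        w1 -= lr * grad_w1
        b1 -= lr * grad_b1
        w2 -= lr * grad_w2
        b2 -= lr * grad_b2
    return (w1, b1, w2, b2), losses

# Hypothetical synthetic panel: 120 dates x 20 assets driven by 2 factors,
# with roughly 20% of the entries knocked out as NaN.
rng = np.random.default_rng(42)
factors = rng.normal(size=(120, 2))
loadings = rng.normal(size=(2, 20))
panel = factors @ loadings + 0.1 * rng.normal(size=(120, 20))
panel[rng.random(panel.shape) < 0.2] = np.nan

params, losses = train_masked_linear_ae(panel, neurons=2)
print(losses[0], losses[-1])
```

In Keras, the analogous change would be to pass zero-filled inputs as `x`. The NaNs (or a separate mask) stay in the targets, and the model is compiled with a custom loss that excludes the missing targets. Filling with 0 is only one choice. The encoder still treats the filled values as real inputs, so the fill value (0, a column mean, etc.) can affect the learned code.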