Using GradientTape for automatic differentiation in TensorFlow and incorporating it into a Keras NN

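I am trying to build a neural network with the Keras API in TensorFlow 2. The network takes the time variable as input and outputs two vectors, called "a" and "b", that come from physical equations. I want the loss function to include the physical equations, which means computing the derivative of the vector "a" with respect to time. For this I use the GradientTape API for automatic differentiation in TensorFlow.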

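As far as I understand from the documentation, the derivative of a non-scalar target is computed with the tape's jacobian method. However, when I pass a batch of size 10 and compute the Jacobian, I get the derivative of every output with respect to all 10 time values, which is not what I want. For each of the 10 vectors "a" presented to the network, I only want the derivative with respect to its own time value.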
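One way to obtain only the per-sample derivatives is `tf.GradientTape.batch_jacobian`, which treats the first axis of both the target and the source as a batch axis. The sketch below is a minimal illustration rather than the model from the question: `net` is a hypothetical small stand-in network, and only the batch size 10 and the output size 25 are taken from the question.

```python
import tensorflow as tf

tf.random.set_seed(0)

batch_size, n_a = 10, 25  # sizes quoted in the question

# Hypothetical stand-in network: one time value in, n_a coefficients out
net = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(16, activation="sigmoid"),
    tf.keras.layers.Dense(n_a),
])

t_batch = tf.random.uniform((batch_size, 1))

# persistent=True only because the tape is queried twice below
with tf.GradientTape(persistent=True) as tape:
    tape.watch(t_batch)    # t_batch is a plain tensor, so it must be watched
    a_pred = net(t_batch)  # shape (10, 25)

# Full Jacobian: every output row against every time value in the batch
full_jac = tape.jacobian(a_pred, t_batch)           # shape (10, 25, 10, 1)
# Batch Jacobian: row i of the output against row i of the input only
per_sample = tape.batch_jacobian(a_pred, t_batch)   # shape (10, 25, 1)
da_dt = tf.squeeze(per_sample, axis=-1)             # shape (10, 25)

print(full_jac.shape, per_sample.shape, da_dt.shape)
```

Because each sample depends only on its own time value, the blocks of the full Jacobian that pair sample i with time value j (for i different from j) are zero, and `batch_jacobian` avoids computing them.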

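The code is below; I train the network in the step function. The equations are shown in the attached figure: only the vectors "a" and "b" are variables, while all other terms are constants, either scalars such as \nu or matrices and tensors.

[Figure: the physical equations in question.]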
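The figure itself is not reproduced here, but the two residuals can be read off the step code further down. Written out, they would look roughly as follows. The form of the quadratic term in C and the right-hand operand of P are assumptions, since those arguments are incomplete in the posted code; the signs follow the expression for f_1 in the code.

```latex
\begin{aligned}
f_1 &= M \frac{\mathrm{d}a}{\mathrm{d}t} - \nu B a + a^{T} C a + K b = 0, \\
f_2 &= P a = 0.
\end{aligned}
```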

import sys

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import scipy.io
from scipy.interpolate import griddata
import time
from itertools import product,combinations
from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
from mpl_toolkits.axes_grid1 import make_axes_locatable
from sklearn.preprocessing import MinMaxScaler
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation,Dense 
from tensorflow.keras.optimizers import Adam
import matplotlib.gridspec as gridspec

np.random.seed(1234)
tf.random.set_seed(1234)

The neural network class:

class MyModel:
    # Initialize the class
    def __init__(self,t,a,b,nu,M,B,C,K,P):
        # Store the constant scalars, vectors and matrices that appear in the
        # equations; t is the input data matrix, while a and b are the output data matrices
        self.t = t
        self.a = a
        self.b = b
        self.nu = nu
        self.M = M
        self.B = B
        self.C = C
        self.K = K
        self.P = P
        self.Nu = a.shape[1]
        self.Np = b.shape[1]
        
    def build_model(self):
        
        Nu = self.Nu
        Np = self.Np
        NoutputTotal = Nu + Np
        model1 = Sequential([
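            # Only the first layer's activation is visible in the post; sigmoid for
            # the other hidden layers and a linear output layer are assumptions
            Dense(units=100,input_shape=(1,),activation='sigmoid'),
            Dense(units=80,activation='sigmoid'),
            Dense(units=70,activation='sigmoid'),
            Dense(units=50,activation='sigmoid'),
            Dense(units=40,activation='sigmoid'),
            Dense(units=NoutputTotal)
        ])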

        return model1

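    def step(self,t,y):
        
        Nu = self.Nu
        Np = self.Np
        a_tf = y[:,0:Nu]
        b_tf = y[:,Nu:]
        # Build the network once and reuse it; rebuilding it on every call
        # would reset the weights at each step
        if not hasattr(self,'model'):
            self.model = self.build_model()
        model = self.model
        # Store the batch of time values in a tensorflow variable so that the
        # time derivatives can be taken
        x_in = tf.Variable(t)
        
        # persistent=True lets the tape be queried twice: once for da_dt and once
        # for the weight gradients. The residuals and the loss are computed inside
        # the tape so that their dependence on the weights is recorded.
        with tf.GradientTape(persistent=True) as tape:
            pred = model(x_in)
            a_pred = pred[:,0:Nu]
            b_pred = pred[:,Nu:]
            print(a_pred.shape)
            # This is the call the question is about: it returns shape (10,25,10,1)
            da_dt = tape.jacobian(a_pred,x_in)
            print(da_dt.shape)
            # f_1 corresponds to the first ODE while f_2 corresponds to the second
            # matrix equation. The operands C and a_pred in the quadratic term and
            # a_pred in f_2 are assumed; they are missing from the posted snippet.
            f_1 = tf.tensordot(self.M,da_dt,1) - self.nu * tf.tensordot(self.B,a_pred,1) + tf.tensordot(tf.tensordot(tf.transpose(a_pred),self.C,[[0],[1]]),a_pred,1) + tf.tensordot(self.K,b_pred,1)
            f_2 = tf.tensordot(self.P,a_pred,1)
            # compute the loss as the MSE of the data on "a" and "b" plus the terms
            # that come from the physical equations described by f_1 and f_2
            loss = tf.reduce_sum(tf.square(a_tf - a_pred)) + \
                        tf.reduce_sum(tf.square(b_tf - b_pred)) + \
                        tf.reduce_sum(tf.square(f_1)) + \
                        tf.reduce_sum(tf.square(f_2))

        grads = tape.gradient(loss,model.trainable_variables)
        opt.apply_gradients(zip(grads,model.trainable_variables))
        del tape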

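In main I define the number of epochs and the batch size, and I use the step function to train the network. As mentioned above, the problem appears when I compute the derivative of "a". The print statements give a shape of (10,25) for a_pred (25 is the dimension of "a") and a shape of (10,25,10,1) for da_dt, whereas I expected da_dt to have the same shape as a_pred. It is worth noting that da_dt is zero in every entry that does not pair an output with its own time value.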
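For the physics residual to influence the weights, the time derivative itself has to be recorded by the tape that differentiates the loss. A possible arrangement uses two nested tapes, sketched below under loud assumptions: the sizes, the random matrices `M`, `B`, `C`, `K`, `P`, the index order used for `C`, the stand-in network `net`, and the function name `train_step` are all hypothetical and only mirror the names used in the question.

```python
import tensorflow as tf

tf.random.set_seed(0)
n_a, n_b, batch_size = 4, 3, 10  # small hypothetical sizes
nu = 1e-4

# Hypothetical random stand-ins for the constant operators in the equations
M = tf.random.normal((n_a, n_a))
B = tf.random.normal((n_a, n_a))
C = tf.random.normal((n_a, n_a, n_a))  # index order (i, j, k) is an assumption
K = tf.random.normal((n_a, n_b))
P = tf.random.normal((n_b, n_a))

net = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(16, activation="sigmoid"),
    tf.keras.layers.Dense(n_a + n_b),
])
opt = tf.keras.optimizers.Adam(learning_rate=1e-4)


def train_step(t_batch, y_batch):
    a_true, b_true = y_batch[:, :n_a], y_batch[:, n_a:]
    with tf.GradientTape() as weight_tape:      # for d(loss)/d(weights)
        with tf.GradientTape() as time_tape:    # for da/dt
            time_tape.watch(t_batch)
            pred = net(t_batch)
            a_pred, b_pred = pred[:, :n_a], pred[:, n_a:]
        # Per-sample derivative, shape (batch, n_a); it is computed inside
        # weight_tape so that it stays differentiable w.r.t. the weights
        da_dt = tf.squeeze(time_tape.batch_jacobian(a_pred, t_batch), axis=-1)
        # Quadratic term sum_jk a_j C_ijk a_k, evaluated row by row
        quad = tf.einsum("bik,bk->bi", tf.einsum("bj,ijk->bik", a_pred, C), a_pred)
        # Residuals; each row is one sample, so x @ A^T applies A to every row
        f_1 = (tf.matmul(da_dt, M, transpose_b=True)
               - nu * tf.matmul(a_pred, B, transpose_b=True)
               + quad
               + tf.matmul(b_pred, K, transpose_b=True))
        f_2 = tf.matmul(a_pred, P, transpose_b=True)
        loss = (tf.reduce_sum(tf.square(a_true - a_pred))
                + tf.reduce_sum(tf.square(b_true - b_pred))
                + tf.reduce_sum(tf.square(f_1))
                + tf.reduce_sum(tf.square(f_2)))
    grads = weight_tape.gradient(loss, net.trainable_variables)
    opt.apply_gradients(zip(grads, net.trainable_variables))
    return loss


t_batch = tf.random.uniform((batch_size, 1))
y_batch = tf.random.uniform((batch_size, n_a + n_b))
loss = train_step(t_batch, y_batch)
print(float(loss))
```

The residuals are written row-wise so that every term keeps the (batch, n_a) or (batch, n_b) shape of the predictions, which is why the per-sample shape of `da_dt` matters here.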

if __name__ == "__main__": 
      
    EPOCHS = 20000
    BS = 10
    INIT_LR = 1e-4
    print("Loading the data")
    # Load Data
    exec(open('MatricesPostProcessing/coeffL2U_mat.py').read())
    exec(open('MatricesPostProcessing/coeffL2P_mat.py').read())
    exec(open('Matrices/C_mat.py').read())
    exec(open('Matrices/B_mat.py').read())
    exec(open('Matrices/K_mat.py').read())
    exec(open('Matrices/P_mat.py').read())
    exec(open('Matrices/M_mat.py').read())
    exec(open('tSnapshots_mat.py').read())
    t = tSnapshots
    T = t.shape[0]
    Nu = coeffL2U.shape[1]
    Np = coeffL2P.shape[1]
    # scale the data to the range of [0,1]
    scaler = MinMaxScaler(feature_range=(0,1))
    t_scaled = scaler.fit_transform(t.reshape(-1,1))
    coeffL2U_scaled = scaler.fit_transform(coeffL2U)
    coeffL2P_scaled = scaler.fit_transform(coeffL2P)
    a_and_b = np.concatenate([coeffL2U_scaled,coeffL2P_scaled],1)
    # Create a tensor tf version of each constant matrix and tensor appearing in the equations
    M_T = tf.constant(M)
    B_T = tf.constant(B)
    C_T = tf.constant(C)
    K_T = tf.constant(K)
    P_T = tf.constant(P)
    t_T = tf.constant(t_scaled)
    output_T = tf.constant(a_and_b)
    a_T = output_T[:,0:Nu]
    b_T = output_T[:,Nu:]
    nu = 1e-4
    # Training
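    # Optimizer used inside step(); the post does not show its definition, so
    # Adam with INIT_LR is an assumption
    opt = Adam(learning_rate=INIT_LR)
    # nu is passed explicitly so that the arguments match MyModel.__init__
    model = MyModel(t_T,a_T,b_T,nu,M_T,B_T,C_T,K_T,P_T)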
    # compute the number of batch updates per epoch
    numUpdates = int(t_T.shape[0] / BS)
    # loop over the number of epochs
    for epoch in range(0,EPOCHS):
        # show the current epoch number
        print("[INFO] starting epoch {}/{}...".format(
            epoch + 1,EPOCHS),end="")
        sys.stdout.flush()
        epochStart = time.time()
        # loop over the data in batch size increments
        for i in range(0,numUpdates):
            # determine starting and ending slice indexes for the current
            # batch
            start = i * BS
            end = start + BS
            # take a step
            model.step(t_T[start:end],output_T[start:end])
        # show timing information for the epoch
        epochEnd = time.time()
        elapsed = (epochEnd - epochStart) / 60.0
        print("took {:.4} minutes".format(elapsed))

Are there any corrections or useful comments on building this kind of neural network?
