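PyTorch linear regression with squared features

How to fix PyTorch linear regression with squared features

I'm new to PyTorch and want to implement linear regression partly with PyTorch and partly by hand. I want to use squared features in the regression: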

import torch

# init
x = torch.tensor([1,2,3,4,5])
y = torch.tensor([[1],[4],[9],[16],[25]])
w = torch.tensor([[0.5],[0.5],[0.5]],requires_grad=True)

iterations = 30
alpha = 0.01

def forward(X):
    # feature transformation [1,x,x^2]
    psi = torch.tensor([[1.0,x[0],x[0]**2]])
    for i in range(1,len(X)):
        psi = torch.cat((psi,torch.tensor([[1.0,x[i],x[i]**2]])),0)
    return torch.matmul(psi,w)
    
def loss(y,y_hat):
    return ((y-y_hat)**2).mean()

for i in range(iterations):
    
    y_hat = forward(x)

    l = loss(y,y_hat)
    l.backward()
    
    with torch.no_grad():
        w -= alpha * w.grad 
    w.grad.zero_()

    if i%10 == 0:
        print(f'Iteration {i}: The weight is:\n{w.detach().numpy()}\nThe loss is:{l}\n')

When I run my code, the regression does not learn the correct features and the loss keeps increasing. The output is:

Iteration 0: The weight is:
[[0.57 ]
 [0.81 ]
 [1.898]]
The loss is:25.450000762939453

Iteration 10: The weight is:
[[ 5529.5835]
 [22452.398 ]
 [97326.12  ]]
The loss is:210414632960.0

Iteration 20: The weight is:
[[5.0884394e+08]
 [2.0662339e+09]
 [8.9567642e+09]]
The loss is:1.7820802835250162e+21

Does anyone know why my model is not learning?

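Update

Is there a reason it performs so poorly? I assumed it was because of the small amount of training data, but it does not perform well with 10 data points either.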

Solution

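You should normalize your data. Also, since you are trying to fit x -> ax² + bx + c, c is essentially the bias. It is more sensible to remove it from the training data (by that I mean psi) and use a separate parameter for the bias.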
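To see why normalization matters here, it can help to look at the largest step size for which plain gradient descent on a mean-squared-error loss stays stable, which is 2 divided by the largest eigenvalue of the loss Hessian. The sketch below is an illustration under that assumption, not part of the original code; `max_stable_step` is a hypothetical helper name. It compares the raw design matrix [1, x, x^2] from the question with a standardized one that keeps a constant column for the bias.

```python
import torch

x = torch.tensor([1., 2., 3., 4., 5.])

def max_stable_step(design):
    # Hypothetical helper: for the loss mean((A w - y)^2) the Hessian is
    # (2/N) * A^T A, and gradient descent is stable for alpha < 2 / lambda_max.
    hessian = 2.0 / design.shape[0] * design.T @ design
    lam_max = torch.linalg.eigvalsh(hessian).max()
    return (2.0 / lam_max).item()

# Raw features as in the question: [1, x, x^2]
raw = torch.stack([torch.ones_like(x), x, x**2], 1)

# Standardized [x, x^2] plus a constant column standing in for the bias
feats = torch.stack([x, x**2], 1)
feats = (feats - feats.mean(0)) / feats.std(0)
standardized = torch.cat([torch.ones(len(x), 1), feats], 1)

raw_limit = max_stable_step(raw)
std_limit = max_stable_step(standardized)
print(f"largest stable alpha, raw features:          {raw_limit:.4f}")
print(f"largest stable alpha, standardized features: {std_limit:.4f}")
```

On this data the raw features give a limit below the alpha = 0.01 used in the question, which is consistent with the diverging loss, while the standardized features leave plenty of room above 0.02.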

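What you can do:

Standardize your input data and your targets using their mean and standard deviation.
Split the parameters into w (a two-component weight tensor) and b (the bias).
You don't need to construct psi on every inference, since x is always the same.
You can build psi with torch.stack([torch.ones_like(x),x,x**2],1), but here we don't need the column of ones, since we have essentially separated the bias from the weight tensor.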
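The last two points can be sketched as follows. This is a minimal illustration, assuming the same five inputs as the question; the names `psi_loop` and `psi_no_bias` are mine. It builds the feature matrix once with torch.stack and checks that it matches the row-by-row construction used in the question's forward().

```python
import torch

x = torch.tensor([1., 2., 3., 4., 5.])

# Row-by-row construction, equivalent to the loop in the question's forward()
rows = [torch.stack([torch.tensor(1.0), xi, xi**2]) for xi in x]
psi_loop = torch.stack(rows)

# Vectorized construction, done once outside the training loop
psi = torch.stack([torch.ones_like(x), x, x**2], 1)  # shape (5, 3)

# Variant without the constant column, for use with a separate bias parameter
psi_no_bias = torch.stack([x, x**2], 1)  # shape (5, 2)

print(torch.equal(psi, psi_loop))
print(psi_no_bias.shape)
```

Because x never changes between iterations, the matrix can be computed before the loop and reused, which also keeps the autograd graph limited to the matmul with w.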

Here is what the complete script looks like:

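import torch

x = torch.tensor([1,2,3,4,5]).float()
# features [x, x^2]; no constant column, because the bias b is a separate parameter
psi = torch.stack([x, x**2], 1).float()
# standardize each feature column to zero mean and unit standard deviation
psi = (psi - psi.mean(0)) / psi.std(0)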

y = torch.tensor([[1],[4],[9],[16],[25]]).float()
y = (y - y.mean(0)) / y.std(0)

w = torch.tensor([[0.5],[0.5]],requires_grad=True)
b = torch.tensor([0.5],requires_grad=True)

iterations = 30
alpha = 0.02
def loss(y,y_hat):
    return ((y-y_hat)**2).mean()

for i in range(iterations):
    y_hat = torch.matmul(psi,w) + b
    l = loss(y,y_hat)
    l.backward()
    
    with torch.no_grad():
        w -= alpha * w.grad 
        b -= alpha * b.grad 
    w.grad.zero_()
    b.grad.zero_()

    if i%10 == 0:
        print(f'Iteration {i}: The weight is:\n{w.detach().numpy()}\nThe loss is:{l}\n')

Result:

Iteration 0: The weight is:
[[0.49954653]
 [0.5004535 ]]
The loss is:0.25755801796913147

Iteration 10: The weight is:
[[0.49503425]
 [0.5049657 ]]
The loss is:0.07994867861270905

Iteration 20: The weight is:
[[0.49056274]
 [0.50943726]]
The loss is:0.028329044580459595
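Since both the inputs and the targets were standardized, the model's outputs are in standardized units as well. To get predictions on the original scale you have to keep the means and standard deviations around and undo the target scaling. Below is a rough sketch of that, assuming the same data and initial parameters as above; the `predict` helper and the choice of 200 iterations are my own additions for illustration, not part of the script above.

```python
import torch

x = torch.tensor([1., 2., 3., 4., 5.])
y = torch.tensor([[1.], [4.], [9.], [16.], [25.]])

# Keep the statistics so new inputs can be scaled the same way
feats = torch.stack([x, x**2], 1)
f_mean, f_std = feats.mean(0), feats.std(0)
y_mean, y_std = y.mean(0), y.std(0)

psi = (feats - f_mean) / f_std
y_norm = (y - y_mean) / y_std

w = torch.tensor([[0.5], [0.5]], requires_grad=True)
b = torch.tensor([0.5], requires_grad=True)
alpha = 0.02

for _ in range(200):  # assumption: more iterations than the script above
    loss = ((y_norm - (psi @ w + b)) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        w -= alpha * w.grad
        b -= alpha * b.grad
    w.grad.zero_()
    b.grad.zero_()

def predict(x_new):
    # Hypothetical helper: scale the inputs with the training statistics,
    # apply the model, then map the output back to the original y scale.
    z = (torch.stack([x_new, x_new**2], 1) - f_mean) / f_std
    with torch.no_grad():
        return (z @ w + b) * y_std + y_mean

pred_train = predict(x)
pred_new = predict(torch.tensor([6.0]))
print(pred_train.squeeze(1))
print(pred_new)
```

The fit will not be exact after this few iterations, because x and x^2 are strongly correlated on this data and gradient descent converges slowly along that direction.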