Why don't LogisticRegression and MLPClassifier produce the same results?


A neural network with no hidden layers and a sigmoid/softmax output activation is just logistic regression:

from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
X,y = load_iris(return_X_y=True)
nn = MLPClassifier(hidden_layer_sizes=(),solver = 'lbfgs',activation='logistic',alpha = 0).fit(X,y)
l = LogisticRegression(penalty='none',fit_intercept = False).fit(X,y)
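
To make the equivalence concrete, here is a minimal sketch that fits the same kind of network and checks that its predicted probabilities are a softmax over a linear function of the inputs, which is the functional form of multinomial logistic regression. The `random_state=0` value and the names `W`, `b`, `scores` and `manual` are illustrative choices, not part of the original code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

# network with no hidden layer, as in the question (seed fixed only for reproducibility)
nn = MLPClassifier(hidden_layer_sizes=(), solver='lbfgs', alpha=0, random_state=0).fit(X, y)

W, b = nn.coefs_[0], nn.intercepts_[0]         # shapes (4, 3) and (3,)

# linear scores followed by a numerically stable softmax
scores = X @ W + b
scores = scores - scores.max(axis=1, keepdims=True)
manual = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# True if the network output is softmax(X W + b)
print(np.allclose(manual, nn.predict_proba(X)))
```

The check only confirms that the two models share the same functional form; it says nothing about whether the two fitting procedures end up at the same parameters.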

So why don't the two models produce the same coefficients? Most of them are close, but there are some differences:

print("NN")
print(nn.coefs_[0].T)
print("\nLogistic")
print(l.coef_)
NN
[[  5.40104629  11.39328515 -16.50698752  -7.86329804]
 [ -1.06741383  -2.48638863   3.37921506  -5.29842503]
 [ -3.55724865  -9.11027371  12.79749019  12.9357708 ]]

Logistic
[[  5.10297361  11.87381176 -16.50600209  -7.70449685]
 [  0.61357365  -2.6277241    4.03442742  -1.28869255]
 [ -5.71654726  -9.24608766  12.47157468   8.9931894 ]]

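Solution

There are a few issues with your comparison, but correcting them does not resolve the discrepancy; so this is only a partial answer.

First, MLPClassifier always includes a bias (intercept) node (unlike in LR, its presence is not configurable), so you need to use fit_intercept = True in LR.

Second, although the solver is the same in both models, the default values of max_iter are not, so we should set them to be equal.

Third, to keep the problem as simple as possible, it is better to stay in a binary classification setting rather than a multiclass one.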
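
The first two points can be checked directly from the estimators' default parameters. This is a small sketch, assuming a reasonably recent scikit-learn; the variable names `mlp_params` and `lr_params` are illustrative.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

mlp_params = MLPClassifier().get_params()
lr_params = LogisticRegression().get_params()

# default iteration budgets of the two estimators
print('MLP max_iter:', mlp_params['max_iter'])
print('LR  max_iter:', lr_params['max_iter'])

# LR exposes a switch for the intercept; the MLP has no such parameter
print('LR fit_intercept:', lr_params['fit_intercept'])
print('MLP has fit_intercept:', 'fit_intercept' in mlp_params)
```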

Here is your code, modified accordingly:

from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.utils import shuffle

X,y = load_iris(return_X_y=True)

X,y = shuffle(X[:100,],y[:100],random_state=42) # keep only classes 0/1 (binary problem)

nn = MLPClassifier(hidden_layer_sizes=(),solver = 'lbfgs',activation='logistic',alpha = 0,max_iter=100).fit(X,y)
l = LogisticRegression(penalty='none',fit_intercept = True).fit(X,y)

print("NN coefficients & intercept")
print(nn.coefs_[0].T)
print(nn.intercepts_)
print("\nLR coefficients & intercept")
print(l.coef_)
print(l.intercept_)

Result:

NN coefficients & intercept
[[-1.34230329 -4.29615611  7.14868389  2.66752688]]
[array([-0.90035086])]

LR coefficients & intercept
[[-2.07247339 -6.90694692 10.97006745  5.64543091]]
[-1.05932537]

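The thing is, if you run the code above several times (I have not set any random state other than the one used for shuffling the data), you will see that the LR results are the same every time, while the MLP results vary from run to run. Here is another short experiment that demonstrates and quantifies this: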

nn_coef = []
nn_intercept = []
lr_coef = []
lr_inter = []

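for i in range(0,20):
  # same settings as above; on scikit-learn >= 1.2 write penalty=None instead of penalty='none'
  nn = MLPClassifier(hidden_layer_sizes=(),solver = 'lbfgs',activation='logistic',alpha = 0,max_iter=100).fit(X,y)
  l = LogisticRegression(penalty='none',fit_intercept = True).fit(X,y)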

  nn_coef.append(nn.coefs_[0].T)
  nn_intercept.append(nn.intercepts_)
  lr_coef.append(l.coef_)
  lr_inter.append(l.intercept_)

import numpy as np

# get the standard deviations of coefficients & intercepts between runs:

print(np.std(nn_coef,axis=0))
print(np.std(lr_coef,axis=0))
print()
print(np.std(nn_intercept))
print(np.std(lr_inter))

Result:

[[0.14334883 0.42125216 0.46115555 0.4488226 ]]
[[0.00000000e+00 8.88178420e-16 1.77635684e-15 8.88178420e-16]]

0.3393994986547498
0.0

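So it is clear that, while the standard deviations of the LR coefficients and intercept are practically zero, the corresponding standard deviations of the MLP parameters are quite large.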
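
One way to check whether this spread comes from the random starting point rather than from the data is to pin the `random_state` of MLPClassifier. The sketch below is an illustration only; the helper `fit_mlp` and the seed values 0 and 1 are hypothetical choices, and convergence warnings may be printed on this dataset.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X, y = X[:100], y[:100]          # keep only classes 0/1 (binary problem)

def fit_mlp(seed):
    """Fit the no-hidden-layer MLP with a fixed seed for the weight initialization."""
    return MLPClassifier(hidden_layer_sizes=(), solver='lbfgs', alpha=0,
                         max_iter=100, random_state=seed).fit(X, y)

same_a = fit_mlp(0).coefs_[0]
same_b = fit_mlp(0).coefs_[0]
other = fit_mlp(1).coefs_[0]

# largest coefficient gap between two fits with the same seed
print(np.abs(same_a - same_b).max())
# largest coefficient gap between two fits with different seeds
print(np.abs(same_a - other).max())
```

If the first gap is zero and the second is not, the run-to-run variation is attributable to the initialization alone.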

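It seems that the MLP algorithm, at least with the L-BFGS solver, is very sensitive to the initialization of the weights and biases, which is not the case for LR. This also seems to be the implicit assumption in a relevant Github thread. But I agree with your implied expectation that this should not be the case.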
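
The expectation is reasonable because the logistic log-loss is convex: on data that are not perfectly separable, an optimizer run to convergence should reach the same minimizer from any starting point. The NumPy sketch below illustrates this with plain gradient descent rather than L-BFGS. The synthetic data, the step size and the iteration count are assumptions chosen for the demonstration, and the function `fit_logistic_gd` is a hypothetical helper that does not reproduce scikit-learn's implementation.

```python
import numpy as np

def fit_logistic_gd(X, y, w0, b0, step=1.0, n_iter=5000):
    """Minimize the mean log-loss by gradient descent, starting from (w0, b0)."""
    w, b = np.asarray(w0, dtype=float).copy(), float(b0)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
        err = p - y                               # derivative of the loss w.r.t. the logits
        w -= step * (X.T @ err) / len(y)
        b -= step * err.mean()
    return w, b

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
logits = 0.5 + X @ np.array([1.5, -2.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))   # noisy labels, so the classes overlap

# two different random starting points
w_a, b_a = fit_logistic_gd(X, y, w0=rng.normal(size=2), b0=rng.normal())
w_b, b_b = fit_logistic_gd(X, y, w0=rng.normal(size=2), b0=rng.normal())

print(w_a, b_a)
print(w_b, b_b)
```

On perfectly separable data the unpenalized loss has no finite minimizer, so this argument applies only when the classes overlap.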

If nobody else comes up with a satisfactory answer, I think this is a good candidate for opening a Github issue.

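As @desertnaut pointed out, the MLP initialization does seem to be the issue, since the differences between the MLP and LR coefficients appear to shrink as the sample size increases.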

from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification

random_state = 100
n_samples = 1000

X,y = make_classification(n_samples=n_samples,n_features=2,n_redundant=0,n_informative=2,n_clusters_per_class=1,random_state=random_state)
X = StandardScaler().fit_transform(X)

nn = MLPClassifier(hidden_layer_sizes=(),solver='lbfgs',alpha=0,max_iter=1000,tol=0,random_state=random_state).fit(X,y)
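# assumed to use the same iteration budget and tolerance as the MLP above
lr = LogisticRegression(penalty='none',fit_intercept=True,max_iter=1000,tol=0).fit(X,y)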

print(nn.intercepts_[0])
print(lr.intercept_)
# [-1.08397244]
# [-1.08397505]

print(nn.coefs_[0].T)
print(lr.coef_)
# [[ 2.90716947 -3.08525711]]
# [[ 2.90718263 -3.08525826]]

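The code below shows that the variance of the MLP coefficients decreases as the sample size increases, and that both the MLP and LR coefficients converge to the true coefficients, although the exact sample size at which this happens depends on the dataset.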

import numpy as np
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# sample sizes
n_samples = [25,50,75,100,250,500,750,1000,5000,10000]

# number of refits of the MLP and LR
# models for each sample size
n_repetitions = 100

# synthetic data
true_intercept = 10
true_weights = [20,30]
X = np.random.multivariate_normal(np.zeros(2),np.eye(2),np.max(n_samples))
Z = true_intercept + np.dot(X,true_weights) + np.random.normal(0,1,np.max(n_samples))
p = 1 / (1 + np.exp(- Z))
y = np.random.binomial(1,p,np.max(n_samples))

# data frame for storing the results for each sample size
output = pd.DataFrame(columns=['sample size','label avg.','LR intercept avg.','LR intercept std.','NN intercept avg.','NN intercept std.','LR first weight avg.','LR first weight std.','NN first weight avg.','NN first weight std.','LR second weight avg.','LR second weight std.','NN second weight avg.','NN second weight std.'])

# loop across the different
# sample sizes "n"
for n in n_samples:

    lr_intercept,lr_coef = [],[]
    nn_intercept,nn_coef = [],[]

    # refit the MLP and LR models multiple times
    # using the first "n" samples
    for k in range(n_repetitions):

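        # assumed settings: L-BFGS, no penalty, as in the earlier comparison
        nn = MLPClassifier(hidden_layer_sizes=(),solver='lbfgs',alpha=0,max_iter=1000,tol=0)
        lr = LogisticRegression(penalty='none',fit_intercept=True,max_iter=1000,tol=0)

        nn.fit(X[:n,:],y[:n])
        lr.fit(X[:n,:],y[:n])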

        lr_intercept.append(lr.intercept_)
        nn_intercept.append(nn.intercepts_[0])

        lr_coef.append(lr.coef_)
        nn_coef.append(nn.coefs_[0].T)

    # save the sample mean and sample standard deviations
    # of the MLP and LR estimated coefficients for the
    # considered sample size "n"
    # DataFrame.append was removed in pandas 2.0, so the row is added with pd.concat
    row = pd.DataFrame({
        'sample size': [n],
        'label avg.': [np.mean(y[:n])],
        'LR intercept avg.': [np.mean(lr_intercept)],
        'LR intercept std.': [np.std(lr_intercept,ddof=1)],
        'NN intercept avg.': [np.mean(nn_intercept)],
        'NN intercept std.': [np.std(nn_intercept,ddof=1)],
        'LR first weight avg.': [np.mean(lr_coef,axis=0)[0][0]],
        'LR first weight std.': [np.std(lr_coef,ddof=1,axis=0)[0][0]],
        'NN first weight avg.': [np.mean(nn_coef,axis=0)[0][0]],
        'NN first weight std.': [np.std(nn_coef,ddof=1,axis=0)[0][0]],
        'LR second weight avg.': [np.mean(lr_coef,axis=0)[0][1]],
        'LR second weight std.': [np.std(lr_coef,ddof=1,axis=0)[0][1]],
        'NN second weight avg.': [np.mean(nn_coef,axis=0)[0][1]],
        'NN second weight std.': [np.std(nn_coef,ddof=1,axis=0)[0][1]],
    })
    output = pd.concat([output,row],ignore_index=True)

# plot the results: one panel per parameter, showing the true value and
# the mean +/- one standard deviation band of the LR and MLP estimates
panels = [('intercept',true_intercept),('first weight',true_weights[0]),('second weight',true_weights[1])]
models = [('LR','Logistic Regression','229,134,6'),('NN','MLP Regression','93,105,177')]

fig = make_subplots(rows=3,cols=1,subplot_titles=['Intercept','First Weight','Second Weight'])

for row,(param,true_value) in enumerate(panels,start=1):

    # dotted reference line at the true parameter value
    fig.add_trace(go.Scatter(
        x=output['sample size'],y=[true_value] * output.shape[0],mode='lines',line=dict(color='rgb(82,188,163)',dash='dot',width=1),legendgroup='True Value',name='True Value',showlegend=(row == 1)),row=row,col=1)

    for prefix,name,rgb in models:
        avg = output[prefix + ' ' + param + ' avg.'].astype(float)
        std = output[prefix + ' ' + param + ' std.'].astype(float)

        # upper edge, lower edge (filled up to the upper edge), then the mean line
        fig.add_trace(go.Scatter(x=output['sample size'],y=avg + std,mode='lines',line=dict(color='rgba(' + rgb + ',0.2)'),legendgroup=name,showlegend=False),row=row,col=1)
        fig.add_trace(go.Scatter(x=output['sample size'],y=avg - std,mode='lines',line=dict(color='rgba(' + rgb + ',0.2)'),fill='tonexty',fillcolor='rgba(' + rgb + ',0.2)',legendgroup=name,showlegend=False),row=row,col=1)
        fig.add_trace(go.Scatter(x=output['sample size'],y=avg,mode='lines',line=dict(color='rgb(' + rgb + ')'),legendgroup=name,name=name,showlegend=(row == 1)),row=row,col=1)

    fig.update_xaxes(title='Sample Size',type='category',mirror=True,linecolor='#d9d9d9',showgrid=False,zeroline=False,row=row,col=1)
    fig.update_yaxes(title='Estimate',row=row,col=1)

fig.update_layout(
    plot_bgcolor='white',paper_bgcolor='white',legend=dict(x=0,y=1.125,orientation='h'),font=dict(family='Arial',size=6),margin=dict(t=40,l=20,r=20,b=20)
)

fig.update_annotations(
    font=dict(family='Arial',size=8)
)

# fig.write_image('LR_MLP_comparison.png',engine='orca',scale=4,height=500,width=400)
fig.write_image('LR_MLP_comparison.png',engine='kaleido',scale=4,height=500,width=400)
