How to fix the inflated number of predictors in model.matrix before running a lasso regression
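I am trying to fit a lasso logistic regression to predict Y, a binary variable, from 24 candidate quantitative predictors. I have 108 observations. Here is what the first three rows of the data frame look like: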
> data.detect[1:3,]
CONVENTIONAL_HUmin CONVENTIONAL_HUmean CONVENTIONAL_HUstd CONVENTIONAL_HUmax
1 37.9400539686119 63.4903779286635 11.7592095845857 85.2375439991287
2 23.8400539686119 80.5903779286635 15.0592095845857 125.837543999129
3 19.3035945249441 73.2764716205565 12.8816244173147 130.24141901586
CONVENTIONAL_HUQ1 CONVENTIONAL_HUQ2 CONVENTIONAL_HUQ3 HISTO_Skewness HISTO_Kurtosis
1 54.9938390994964 65.4873070322704 72.8863025473031 -0.203420585259268 2.25208159159488
2 70.8938390994964 80.3873070322704 91.4863025473031 -0.117420585259268 2.91208159159488
3 64.4689755423307 73.8666609177099 81.7351818199415 -0.0908104900456161 2.8751327713366
HISTO_ExcessKurtosis HISTO_Entropy_log10 HISTO_Entropy_log2 HISTO_Energy...Uniformity.
1 -0.751917020142877 0.701345471328916 2.32782599847774 0.219781577333287
2 -0.0887170201428774 0.793345471328916 2.63782599847774 0.184781577333287
3 -0.127231561113029 0.738530858918985 2.45445652190669 0.206887426065656
GLZLM_SZE GLZLM_LZE GLZLM_LGZE GLZLM_HGZE GLZLM_SZLGE
1 0.366581916604228 35.7249100350856 8.7285612359045e-05 11497.6407737833 3.22615226279017e-05
2 0.693581916604228 984.424910035086 8.5685612359045e-05 11697.6407737833 5.98615226279017e-05
3 0.622711792823853 1103.10288991619 8.5573088970709e-05 11571.7421733917 5.33303855950858e-05
GLZLM_SZHGE GLZLM_LZLGE GLZLM_LZHGE GLZLM_GLNU GLZLM_ZLNU
1 4164.91570215061 0.00314512237564268 405585.990838764 2.66964898745512 2.47759091065361
2 8064.91570215061 0.0835651223756427 11581585.9908388 12.9796489874551 38.5375909106536
3 7295.45317481887 0.0949686480587339 12926109.9421091 15.0930512668698 37.6083347285291
GLZLM_ZP Y
1 0.219643444043173 1
2 0.112643444043173 0
3 0.104031438564764 0
I created a model matrix and a response vector:
x <- model.matrix(Y ~ ., data = data.detect)  # predictor matrix (one column per model term)
y <- data.detect$Y                             # binary response
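Before fitting, it can help to check what class each column has, because model.matrix() treats factor (and character) columns as categorical and expands them into dummy columns. The sketch below uses a hypothetical toy data frame, not data.detect, and assumes the inflation comes from numeric values stored as factors; the 15-digit values in the printout and the coefficient names such as CONVENTIONAL_HUmin-10.5599460313881 further down are consistent with that, but it is an inference.

```r
# Hypothetical toy data frame: 'a' is a genuine numeric column, while 'b'
# holds numbers that were read in as text and ended up as a factor.
toy <- data.frame(
  a = c(1.5, 2.5, 3.5, 4.5),
  b = factor(c("10.1", "20.2", "30.3", "40.4")),
  Y = c(1, 0, 0, 1)
)

# Every predictor should be "numeric" (or "integer"); here 'b' is "factor".
col_classes <- sapply(toy, class)
print(col_classes)

# model.matrix() expands a factor into one dummy column per level after the
# first, so 'b' alone contributes 3 columns instead of 1.
mm <- model.matrix(Y ~ ., data = toy)
print(colnames(mm))
# expected names: "(Intercept)" "a" "b20.2" "b30.3" "b40.4"

# Quick inflation check: columns produced vs. predictors supplied.
n_predictors <- ncol(toy) - 1   # every column except Y
n_columns    <- ncol(mm) - 1    # every column except the intercept
cat("predictors:", n_predictors, " model-matrix columns:", n_columns, "\n")
```

On the real data the same check would be `sapply(data.detect, class)`; any predictor reported as "factor" or "character" is a candidate for the blow-up.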
But when I run the lasso:
fit.lasso <- glmnet(x,y,family = "binomial",alpha = 1)
plot(fit.lasso,xvar = "lambda",label=T)
[Figure: output of plot(fit.lasso, xvar = "lambda", label = T)]
As you can see, I have far more than 24 predictors (299 on the x-axis of the plot above).
When selecting the best lambda by cross-validation:
cv.lasso <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 10)
plot(cv.lasso)
Same thing!
When I try to extract the nonzero coefficients from the best-lambda model:
coef(cv.lasso)  # s = "lambda.1se" by default; s = "lambda.min" gives the CV-error minimum
2266 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) 1.098612
(Intercept) .
CONVENTIONAL_HUmin-10.5599460313881 .
CONVENTIONAL_HUmin-117.359946031388 .
CONVENTIONAL_HUmin-13.0599460313881 .
CONVENTIONAL_HUmin-154.359946031388 .
CONVENTIONAL_HUmin-17.6599460313881 .
CONVENTIONAL_HUmin-18.3599460313881 .
CONVENTIONAL_HUmin-2.87994603138811 .
CONVENTIONAL_HUmin-21.281710504529 .
CONVENTIONAL_HUmin-28.3599460313881 .
CONVENTIONAL_HUmin-3.44994603138811 .
CONVENTIONAL_HUmin-3.89640547505594 .
CONVENTIONAL_HUmin-67.0599460313881 .
CONVENTIONAL_HUmin-682.359946031388 .
CONVENTIONAL_HUmin-9.08171050452898 .
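The two "(Intercept)" rows in the output above have a separate, smaller cause: model.matrix() prepends an "(Intercept)" column of 1s, and glmnet() also fits its own intercept by default (intercept = TRUE), so the matrix column is redundant and is commonly dropped with `[, -1]`. A minimal sketch on hypothetical toy data:

```r
# Hypothetical toy data: two numeric predictors and a binary response.
toy <- data.frame(
  p1 = c(37.9, 23.8, 19.3, 41.2),
  p2 = c(85.2, 125.8, 130.2, 99.5),
  Y  = c(1, 0, 0, 1)
)

# model.matrix() adds an "(Intercept)" column of 1s.
x_full <- model.matrix(Y ~ ., data = toy)
print(colnames(x_full))   # "(Intercept)" "p1" "p2"

# Drop it, since glmnet()/cv.glmnet() estimate the intercept themselves.
x <- model.matrix(Y ~ ., data = toy)[, -1]
print(colnames(x))        # "p1" "p2"
```

This only removes the duplicated intercept row; it does not by itself fix the per-value columns.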
I looked at my model.matrix and I think this is the problem: for example, the predictor CONVENTIONAL_HUmax is repeated about 100 times!
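Column names of the form CONVENTIONAL_HUmax104.046018889744 are what model.matrix() produces for a factor: one dummy column per distinct value. Assuming that is the situation here, a sketch of the conversion back to numeric follows, again on a hypothetical toy data frame standing in for data.detect. The detour through as.character() matters, because as.numeric() applied directly to a factor returns the internal level codes rather than the printed values.

```r
# Hypothetical stand-in for data.detect: two numeric predictors that were
# read in as factors, plus the binary response Y.
toy <- data.frame(
  p1 = factor(c("37.94", "23.84", "19.30")),
  p2 = factor(c("85.24", "125.84", "130.24")),
  Y  = c(1, 0, 0)
)

# Pitfall: levels are sorted as strings ("125.84" < "130.24" < "85.24"),
# so as.numeric() on the factor yields the codes 3 1 2, not the values.
codes <- as.numeric(toy$p2)
print(codes)

# Convert every predictor (all columns except Y) via as.character() first.
predictors <- setdiff(names(toy), "Y")
toy[predictors] <- lapply(toy[predictors],
                          function(col) as.numeric(as.character(col)))

print(sapply(toy, class))   # all "numeric" now

# One column per predictor again (plus the intercept).
x <- model.matrix(Y ~ ., data = toy)
print(colnames(x))          # "(Intercept)" "p1" "p2"
```

If any entry is not a valid number, as.numeric() turns it into NA with a warning, which is a useful hint about why the column was read as text in the first place.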
CONVENTIONAL_HUmax104.046018889744 CONVENTIONAL_HUmax104.837543999129
CONVENTIONAL_HUmax105.682510733322 CONVENTIONAL_HUmax105.837543999129
CONVENTIONAL_HUmax106.837543999129 CONVENTIONAL_HUmax107.173812970536
CONVENTIONAL_HUmax107.837543999129 CONVENTIONAL_HUmax108.046018889744
CONVENTIONAL_HUmax108.837543999129 CONVENTIONAL_HUmax109.837543999129
CONVENTIONAL_HUmax110.046018889744 CONVENTIONAL_HUmax110.837543999129
CONVENTIONAL_HUmax111.046018889744 CONVENTIONAL_HUmax112.837543999129
CONVENTIONAL_HUmax113.837543999129 CONVENTIONAL_HUmax114.837543999129
CONVENTIONAL_HUmax116.837543999129 CONVENTIONAL_HUmax117.837543999129
CONVENTIONAL_HUmax119.837543999129 CONVENTIONAL_HUmax122.837543999129
CONVENTIONAL_HUmax123.837543999129 CONVENTIONAL_HUmax124.046018889744
CONVENTIONAL_HUmax125.046018889744 CONVENTIONAL_HUmax125.837543999129
CONVENTIONAL_HUmax126.837543999129 CONVENTIONAL_HUmax127.837543999129
CONVENTIONAL_HUmax128.837543999129 CONVENTIONAL_HUmax129.837543999129
CONVENTIONAL_HUmax130.24141901586 CONVENTIONAL_HUmax130.837543999129
CONVENTIONAL_HUmax131.837543999129 CONVENTIONAL_HUmax133.837543999129
CONVENTIONAL_HUmax135.837543999129 CONVENTIONAL_HUmax140.046018889744
CONVENTIONAL_HUmax141.046018889744 CONVENTIONAL_HUmax143.046018889744
CONVENTIONAL_HUmax144.837543999129 CONVENTIONAL_HUmax149.837543999129
CONVENTIONAL_HUmax150.046018889744 CONVENTIONAL_HUmax150.837543999129
CONVENTIONAL_HUmax152.837543999129 CONVENTIONAL_HUmax153.837543999129
CONVENTIONAL_HUmax157.837543999129 CONVENTIONAL_HUmax159.046018889744
CONVENTIONAL_HUmax164.046018889744 CONVENTIONAL_HUmax168.046018889744
CONVENTIONAL_HUmax176.837543999129 CONVENTIONAL_HUmax60.5460188897443
CONVENTIONAL_HUmax61.0460188897443 CONVENTIONAL_HUmax69.3375439991287
CONVENTIONAL_HUmax73.6375439991287 CONVENTIONAL_HUmax74.2460188897443
CONVENTIONAL_HUmax76.0460188897443 CONVENTIONAL_HUmax80.4375439991287
CONVENTIONAL_HUmax82.7460188897443 CONVENTIONAL_HUmax82.8375439991287
CONVENTIONAL_HUmax83.2460188897443 CONVENTIONAL_HUmax85.2375439991287
CONVENTIONAL_HUmax87.5375439991287 CONVENTIONAL_HUmax88.0375439991287
CONVENTIONAL_HUmax89.0414190158595 CONVENTIONAL_HUmax89.2375439991287
CONVENTIONAL_HUmax91.6460188897443 CONVENTIONAL_HUmax91.9375439991287
CONVENTIONAL_HUmax92.9375439991287 CONVENTIONAL_HUmax93.1460188897443
CONVENTIONAL_HUmax93.6460188897443 CONVENTIONAL_HUmax93.9460188897443
CONVENTIONAL_HUmax94.0460188897443 CONVENTIONAL_HUmax94.1460188897443
CONVENTIONAL_HUmax94.2460188897443 CONVENTIONAL_HUmax95.2375439991287
CONVENTIONAL_HUmax96.3375439991287 CONVENTIONAL_HUmax96.5375439991287
CONVENTIONAL_HUmax96.8414190158595 CONVENTIONAL_HUmax97.1460188897443
CONVENTIONAL_HUmax97.2375439991287 CONVENTIONAL_HUmax99.5375439991287
If you have any suggestions for fixing this, I would be grateful!