
Number of predictors in model.matrix inflates before lasso regression


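I'm trying to fit a lasso logistic regression to predict Y, a binary variable, from 24 candidate quantitative predictors. I have 108 observations. Here are the first three rows of the data frame: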

> data.detect[1:3,]
  CONVENTIONAL_HUmin CONVENTIONAL_HUmean CONVENTIONAL_HUstd CONVENTIONAL_HUmax
1   37.9400539686119    63.4903779286635   11.7592095845857   85.2375439991287
2   23.8400539686119    80.5903779286635   15.0592095845857   125.837543999129
3   19.3035945249441    73.2764716205565   12.8816244173147    130.24141901586
  CONVENTIONAL_HUQ1 CONVENTIONAL_HUQ2 CONVENTIONAL_HUQ3      HISTO_Skewness   HISTO_Kurtosis
1  54.9938390994964  65.4873070322704  72.8863025473031  -0.203420585259268 2.25208159159488
2  70.8938390994964  80.3873070322704  91.4863025473031  -0.117420585259268 2.91208159159488
3  64.4689755423307  73.8666609177099  81.7351818199415 -0.0908104900456161  2.8751327713366
  HISTO_ExcessKurtosis HISTO_Entropy_log10 HISTO_Entropy_log2 HISTO_Energy...Uniformity.
1   -0.751917020142877   0.701345471328916   2.32782599847774          0.219781577333287
2  -0.0887170201428774   0.793345471328916   2.63782599847774          0.184781577333287
3   -0.127231561113029   0.738530858918985   2.45445652190669          0.206887426065656
          GLZLM_SZE        GLZLM_LZE          GLZLM_LGZE       GLZLM_HGZE          GLZLM_SZLGE
1 0.366581916604228 35.7249100350856 8.7285612359045e-05 11497.6407737833 3.22615226279017e-05
2 0.693581916604228 984.424910035086 8.5685612359045e-05 11697.6407737833 5.98615226279017e-05
3 0.622711792823853 1103.10288991619 8.5573088970709e-05 11571.7421733917 5.33303855950858e-05
       GLZLM_SZHGE         GLZLM_LZLGE      GLZLM_LZHGE       GLZLM_GLNU       GLZLM_ZLNU
1 4164.91570215061 0.00314512237564268 405585.990838764 2.66964898745512 2.47759091065361
2 8064.91570215061  0.0835651223756427 11581585.9908388 12.9796489874551 38.5375909106536
3 7295.45317481887  0.0949686480587339 12926109.9421091 15.0930512668698 37.6083347285291
           GLZLM_ZP Y
1 0.219643444043173 1
2 0.112643444043173 0
3 0.104031438564764 0
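A quick check, offered as a sketch: in the printout above every value shows about 15 significant digits, whereas R prints numeric data-frame columns rounded according to getOption("digits") (7 by default). That pattern suggests the columns may be stored as factors or character strings rather than numbers. Running str(data.detect) on the real data would confirm it. The toy data frame below is hypothetical and only mimics two of the columns.

```r
# Hypothetical toy data mimicking the printout:
# HUmin is stored as factor labels, HUmax as real numbers
toy <- data.frame(
  HUmin = factor(c("37.9400539686119", "23.8400539686119", "19.3035945249441")),
  HUmax = c(85.2375439991287, 125.837543999129, 130.24141901586),
  Y = c(1, 0, 0)
)

# HUmin prints every stored character; HUmax is printed rounded
print(toy)

# Class of each column (first element, in case a column has several classes)
col_classes <- vapply(toy, function(col) class(col)[1], character(1))
print(col_classes)

# Non-numeric columns are the ones model.matrix() will expand into dummy columns
non_numeric <- names(col_classes)[!col_classes %in% c("numeric", "integer")]
print(non_numeric)
```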

I created a model matrix and a response vector:

x=model.matrix(Y~.,data=data.detect)
y=data.detect$Y
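For context, here is a small sketch (toy values, hypothetical) of how model.matrix() treats the two kinds of column: a numeric predictor contributes a single column, while a factor (or character) predictor is expanded into one dummy column per level except the baseline, each named by pasting the variable name and the level. That naming matches the CONVENTIONAL_HUmax104.046018889744-style columns shown further down.

```r
# Hypothetical toy data: the same values stored once as numbers and once as a factor
toy <- data.frame(
  h_num = c(37.94, 23.84, 19.30, 85.24),
  h_fac = factor(c("37.94", "23.84", "19.30", "85.24")),
  Y = c(1, 0, 0, 1)
)

# Numeric predictor: intercept + one slope column
mm_num <- model.matrix(Y ~ h_num, data = toy)
print(colnames(mm_num))

# Factor predictor: intercept + one dummy per non-baseline level
# (baseline is the first level in sorted order, "19.30")
mm_fac <- model.matrix(Y ~ h_fac, data = toy)
print(colnames(mm_fac))
```

With 108 rows and nearly all values distinct, a predictor stored this way can yield up to 107 dummy columns, which is how 24 predictors can turn into thousands of coefficients.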

But when I run the lasso:

library(glmnet)
fit.lasso <- glmnet(x, y, family = "binomial", alpha = 1)
plot(fit.lasso, xvar = "lambda", label = TRUE)

[Figure: coefficient paths from plot(fit.lasso)]

As you can see, I have far more than 24 predictors (the top axis of the plot above shows 299).

When I choose the best lambda by cross-validation:

cv.lasso <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 10)
plot(cv.lasso)

[Figure: output of plot(cv.lasso)]

Same problem!
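As a sketch of a working cross-validation call: cv.glmnet() needs the response as its second argument (y), and the fold-count argument is spelled nfolds. The data below are simulated, 108 rows by 24 numeric columns, only to mirror the dimensions in the question; the glmnet calls are guarded with requireNamespace() so the snippet still runs where the package is not installed. With genuinely numeric predictors and no model.matrix intercept column, the coefficient matrix should have 25 rows (intercept plus 24 predictors).

```r
set.seed(1)

# Simulated stand-in for the real data (hypothetical): 108 rows, 24 numeric predictors
n <- 108
p <- 24
x <- matrix(rnorm(n * p), nrow = n, ncol = p,
            dimnames = list(NULL, paste0("X", seq_len(p))))
y <- rbinom(n, size = 1, prob = plogis(x[, 1] - x[, 2]))

if (requireNamespace("glmnet", quietly = TRUE)) {
  # Pass y explicitly; the number of folds is set with nfolds
  cv.lasso <- glmnet::cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 10)
  # Coefficients at the lambda with the lowest cross-validated deviance
  cf <- coef(cv.lasso, s = "lambda.min")
  print(dim(cf))  # intercept + one row per predictor
}
```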

When I try to extract the nonzero coefficients from the best-lambda model:

coef(cv.lasso)
2266 x 1 sparse Matrix of class "dgCMatrix"
                                           1
(Intercept)                         1.098612
(Intercept)                         .       
CONVENTIONAL_HUmin-10.5599460313881 .       
CONVENTIONAL_HUmin-117.359946031388 .       
CONVENTIONAL_HUmin-13.0599460313881 .       
CONVENTIONAL_HUmin-154.359946031388 .       
CONVENTIONAL_HUmin-17.6599460313881 .       
CONVENTIONAL_HUmin-18.3599460313881 .       
CONVENTIONAL_HUmin-2.87994603138811 .       
CONVENTIONAL_HUmin-21.281710504529  .       
CONVENTIONAL_HUmin-28.3599460313881 .       
CONVENTIONAL_HUmin-3.44994603138811 .       
CONVENTIONAL_HUmin-3.89640547505594 .       
CONVENTIONAL_HUmin-67.0599460313881 .       
CONVENTIONAL_HUmin-682.359946031388 .       
CONVENTIONAL_HUmin-9.08171050452898 .   
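The two (Intercept) rows in this output come from model.matrix(), which adds its own constant column, while glmnet() also fits a separate unpenalized intercept by default (intercept = TRUE); the constant column then just receives a zero coefficient. A sketch of dropping that column, and of listing only the nonzero coefficients, follows. Note that coef(cv.lasso) uses s = "lambda.1se" by default, while s = "lambda.min" selects the lambda with the lowest CV error. The helper nonzero_coefs and the coefficient values are hypothetical, made up so the helper can be shown without fitting a model.

```r
# Hypothetical toy data with numeric predictors
toy <- data.frame(a = c(1.2, 3.4, 5.6, 7.8), b = c(0.1, 0.4, 0.2, 0.9), Y = c(1, 0, 1, 0))

# model.matrix() includes a constant "(Intercept)" column ...
x_with_int <- model.matrix(Y ~ ., data = toy)
print(colnames(x_with_int))

# ... so drop it before glmnet(), which fits its own intercept
x <- model.matrix(Y ~ ., data = toy)[, -1, drop = FALSE]
print(colnames(x))

# Keep only the nonzero entries of a one-column coefficient matrix;
# as.matrix() should also handle the sparse dgCMatrix that coef() returns
nonzero_coefs <- function(cf) {
  v <- as.matrix(cf)[, 1]
  v[v != 0]
}

# Made-up stand-in for coef(cv.lasso, s = "lambda.min")
fake_cf <- matrix(c(0.5, 0, -1.3), ncol = 1,
                  dimnames = list(c("(Intercept)", "a", "b"), "1"))
print(nonzero_coefs(fake_cf))
```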

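I looked at my model.matrix and I think this is where the problem lies: for example, the predictor CONVENTIONAL_HUmax appears as about 100 separate columns, one for each distinct value!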

  CONVENTIONAL_HUmax104.046018889744 CONVENTIONAL_HUmax104.837543999129
    CONVENTIONAL_HUmax105.682510733322 CONVENTIONAL_HUmax105.837543999129
    CONVENTIONAL_HUmax106.837543999129 CONVENTIONAL_HUmax107.173812970536
    CONVENTIONAL_HUmax107.837543999129 CONVENTIONAL_HUmax108.046018889744
    CONVENTIONAL_HUmax108.837543999129 CONVENTIONAL_HUmax109.837543999129
    CONVENTIONAL_HUmax110.046018889744 CONVENTIONAL_HUmax110.837543999129
    CONVENTIONAL_HUmax111.046018889744 CONVENTIONAL_HUmax112.837543999129
    CONVENTIONAL_HUmax113.837543999129 CONVENTIONAL_HUmax114.837543999129
    CONVENTIONAL_HUmax116.837543999129 CONVENTIONAL_HUmax117.837543999129
    CONVENTIONAL_HUmax119.837543999129 CONVENTIONAL_HUmax122.837543999129
    CONVENTIONAL_HUmax123.837543999129 CONVENTIONAL_HUmax124.046018889744
    CONVENTIONAL_HUmax125.046018889744 CONVENTIONAL_HUmax125.837543999129
    CONVENTIONAL_HUmax126.837543999129 CONVENTIONAL_HUmax127.837543999129
    CONVENTIONAL_HUmax128.837543999129 CONVENTIONAL_HUmax129.837543999129
    CONVENTIONAL_HUmax130.24141901586 CONVENTIONAL_HUmax130.837543999129
    CONVENTIONAL_HUmax131.837543999129 CONVENTIONAL_HUmax133.837543999129
    CONVENTIONAL_HUmax135.837543999129 CONVENTIONAL_HUmax140.046018889744
    CONVENTIONAL_HUmax141.046018889744 CONVENTIONAL_HUmax143.046018889744
    CONVENTIONAL_HUmax144.837543999129 CONVENTIONAL_HUmax149.837543999129
    CONVENTIONAL_HUmax150.046018889744 CONVENTIONAL_HUmax150.837543999129
    CONVENTIONAL_HUmax152.837543999129 CONVENTIONAL_HUmax153.837543999129
    CONVENTIONAL_HUmax157.837543999129 CONVENTIONAL_HUmax159.046018889744
    CONVENTIONAL_HUmax164.046018889744 CONVENTIONAL_HUmax168.046018889744
    CONVENTIONAL_HUmax176.837543999129 CONVENTIONAL_HUmax60.5460188897443
    CONVENTIONAL_HUmax61.0460188897443 CONVENTIONAL_HUmax69.3375439991287
    CONVENTIONAL_HUmax73.6375439991287 CONVENTIONAL_HUmax74.2460188897443
    CONVENTIONAL_HUmax76.0460188897443 CONVENTIONAL_HUmax80.4375439991287
    CONVENTIONAL_HUmax82.7460188897443 CONVENTIONAL_HUmax82.8375439991287
    CONVENTIONAL_HUmax83.2460188897443 CONVENTIONAL_HUmax85.2375439991287
    CONVENTIONAL_HUmax87.5375439991287 CONVENTIONAL_HUmax88.0375439991287
    CONVENTIONAL_HUmax89.0414190158595 CONVENTIONAL_HUmax89.2375439991287
    CONVENTIONAL_HUmax91.6460188897443 CONVENTIONAL_HUmax91.9375439991287
    CONVENTIONAL_HUmax92.9375439991287 CONVENTIONAL_HUmax93.1460188897443
    CONVENTIONAL_HUmax93.6460188897443 CONVENTIONAL_HUmax93.9460188897443
    CONVENTIONAL_HUmax94.0460188897443 CONVENTIONAL_HUmax94.1460188897443
    CONVENTIONAL_HUmax94.2460188897443 CONVENTIONAL_HUmax95.2375439991287
    CONVENTIONAL_HUmax96.3375439991287 CONVENTIONAL_HUmax96.5375439991287
    CONVENTIONAL_HUmax96.8414190158595 CONVENTIONAL_HUmax97.1460188897443
    CONVENTIONAL_HUmax97.2375439991287 CONVENTIONAL_HUmax99.5375439991287
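Those column names suggest each quantitative predictor is being treated as a factor whose levels are the individual measured values. A minimal sketch of converting such columns back to numbers before calling model.matrix() follows; the toy data and the to_numeric helper are hypothetical. The main pitfall is that as.numeric() applied directly to a factor returns the internal level codes, so the factor has to go through as.character() first.

```r
# Hypothetical toy data: measurements stored as a factor and as character text
toy <- data.frame(
  HUmax = factor(c("85.2375439991287", "125.837543999129", "130.24141901586")),
  HUmin = c("37.9400539686119", "23.8400539686119", "19.3035945249441"),
  Y = c(1, 0, 0),
  stringsAsFactors = FALSE
)

# Pitfall: these are level codes, not values (levels sort as text: "125...", "130...", "85...")
codes <- as.numeric(toy$HUmax)
print(codes)  # 3 1 2

# Convert factor/character columns to numeric; unparseable text becomes NA with a warning
to_numeric <- function(col) {
  if (is.factor(col)) col <- as.character(col)
  if (is.character(col)) col <- as.numeric(col)
  col
}

predictors <- setdiff(names(toy), "Y")
toy[predictors] <- lapply(toy[predictors], to_numeric)
str(toy)

# Each predictor now contributes a single column, plus the intercept
x <- model.matrix(Y ~ ., data = toy)
print(dim(x))
```

On the real data the same lapply() line would apply to the 24 predictor columns of data.detect; checking anyNA() afterwards would show whether any value failed to parse, which could point to a stray non-numeric entry in the source file.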

I would be grateful for any suggestions on how to fix this!
