Error when implementing a rainfall prediction classification model with logistic regression

My code

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split

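# Load the data: every column except the last is a feature, the last is the label
dataset = pd.read_csv('weatherAUS.csv')
X = dataset.iloc[:,:-1].values
y = dataset.iloc[:,-1].values

# Fill missing values in columns 1..14 with the column mean
imputer = SimpleImputer(missing_values=np.nan,strategy='mean')
imputer.fit(X[:,1:15])
X[:,1:15] = imputer.transform(X[:,1:15])

# One-hot encode the location column (column 0) and pass the rest through
ct = ColumnTransformer(transformers=[('encoder',OneHotEncoder(),[0])],remainder='passthrough')
X = np.array(ct.fit_transform(X))

# Split into training and test sets (this is the line that raises the error)
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.25)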

The error I get

TypeError                                 Traceback (most recent call last)
<ipython-input-16-c8b4cceb3113> in <module>()
      1 from sklearn.model_selection import train_test_split
----> 2 X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.25)

4 frames
/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in _num_samples(x)
    150         if len(x.shape) == 0:
    151             raise TypeError("Singleton array %r cannot be considered"
--> 152                             " a valid collection." % x)
    153         # Check that shape is returning an integer or default to len
    154         # dask dataframes may not return numeric shape[0] value

TypeError: Singleton array array(<145460x63 sparse matrix of type '<class 'numpy.float64'>'
    with 1961771 stored elements in Compressed Sparse Row format>,dtype=object) cannot be considered a valid collection.
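The message shows a sparse matrix sitting inside an array with dtype=object, which suggests that np.array() wrapped the sparse result instead of converting it. The following is a minimal sketch of that behaviour, assuming a small SciPy CSR matrix as a stand-in for the encoder output (the 3x3 identity matrix is purely illustrative):

```python
import numpy as np
from scipy import sparse

# Illustrative stand-in for the sparse output of the one-hot encoding step.
sp = sparse.csr_matrix(np.eye(3))

# np.array() does not densify a SciPy sparse matrix; it wraps the object itself.
wrapped = np.array(sp)
print(wrapped.shape)   # empty shape: a 0-d ("singleton") array
print(wrapped.dtype)   # object

# toarray() performs the actual conversion to a regular 2-D ndarray.
dense = sp.toarray()
print(dense.shape)
```

A 0-d array has no rows to count, which would explain why scikit-learn's sample-count check rejects it.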

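The dataset is fairly large (140,000 rows, 17 columns). The first column is a location somewhere in Australia, so I have to one-hot encode it. There are a lot of blank cells, so I had to clean the data.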

Link to my dataset here
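One possible way to get a dense array out of the encoding step is to set sparse_threshold=0 on the ColumnTransformer and drop the np.array() wrapper. The sketch below is not a drop-in fix for the real file: the tiny DataFrame and its column names (Location, MinTemp, Rainfall, RainTomorrow) are assumptions standing in for weatherAUS.csv, and imputation is applied only to the columns assumed to be numeric.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Hypothetical miniature dataset; the column names and values are assumptions.
df = pd.DataFrame({
    "Location": ["Albury", "Sydney", "Perth", "Sydney",
                 "Albury", "Perth", "Sydney", "Albury"],
    "MinTemp": [13.4, np.nan, 12.9, 9.2, 17.5, np.nan, 14.3, 7.7],
    "Rainfall": [0.6, 0.0, np.nan, 0.0, 1.0, 0.2, 0.0, 0.0],
    "RainTomorrow": ["No", "No", "Yes", "No", "Yes", "No", "No", "No"],
})
X = df.iloc[:, :-1]
y = df.iloc[:, -1].values

ct = ColumnTransformer(
    transformers=[
        # One-hot encode the categorical location column.
        ("encoder", OneHotEncoder(handle_unknown="ignore"), ["Location"]),
        # Mean-impute only the numeric columns.
        ("imputer", SimpleImputer(strategy="mean"), ["MinTemp", "Rainfall"]),
    ],
    sparse_threshold=0,  # ask for a dense ndarray instead of a sparse matrix
)
X_dense = ct.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(
    X_dense, y, test_size=0.25, random_state=0
)
print(X_dense.shape, X_train.shape, X_test.shape)
```

With 145,460 rows and 63 encoded columns a dense array may use a lot of memory. An alternative is to keep the sparse result and pass it to train_test_split directly, since that function accepts SciPy sparse matrices.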
