微信公众号搜"智元新知"关注
微信扫一扫可直接关注哦!

使用 sklearn 中的 train_test_split 错误拆分数据

如何解决使用 sklearn 中的 train_test_split 错误拆分数据

不知道如何解决这个问题,请帮忙。代码如下:

 from sklearn.feature_extraction.text import CountVectorizer
all_features = vectorizer.fit_transform(df['text'].values.astype('U'))
vectorizer.vocabulary_
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(vectorizer,df['intent'],test_size=0.3,random_state=88)

下面是错误

    ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-27-5cd659a5da4e> in <module>
----> 1 X_train,random_state=88)

~\Anaconda3\lib\site-packages\sklearn\model_selection\_split.py in train_test_split(*arrays,**options)
   2125         raise TypeError("Invalid parameters passed: %s" % str(options))
   2126 
-> 2127     arrays = indexable(*arrays)
   2128 
   2129     n_samples = _num_samples(arrays[0])

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in indexable(*iterables)
    290     """
    291     result = [_make_indexable(X) for X in iterables]
--> 292     check_consistent_length(*result)
    293     return result
    294 

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
    250     """
    251 
--> 252     lengths = [_num_samples(X) for X in arrays if X is not None]
    253     uniques = np.unique(lengths)
    254     if len(uniques) > 1:

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in <listcomp>(.0)
    250     """
    251 
--> 252     lengths = [_num_samples(X) for X in arrays if X is not None]
    253     uniques = np.unique(lengths)
    254     if len(uniques) > 1:

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in _num_samples(x)
    193     if hasattr(x,'shape') and x.shape is not None:
    194         if len(x.shape) == 0:
--> 195             raise TypeError("Singleton array %r cannot be considered"
    196                             " a valid collection." % x)
    197         # Check that shape is returning an integer or default to len

TypeError: Singleton array array(CountVectorizer(stop_words='english'),dtype=object) cannot be considered a valid collection.

请帮我解决这个错误。我正在学习 here 的教程。我曾尝试从上述代码中查找错误,但似乎无法发现错误

解决方法

使用 all_features 而不是 vectorizer

X_train,X_test,y_train,y_test = train_test_split(all_features,df['intent'],test_size=0.3,random_state=88)

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。