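I am trying to compute nearest-neighbor clustering on a SciPy sparse matrix returned by scikit-learn's DictVectorizer. However, when I try to compute the distance matrix with scikit-learn, I get error messages when using the 'euclidean' distance through both pairwise.euclidean_distances and pairwise.pairwise_distances. I was under the impression that scikit-learn could compute these distance matrices.
My matrix is very sparse, with shape: <364402x223209 sparse matrix of type '<class 'numpy.float64'>'
with 728804 stored elements in Compressed Sparse Row format>.
I have also tried methods such as pdist and KDTree in SciPy, but received other errors about not being able to process the result.
Can anyone point me to a solution that would let me efficiently compute the distance matrix and/or the nearest-neighbor results?
Some example code: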
import numpy as np
import scipy.spatial
from sklearn.feature_extraction import DictVectorizer
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics import pairwise

path = 'FileLocation'
data = []
# Each input line holds three comma-separated integer fields;
# build one feature dict per line.
with open(path, 'r') as f:
    for line in f:
        fields = line.strip().split(',')
        data.append({'user': str(int(fields[0])), str(int(fields[1])): int(fields[2])})

vec = DictVectorizer()
X = vec.fit_transform(data)  # SciPy sparse matrix in CSR format
result = scipy.spatial.KDTree(X)  # this call raises the error below
Error:
Traceback (most recent call last):
  File "python3.2/site-packages/scipy/spatial/kdtree.py", line 227, in __init__
    self.n, self.m = np.shape(self.data)
ValueError: need more than 0 values to unpack
Similarly, if I run:
scipy.spatial.distance.pdist(X, 'euclidean')
I get the following:
Traceback (most recent call last):
  File "python3.2/site-packages/scipy/spatial/distance.py", line 1169, in pdist
    [X] = _copy_arrays_if_base_present([_convert_to_double(X)])
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/scipy/spatial/distance.py", line 113, in _convert_to_double
    X = X.astype(np.double)
ValueError: setting an array element with a sequence.
Finally, running NearestNeighbors in scikit-learn results in a memory error, using:
nbrs = NearestNeighbors(n_neighbors=10, algorithm='brute')
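First, KDTree and pdist cannot take a sparse matrix; you have to convert it to a dense one, shown here on a small example:
>>> X
<2x3 sparse matrix of type '...' with ... stored elements in Compressed Sparse Row format>
>>> scipy.spatial.KDTree(X.todense())
>>> scipy.spatial.distance.pdist(X.todense(), 'euclidean')
array([ 6.55743852])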
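As a small sketch of the conversion (the toy matrix below is made up for illustration and is not the asker's data): pdist needs a dense array, whereas scikit-learn's pairwise.euclidean_distances accepts a CSR matrix directly. Densifying is only practical for small inputs; a dense 364402x223209 float64 array would be far too large to hold in memory.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial.distance import pdist, squareform
from sklearn.metrics import pairwise

# Hypothetical toy data: 3 samples, 4 features, mostly zeros.
X = csr_matrix(np.array([
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 4.0],
]))

# pdist works on a dense ndarray; toarray() returns one
# (todense() returns an np.matrix instead).
condensed = pdist(X.toarray(), 'euclidean')
dense_result = squareform(condensed)  # expand to a full 3x3 matrix

# euclidean_distances takes the sparse matrix without densifying it.
sparse_result = pairwise.euclidean_distances(X)

# Both routes should agree up to floating-point error.
print(np.allclose(dense_result, sparse_result))
```

Either way, a full pairwise distance matrix has one entry per pair of samples, so for hundreds of thousands of rows the result itself is the memory bottleneck, regardless of how sparse the input is.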
Second, from the docs:
Efficient brute-force neighbors searches can be very competitive for small data samples. However, as the number of samples N grows, the brute-force approach quickly becomes infeasible.
You may want to try the 'ball_tree' algorithm and see whether it can handle your data.
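If brute-force search is the only option for sparse input, one way to limit memory is to fit on the sparse matrix and query it in chunks instead of all rows at once. The following is a minimal sketch under stated assumptions: the data is random toy data from scipy.sparse.random, and batch_size is a hypothetical value chosen for illustration. Newer scikit-learn releases may already chunk this work internally, in which case manual batching is optional.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.neighbors import NearestNeighbors

# Hypothetical stand-in for the DictVectorizer output:
# 1000 samples x 500 features, about 1% non-zero entries.
X = sp.random(1000, 500, density=0.01, format='csr', random_state=0)

# The brute-force algorithm accepts CSR input for the euclidean metric.
nbrs = NearestNeighbors(n_neighbors=10, algorithm='brute', metric='euclidean')
nbrs.fit(X)

batch_size = 200  # illustrative value; tune to the available memory
all_indices = []
for start in range(0, X.shape[0], batch_size):
    # Query one slice of rows, so at most a
    # (batch_size x n_samples) block of distances is needed per call.
    dist, idx = nbrs.kneighbors(X[start:start + batch_size])
    all_indices.append(idx)

# One row of 10 neighbor indices per sample.
indices = np.vstack(all_indices)
print(indices.shape)
```

The neighbor index array has only n_samples x n_neighbors entries, so it stays small even when the full distance matrix would not fit in memory.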