How to compute a proximity distance matrix efficiently in Python
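I need a memory- and time-efficient way to compute the distances between roughly 50000 points in 1 to 10 dimensions, in Python. Nothing I have tried so far works well. I have tried scipy.spatial.distance.pdist, which computes all pairwise distances, and scipy.spatial.KDTree.sparse_distance_matrix, which computes only the distances up to a cutoff.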
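To my surprise, sparse_distance_matrix performs very badly. My test case is 5000 points drawn uniformly from the unit 5-dimensional ball: pdist returns in 0.113 seconds, while sparse_distance_matrix takes 44.966 seconds with a maximum-distance cutoff of 0.1.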
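At this point I would stick with pdist, but with 50000 points the full square distance matrix has 2.5 x 10^9 entries (the condensed numpy array that pdist returns holds about half of that), and I worry that it will exhaust the available memory at run time.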
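To put a number on that concern, here is a quick back-of-the-envelope estimate. It assumes float64 output (8 bytes per entry), which is what pdist returns, and counts the N(N-1)/2 entries of the condensed array. The helper name pdist_memory_gb is made up for this illustration.

```python
def pdist_memory_gb(n_points, bytes_per_entry=8):
    """Return (entry count, size in GB) of the condensed array pdist produces."""
    # pdist stores each unordered pair once: N * (N - 1) / 2 entries
    n_entries = n_points * (n_points - 1) // 2
    return n_entries, n_entries * bytes_per_entry / 1e9


for n in (5_000, 50_000):
    entries, gigabytes = pdist_memory_gb(n)
    print(f"N = {n}: {entries} entries, about {gigabytes:.1f} GB as float64")
```

For N = 50000 this comes to roughly 10 GB for the condensed array alone, and twice that if it is expanded to a square matrix.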
Does anyone know a better approach, or see an obvious mistake in my implementation? Thanks in advance!
import numpy as np
import math
import time
from scipy.spatial.distance import pdist
from scipy.spatial import KDTree as kdtree
# Generate a uniform sample of size N on the unit dim-dimensional sphere (which lives in dim+1 dimensions)
def sphere(N,dim):
    # Get a random sample of points from the (dim+1)-dim. Gaussian.
    output = np.random.multivariate_normal(mean=np.zeros(dim+1),cov=np.identity(dim+1),size=N)
    # normalize output
    output = output / np.linalg.norm(output,axis=1).reshape(-1,1)
    return output
# Generate a uniform sample of size N on the unit dim-dimensional ball.
def ball(N,dim):
    # Populate the points on the unit sphere that is the boundary.
    sphere_points = sphere(N,dim-1)
    # Randomize radii of the points on the sphere using power law to get a uniform distribution on the ball.
    radii = np.power(np.random.random(N),1/dim)
    output = radii.reshape(-1,1) * sphere_points
    return output
N = 5000
dim = 5
r_cutoff = 0.1
# Generate a sample to test
sample = ball(N,dim)
# Construct a KD Tree for the sample
sample_kdt = kdtree(sample)
# pdist method for distance matrix
tic = time.monotonic()
pdist(sample)
toc = time.monotonic()
print(f"Time taken from pdist = {toc-tic}")
# KD Tree method for distance matrix
tic = time.monotonic()
sample_kdt.sparse_distance_matrix(sample_kdt,r_cutoff)
toc = time.monotonic()
print(f"Time taken from the KDTree method = {toc-tic}")
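One possible factor in the timings above, offered as an assumption rather than a diagnosis: according to the SciPy documentation, cKDTree had better performance than KDTree before SciPy 1.6.0, so the result may depend on the installed version. The sketch below uses cKDTree for the same cutoff query. The random test data and variable names are illustrative and are not the sample from the code above.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((2000, 5))  # illustrative data in the unit 5-cube
r_cutoff = 0.1

tree = cKDTree(points)

# Sparse matrix with one entry per ordered pair (i, j) within the cutoff
sparse = tree.sparse_distance_matrix(tree, r_cutoff, output_type="coo_matrix")

# Alternative: only the index pairs (i < j) within the cutoff, as an (n_pairs, 2) array
pairs = tree.query_pairs(r_cutoff, output_type="ndarray")

print(sparse.shape, len(pairs))
```

If only the neighbouring index pairs are needed, query_pairs avoids building the matrix at all, and the distances for those pairs can be computed afterwards.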
Solution
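import time
import numpy as np
from sklearn.neighbors import BallTree
# `sample` is the (N, dim) array generated in the question
tic = time.monotonic()
tree = BallTree(sample,leaf_size=10)
# Query the single nearest neighbour (k=1) of every point
d,i = tree.query(sample,k=1)
toc = time.monotonic()
print(f"Time taken from Sklearn BallTree = {toc-tic}")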
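On my machine this printed Time taken from Sklearn BallTree = 0.30803330009803176, while pdist took just over one second. Note: I have some heavy computation running that occupies 3/4 of the cores on my machine.
That query only fetches the single nearest neighbour (k=1).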
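One thing worth knowing about a k=1 query: when the tree is built from the same points that are queried, each point's nearest neighbour is the point itself at distance 0 (assuming no duplicate points). A minimal sketch, assuming the nearest other point is what is wanted, asks for k=2 and keeps the second column. The random data and variable names are illustrative only.

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.default_rng(0)
points = rng.random((1000, 5))  # illustrative data, not the sample from the question

tree = BallTree(points, leaf_size=10)

# k=1 on the points the tree was built from: every point finds itself
d1, i1 = tree.query(points, k=1)

# k=2: column 0 is the point itself, column 1 is the nearest other point
d2, i2 = tree.query(points, k=2)
nearest_dist = d2[:, 1]
nearest_idx = i2[:, 1]

print("largest k=1 distance:", d1.max())
print("smallest distance to another point:", nearest_dist.min())
```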
For a radius of 0.1:
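import time
import numpy as np
from sklearn.neighbors import BallTree
# `sample` is the (N, dim) array generated in the question
tic = time.monotonic()
tree = BallTree(sample,leaf_size=10)
# For every point, the indices of all points within distance 0.1
i = tree.query_radius(sample,r=0.1)
toc = time.monotonic()
print(f"Time taken from Sklearn BallTree Radius = {toc-tic}")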
It is fast:
Time taken from Sklearn BallTree Radius = 0.11115029989741743
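The radius query above returns only neighbour indices. If the distances are needed as well, one option is to pass return_distance=True and pack the result into a SciPy sparse matrix. The following is a sketch under that assumption, not a tuned implementation; the function name radius_distance_matrix and the demo data are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import BallTree


def radius_distance_matrix(points, r, leaf_size=10):
    """Sparse (n, n) matrix holding d(i, j) for every pair with d(i, j) <= r."""
    n = len(points)
    tree = BallTree(points, leaf_size=leaf_size)
    # With return_distance=True the result is (indices, distances),
    # each an object array holding one 1-D array per query point.
    ind, dist = tree.query_radius(points, r=r, return_distance=True)
    # Row i of the CSR matrix owns the entries indptr[i]:indptr[i + 1]
    counts = np.array([len(row) for row in ind])
    indptr = np.concatenate(([0], np.cumsum(counts)))
    matrix = csr_matrix(
        (np.concatenate(list(dist)), np.concatenate(list(ind)), indptr),
        shape=(n, n),
    )
    matrix.eliminate_zeros()  # drop the zero self-distances on the diagonal
    matrix.sort_indices()
    return matrix


rng = np.random.default_rng(0)
demo_points = rng.random((2000, 5))  # illustrative data
demo_matrix = radius_distance_matrix(demo_points, r=0.1)
print(demo_matrix.shape, demo_matrix.nnz)
```

Memory then scales with the number of pairs inside the cutoff rather than with N^2, which is the point of using a cutoff for 50000 points. Note that each pair is stored twice, once as (i, j) and once as (j, i).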