Python PCA implementation

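I am working on an assignment for an online course in which I have to implement PCA in Python. Unfortunately, when I compare my implementation against scikit-learn's (the comparison is provided by the course), my results differ by far too much.

After hours of review, I still can't tell what is going wrong. I would be grateful if someone could take a look and point out which step I have coded or interpreted incorrectly.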

import numpy as np

def normalize(X):
    """
    Normalize the given dataset X to have zero mean.

    Args:
        X: ndarray,dataset of shape (N,D)
    Returns:
        (Xbar,mean): tuple of ndarray,Xbar is the normalized dataset
        with mean 0; mean is the sample mean of the dataset.

    Note: 
        You will encounter dimensions where the standard deviation is zero.

        For those ones,the process of normalization results in normalized data with NaN entries.  

        We can handle this by setting the std = 1 for those dimensions when doing normalization.  
    """
    # YOUR CODE HERE
    ### Uncomment and modify the code below
    mu = np.mean(X,axis = 0) # Setting axis = 0 will compute means column-wise.  Setting it to 1 will compute the mean across rows.  
    std = np.std(X,axis = 0) # Computing the std dev column wise using axis = 0.  
    std_filled = std.copy() 
    std_filled[std == 0] = 1
    # Compute the normalized data as Xbar 
    Xbar = (X - mu)/std_filled
    return Xbar, mu  # , std_filled

def eig(S):
    """
    Compute the eigenvalues and corresponding unit eigenvectors for the covariance matrix S.

    Args:
        S: ndarray,covariance matrix

    Returns:
        (eigvals,eigvecs): ndarray,the eigenvalues and eigenvectors

    Note:
        the eigenvals and eigenvecs should be sorted in descending
        order of the eigen values
    """
    # YOUR CODE HERE
    # Uncomment and modify the code below
    # Compute the eigenvalues and eigenvectors
    # You can use library routines in `np.linalg.*` https://numpy.org/doc/stable/reference/routines.linalg.html for this
    eigvals,eigvecs = np.linalg.eig(S)
    # The eigenvalues and eigenvectors need to be sorted in descending order according to the eigenvalues
    # We will use `np.argsort` (https://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html) to find a permutation of the indices
    # of eigvals that will sort eigvals in ascending order and then find the descending order via [::-1],which reverse the indices
    sort_indices = np.argsort(eigvals)[::-1]
    # Notice that we are sorting the columns (not rows) of eigvecs since the columns represent the eigenvectors.
    return eigvals[sort_indices],eigvecs[:,sort_indices]


def projection_matrix(B):
    """Compute the projection matrix onto the space spanned by the columns of `B`
    Args:
        B: ndarray of dimension (D,M),the basis for the subspace

    Returns:
        P: the projection matrix
    """
    # YOUR CODE HERE
    P = B @ (np.linalg.inv(B.T @ B)) @ B.T
    return P

def select_components(eig_vals,eig_vecs,num_components):
    """ 
    Selects the n components desired for projecting the data upon.  

    Args:
        eig_vals: The eigenvalues sorted in descending order of magnitude. 
        eig_vecs:  The eigenvectors sorted in order relative to that of the eigenvalues.
        num_components: the number of principal components to use.  
    Returns: 
        The number of desired components to keep for projection of the data upon. 
    """
    principal_vals,principal_components = eig_vals[:num_components],eig_vecs[:,range(num_components)]

    return principal_vals,principal_components


def PCA(X,num_components):
    """
    Projects normalized data onto the 'n' desired principal components.

    Args:
        X: ndarray of size (N,D),where D is the dimension of the data,and N is the number of datapoints
        num_components: the number of principal components to use.
    Returns:
        the reconstructed data,the sample mean of the X,principal values
        and principal components
    """
    # Normalize to have mean 0 and variance 1.
    Z,mean_vec = normalize(X) 
    # Calculate the covariance matrix 
    S = np.cov(Z,rowvar=False,bias=True) # Set rowvar = False to treat columns as variables.  Set bias = True to ensure normalization is done with N and not N-1
    # Calculate the (unit) eigenvectors and eigenvalues of S.  Sort them in descending order of importance relative to the magnitude of the eigenvalues.  
    eig_vals,eig_vecs = eig(S)
    # Keep only the n largest principal components of the sorted unit eigenvectors.
    principal_vals, principal_components = select_components(eig_vals, eig_vecs, num_components)
    # Compute the projection matrix using only the n largest principal components of the sorted unit eigenvectors, where n = num_components.
    #P = projection_matrix(eig_vecs[:,:num_components])
    P = projection_matrix(principal_components)
    # Reconstruct the data by using the projection matrix to project the data onto the principal component vectors we've kept
    X_reconst = (P @ X.T).T 

    return X_reconst,mean_vec,principal_vals,principal_components

This is the test case I am supposed to pass:

random = np.random.RandomState(0)
X = random.randn(10,5)

from sklearn.decomposition import PCA as SKPCA

for num_component in range(1,4):
    # We can compute a standard solution given by scikit-learn's implementation of PCA
    pca = SKPCA(n_components=num_component,svd_solver="full")
    sklearn_reconst = pca.inverse_transform(pca.fit_transform(X))
    reconst, _, _, _ = PCA(X, num_component)
    # The difference in the result should be very small (<10^-20)
    print(
        "difference in reconstruction for num_components = {}: {}".format(
            num_component,np.square(reconst - sklearn_reconst).sum()
        )
    )
    np.testing.assert_allclose(reconst,sklearn_reconst)

Solution

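As far as I can tell, there are a few problems with your code.

Your projection matrix is off.

If the eigenvectors of the covariance matrix are B, with dimensions D x M, where M is the number of components you select and D is the dimension of the original data, then the projection matrix is simply B @ B.T.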
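As a small check of the statement above, the following sketch builds B from the top eigenvectors of a covariance matrix and compares B @ B.T with the general formula used in the question. It assumes the same random data as the test case and uses np.linalg.eigh (the symmetric-matrix routine) in place of np.linalg.eig; M = 2 is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(10, 5)

# Center the data and compute its covariance matrix (columns are variables).
Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)

# eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]

M = 2                      # number of components kept (illustrative choice)
B = eigvecs[:, order[:M]]  # D x M matrix of unit eigenvectors

P_simple = B @ B.T
P_general = B @ np.linalg.inv(B.T @ B) @ B.T

print(np.allclose(B.T @ B, np.eye(M)))             # columns of B are orthonormal
print(np.allclose(P_simple, P_general))            # both formulas give the same matrix here
print(np.allclose(P_simple @ P_simple, P_simple))  # a projection matrix is idempotent
```

When the columns of B are orthonormal, B.T @ B is the identity, so the general formula reduces to B @ B.T and the matrix inverse can be skipped.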

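In standard implementations of PCA, we usually do not scale the data by the inverse of the standard deviation. It looks like you are attempting an approximation of whitening PCA (ZCA), but even so, it looks wrong.

As a quick test, you can compute the normalized data without dividing by the standard deviation, and set bias=False when computing the covariance matrix.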
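To see what each of these two choices changes, here is a minimal sketch on the test-case data, using np.linalg.eigh as an assumed stand-in for the eig helper. It suggests that the bias flag only rescales the eigenvalues by (N - 1) / N, while dividing by the standard deviation forces every variable to unit variance, so the decomposition is then of the correlation matrix rather than the covariance matrix.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(10, 5)
N, D = X.shape

# Center only: no division by the standard deviation.
Xc = X - X.mean(axis=0)

S_unbiased = np.cov(Xc, rowvar=False, bias=False)  # divides by N - 1
S_biased = np.cov(Xc, rowvar=False, bias=True)     # divides by N

vals_u, vecs_u = np.linalg.eigh(S_unbiased)
vals_b, vecs_b = np.linalg.eigh(S_biased)

# The eigenvalues differ only by the constant factor (N - 1) / N.
print(np.allclose(vals_b, vals_u * (N - 1) / N))
# The eigenvectors agree up to sign, so the projection is unaffected.
print(np.allclose(np.abs(vecs_u.T @ vecs_b), np.eye(D), atol=1e-8))

# Standardizing as in the question's normalize(): every column gets unit variance.
Z = Xc / X.std(axis=0)
S_z = np.cov(Z, rowvar=False, bias=True)
print(np.allclose(np.diag(S_z), 1.0))
```

Under these assumptions, dropping the division by the standard deviation is the change that affects the reconstruction; the bias setting matters only if the eigenvalues themselves are compared.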

You should also subtract the mean from the data before multiplying it by the projection operator, and then add it back afterwards, i.e., X_reconst = (P @ (X - mean_vec).T).T + mean_vec
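Putting these corrections together, a minimal sketch of the reconstruction might look like the following. The function names pca_reconstruct and svd_reconstruct are hypothetical; the second is a NumPy-only stand-in for the scikit-learn reference in the test case, written on the assumption that the reference centers the data and keeps the top singular directions.

```python
import numpy as np

def pca_reconstruct(X, num_components):
    """Reconstruct X from its top principal components (center only, no scaling)."""
    mean_vec = X.mean(axis=0)
    Xc = X - mean_vec
    S = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(S)  # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:num_components]
    B = eigvecs[:, order]                 # D x M orthonormal basis
    P = B @ B.T                           # projection onto the principal subspace
    return (P @ Xc.T).T + mean_vec        # project centered data, then add the mean back

def svd_reconstruct(X, num_components):
    """Reference reconstruction via a truncated SVD of the centered data."""
    mean_vec = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean_vec, full_matrices=False)
    k = num_components
    return (U[:, :k] * s[:k]) @ Vt[:k] + mean_vec

rng = np.random.RandomState(0)
X = rng.randn(10, 5)
for k in range(1, 4):
    diff = np.square(pca_reconstruct(X, k) - svd_reconstruct(X, k)).sum()
    print("difference in reconstruction for num_components = {}: {}".format(k, diff))
```

The two routes should agree up to floating-point error, because the right singular vectors of the centered data are the eigenvectors of its covariance matrix.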

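PCA is essentially just a change of basis, followed by discarding the coordinates that correspond to low-variance directions. The eigenvectors of the covariance matrix form the new orthogonal basis, and the eigenvalues tell you the variance of the data along the direction of the corresponding eigenvector. P = B @ B.T is just the change of basis to the new basis B (discarding some coordinates), followed by the change back to the original basis.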
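The change-of-basis view can be illustrated with a short sketch, again on the test-case data and with M = 2 chosen arbitrarily. The intermediate array coords (a name introduced here for illustration) holds the coordinates of each point in the new basis.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(10, 5)
mean_vec = X.mean(axis=0)
Xc = X - mean_vec

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

M = 2
B = eigvecs[:, :M]

# Step 1: change to the new basis and keep only the first M coordinates.
coords = Xc @ B                    # shape (N, M)
# Step 2: change back to the original basis and restore the mean.
X_back = coords @ B.T + mean_vec   # shape (N, D)

# The variance along each kept direction equals the matching eigenvalue.
print(np.allclose(coords.var(axis=0, ddof=1), eigvals[:M]))
# Doing both steps at once is the same as applying P = B @ B.T.
print(np.allclose(X_back, (B @ B.T @ Xc.T).T + mean_vec))
```

Here ddof=1 matches the default N - 1 normalization of np.cov, which is why the coordinate variances line up with the eigenvalues.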

Edit

I would be curious to know which online course teaches people to implement PCA this way.
