Question

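I am trying to apply PCA to compress the MNIST handwritten digit dataset in order to improve my neural network's performance. So I wrote this function in Python 3: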

    import numpy

    def pca(X, K):
        # X: (m, n) data matrix, one sample per row; K: number of components to keep
        m, n = X.shape
        # (n, n) second-moment matrix; equals the covariance only if X is mean-centered
        sigma = (1/m) * X.T @ X
        # sigma is symmetric, so the columns of U are its principal directions
        U, S, V = numpy.linalg.svd(sigma)
        U = U[:, :K]  # keep the first K directions
        Z = X @ U     # project the data to shape (m, K)
        return (Z, U)

Because the original dataset is very large, I get a memory error.

I tried feature scaling to fix this, but it did not help.

Edit: in my test run, m and n were 70000 and 784, respectively.
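A rough size check for these shapes may help narrow down where the memory goes. This is only a sketch: it assumes dense float64 arrays, and `array_nbytes` is a hypothetical helper written for this illustration, not part of NumPy.

```python
import numpy as np

def array_nbytes(shape, dtype=np.float64):
    """Bytes needed for a dense array of the given shape and dtype (hypothetical helper)."""
    # Force 64-bit accumulation so m * m does not overflow a 32-bit default int.
    return int(np.prod(shape, dtype=np.int64)) * np.dtype(dtype).itemsize

m, n = 70000, 784  # values reported for the test run

sizes = {
    "X (m x n)": array_nbytes((m, n)),
    "X.T @ X (n x n)": array_nbytes((n, n)),
    # What would be allocated if the rows and columns of X were swapped:
    "X @ X.T (m x m)": array_nbytes((m, m)),
}

for name, nbytes in sizes.items():
    print(f"{name}: {nbytes / 1e9:.3f} GB")
```

Under these assumptions the n x n product is only a few megabytes, while an m x m product would need roughly 39 GB, so it may be worth confirming that `X.shape` really is `(70000, 784)` and not its transpose at the point where `pca` is called.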

Applying PCA to the MNIST dataset: memory error

0 answers: