I'm trying to apply the PCA algorithm to compress the MNIST handwritten-digit dataset in order to improve my neural network's performance, so I wrote this function in Python 3:
**POM of Server Factory, the project containing the class that is causing me problems.**
```xml
<modelVersion>4.0.0</modelVersion>
<parent>
    <groupId>com.server.connection</groupId>
    <artifactId>Utilities</artifactId>
    <version>0.0.1-SNAPSHOT</version>
</parent>
<artifactId>serverUtilities</artifactId>
<packaging>jar</packaging>
<name>Server Factory Connection</name>
```
**POM of Server Connection, which is the parent of Server Factory.**
```xml
<modelVersion>4.0.0</modelVersion>
<groupId>com.server.connection</groupId>
<artifactId>Utilities</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>pom</packaging>
<name>Utilities</name>
<modules>
    <module>serverUtilities</module>
</modules>
```
**POM where Server Factory is used.**
```xml
<modelVersion>4.0.0</modelVersion>
<groupId>com.server.getFactory</groupId>
<artifactId>ManageConnections</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>war</packaging>
<name>Manage Connection Component</name>
<dependencies>
    <dependency>
        <groupId>com.server.connection</groupId>
        <artifactId>serverUtilities</artifactId>
        <version>0.0.1-SNAPSHOT</version>
    </dependency>
</dependencies>
```
Because the original dataset is very large, I get the following error:
```python
import numpy

def pca(X, K):
    # X: (m, n) data matrix; K: number of principal components to keep
    m, n = X.shape
    # Covariance matrix of the (assumed zero-mean) data, shape (n, n)
    sigma = (1 / m) * X.T @ X
    # SVD of the covariance matrix; columns of U are the principal directions
    U, S, V = numpy.linalg.svd(sigma)
    U = U[:, :K]
    # Project the data onto the K-dimensional subspace
    Z = X @ U
    return Z, U
```
I tried feature scaling to fix this, but it didn't work.
Edit: in my test run, m and n are 70000 and 784, respectively.
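The error text isn't shown, but with m = 70000 and n = 784 the covariance matrix is only 784×784, so its SVD is cheap; memory pressure more likely comes from holding the 70000×784 matrix (and intermediates) in float64. A minimal sketch under that assumption: work in float32 and center the data before projecting (the function name `pca_lowmem` and the random test matrix are hypothetical, not from the question):

```python
import numpy as np

def pca_lowmem(X, K):
    # Work in float32 to halve memory versus float64 (assumption:
    # the reduced precision is acceptable for MNIST pixel data).
    X = np.asarray(X, dtype=np.float32)
    m, n = X.shape
    # Center the features; PCA assumes zero-mean data.
    mu = X.mean(axis=0)
    Xc = X - mu
    # The covariance matrix is only (n, n) -- 784x784 here -- so the
    # SVD stays small no matter how many samples there are.
    sigma = (Xc.T @ Xc) / m
    U, S, _ = np.linalg.svd(sigma)
    U = U[:, :K]          # keep the first K principal directions
    return Xc @ U, U, mu  # projected data, basis, and mean for reconstruction

# Hypothetical usage on random data shaped like MNIST:
X = np.random.rand(1000, 784)
Z, U, mu = pca_lowmem(X, 50)
print(Z.shape)  # (1000, 50)
```

If memory is still tight, the dataset can be loaded as float32 from the start, so the float64 copy is never created.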