Why does PCA take longer with a larger batch size?

Time: 2020-09-01 21:45:16

Tags: python scikit-learn pca

from sklearn.datasets import fetch_openml
from sklearn.decomposition import MiniBatchSparsePCA
import time

mnist = fetch_openml('mnist_784')

X = mnist.data

# Time fit_transform with batch_size=10
start_time10 = time.time()
dr = MiniBatchSparsePCA(n_components=28, batch_size=10)
X_ = dr.fit_transform(X)
time_elapsed10 = time.time() - start_time10

# Time fit_transform with batch_size=50
start_time50 = time.time()
dr = MiniBatchSparsePCA(n_components=28, batch_size=50)
X_ = dr.fit_transform(X)
time_elapsed50 = time.time() - start_time50

print(str(time_elapsed10) + " and " + str(time_elapsed50))
Output: 
32.51124620437622 and 50.197274684906006

All data points have to be processed either way, so why does a larger batch size take more time? Every other time I have used batching, a larger batch was faster; why doesn't that hold here?
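As a side note on methodology, a more systematic comparison would time several batch sizes in one loop. Below is a minimal sketch assuming the same scikit-learn API as above; the batch sizes and the random_state value are arbitrary choices, not part of the original question:

from sklearn.datasets import fetch_openml
from sklearn.decomposition import MiniBatchSparsePCA
import time

mnist = fetch_openml('mnist_784')
X = mnist.data

# Time fit_transform for a range of (arbitrary) batch sizes;
# a fixed random_state makes the stochastic updates comparable across runs.
for batch_size in (10, 50, 100):
    start = time.time()
    dr = MiniBatchSparsePCA(n_components=28, batch_size=batch_size,
                            random_state=0)
    dr.fit_transform(X)
    print("batch_size=%d: %.2f s" % (batch_size, time.time() - start))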

0 Answers:

No answers yet.