from sklearn.datasets import fetch_openml
from sklearn.decomposition import MiniBatchSparsePCA
import time

# Load MNIST once as a NumPy array (70,000 samples x 784 features).
mnist = fetch_openml('mnist_784', as_frame=False)
X = mnist.data

# Time the fit with a small batch size.
start_time10 = time.time()
dr = MiniBatchSparsePCA(n_components=28, batch_size=10)
X_ = dr.fit_transform(X)
time_elapsed10 = time.time() - start_time10

# Time the fit again with a larger batch size.
start_time50 = time.time()
dr = MiniBatchSparsePCA(n_components=28, batch_size=50)
X_ = dr.fit_transform(X)
time_elapsed50 = time.time() - start_time50

print(f"{time_elapsed10} and {time_elapsed50}")
Output:
32.51124620437622 and 50.197274684906006
All of the data points have to be processed either way, so why does the larger batch size take longer? Every other time I have used batching, a larger batch size performed better; why does that not hold here?
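For reference, here is a minimal benchmarking sketch of the same comparison across several batch sizes; the 5,000-sample subset, the list of batch_size values, and the fixed random_state are my own illustrative choices, not taken from the run above:

import time
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.decomposition import MiniBatchSparsePCA

# Work on a random 5,000-sample subset so each fit stays quick
# (subset size chosen arbitrarily for illustration).
mnist = fetch_openml('mnist_784', as_frame=False)
rng = np.random.default_rng(0)
idx = rng.choice(len(mnist.data), size=5000, replace=False)
X_small = mnist.data[idx]

# Time one fit per batch size and print the results side by side.
for batch_size in (10, 50, 100, 500):
    start = time.time()
    MiniBatchSparsePCA(n_components=28, batch_size=batch_size,
                       random_state=0).fit(X_small)
    print(batch_size, time.time() - start)

Fitting on a subset keeps each run short while still showing how the runtime scales with batch_size, and fixing random_state makes the runs comparable.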