Here is what I understand:

When comparing stochastic gradient descent (which uses a single example per update) with mini-batch gradient descent, mini-batch should be faster because the CPU can use vectorization during the computation. But I cannot demonstrate this. Andrew Ng says at minute 4:19 in his course (link):

So, why do we want to look at b examples at a time rather than just a single example at a time, as in stochastic gradient descent? The answer is vectorization. In particular, mini-batch gradient descent is likely to outperform stochastic gradient descent only if you have a good vectorized implementation.

Can someone show me some code that demonstrates this?
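To be clear about what I mean by vectorization: replacing a per-element Python loop with a single NumPy call. A toy sketch of just that difference (my own example, not from the course):

import time
import numpy as np

v = np.random.random(int(1e6))
w = np.random.random(int(1e6))

# Per-element Python loop (no vectorization)
t0 = time.time()
s = 0.0
for i in range(len(v)):
    s += v[i] * w[i]
print('loop      :', time.time() - t0)

# One vectorized NumPy call doing the same work
t0 = time.time()
s = np.dot(v, w)
print('vectorized:', time.time() - t0)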
I tried it myself but could not show it:

I have 1000 examples.
Stochastic gradient descent runs through all the examples twice ==> 2000 updates
Each mini-batch has 100 examples, so one pass over all the examples takes 10 mini-batches, and I loop 200 times ==> 2000 updates
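A quick sanity check of those counts, and of how many examples each variant touches in total (my own arithmetic):

n_examples = 1000
sgd_updates = 2 * (n_examples // 1)      # 2 epochs, batchsize 1    -> 2000 updates
mb_updates = 200 * (n_examples // 100)   # 200 epochs, batchsize 100 -> 2000 updates
print(sgd_updates, sgd_updates * 1)      # 2000 updates, 2000 examples processed
print(mb_updates, mb_updates * 100)      # 2000 updates, 200000 examples processed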
import time
import numpy as np
from numpy.random import random
from matplotlib import pyplot as plt
# (Mini-)Batch Gradient Descent: one update step for the linear model y = a*x + b
def upd(x, y):
    global a_guess, b_guess
    y_pred = a_guess*x + b_guess
    a_guess = a_guess - lr * ((y_pred - y) * x).sum() / len(x)
    b_guess = b_guess - lr * (y_pred - y).sum() / len(x)
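(Aside: the update above is the gradient step for the halved mean squared error. A throwaway finite-difference check, my own sketch with arbitrary names, confirms the formula for the slope:)

rng = np.random.default_rng(0)
xs = rng.random(10)
ys = 3*xs + 2
a0, b0, eps = 1.0, 1.0, 1e-6

def half_mse(a, b):
    return ((a*xs + b - ys)**2).mean() / 2

grad_a = ((a0*xs + b0 - ys) * xs).mean()  # analytic gradient, as in upd()
num_a = (half_mse(a0 + eps, b0) - half_mse(a0 - eps, b0)) / (2*eps)
print(grad_a, num_a)  # the two should agree to ~1e-9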
# This function takes inputs (e.g. x), targets (e.g. y) and a desired
# batch size, and slices the data into mini-batches
def iterate_minibatches(inputs, targets, batchsize, shuffle=False):
    assert inputs.shape[0] == targets.shape[0]
    if shuffle:
        indices = np.arange(inputs.shape[0])
        np.random.shuffle(indices)
    for start_idx in range(0, inputs.shape[0], batchsize):
        end_idx = min(start_idx + batchsize, inputs.shape[0])
        if shuffle:
            excerpt = indices[start_idx:end_idx]
        else:
            excerpt = slice(start_idx, end_idx)
        yield inputs[excerpt], targets[excerpt]
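For illustration, with five inputs and batchsize 2 the generator yields batches of sizes 2, 2 and 1 (my own toy example):

xs = np.arange(5, dtype=float)
ys = 2*xs
for xb, yb in iterate_minibatches(xs, ys, batchsize=2):
    print(xb, yb)
# [0. 1.] [0. 2.]
# [2. 3.] [4. 6.]
# [4.] [8.]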
# Generate some points as x and y
a = 3
b = 2
n = int(1e3)
x = random(n)
y = a*x + b
plt.scatter(x, y)
# Stochastic Gradient Descent: batchsize = 1
a_guess = 1
b_guess = 1
lr = 0.01
batchsize = 1
start_time = time.time()
for epoch in range(2):  # 2 passes over 1000 examples ==> 2000 updates
    for x_batch, y_batch in iterate_minibatches(x, y, batchsize, shuffle=True):
        upd(x_batch, y_batch)
elapsed_time = time.time() - start_time
print('CPU time on one Example = ', elapsed_time)
y_pred = a_guess*x + b_guess
loss = ((y - y_pred)**2).sum()
print("loss: %.10s, a_guess= %.10s, b_guess= %.10s, a=%.10s, b=%.10s" % (loss, a_guess, b_guess, a, b))
# Mini-Batch Gradient Descent: batchsize = 100
a_guess = 1
b_guess = 1
lr = 0.01
batchsize = 100
start_time = time.time()
for epoch in range(200):  # 200 passes of 10 mini-batches ==> 2000 updates
    for x_batch, y_batch in iterate_minibatches(x, y, batchsize, shuffle=True):
        upd(x_batch, y_batch)
elapsed_time = time.time() - start_time
print('CPU time on batches with 100 examples each= ', elapsed_time)
y_pred = a_guess*x + b_guess
loss = ((y - y_pred)**2).sum()
print("loss: %.10s, a_guess= %.10s, b_guess= %.10s, a=%.10s, b=%.10s" % (loss, a_guess, b_guess, a, b))
But the mini-batch version takes longer. Shouldn't it be the other way around?
CPU time on one Example = 0.03797745704650879
CPU time on batches with 100 examples each= 0.05696702003479004
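Since both timings are only a few hundredths of a second, a more stable measurement of the exact same two runs might look like this (my own sketch using timeit.repeat; it assumes upd, iterate_minibatches, x and y are defined in __main__ as above):

import timeit

setup = 'from __main__ import upd, iterate_minibatches, x, y'
sgd_stmt = '''
for epoch in range(2):
    for xb, yb in iterate_minibatches(x, y, 1, shuffle=True):
        upd(xb, yb)
'''
mb_stmt = '''
for epoch in range(200):
    for xb, yb in iterate_minibatches(x, y, 100, shuffle=True):
        upd(xb, yb)
'''
# Best of 5 repeats, one run each, same 2000 updates as before
print('SGD       :', min(timeit.repeat(sgd_stmt, setup=setup, repeat=5, number=1)))
print('mini-batch:', min(timeit.repeat(mb_stmt, setup=setup, repeat=5, number=1)))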