Mini-batch gradient descent vs. stochastic gradient descent: how to show the power of vectorization?

Time: 2019-05-18 19:52:49

Tags: python gradient-descent mini-batch

What I learned:

When comparing stochastic gradient descent (which works on just one example at a time) with mini-batch gradient descent, mini-batch should be faster because the CPU can use vectorization during the computation. But I am unable to show this. Andrew Ng says at minute 4:19 in his course (link):

So, why do we want to look at b examples at a time rather than looking at just a single example at a time, as in stochastic gradient descent? The answer is vectorization. In particular, mini-batch gradient descent is likely to outperform stochastic gradient descent only if you have a good vectorized implementation.
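
To make sure I understand what "vectorization" means here, this is a minimal sketch of my own (not from the course) comparing a Python-level loop with a single NumPy call doing the same arithmetic:

import time
import numpy as np

x = np.random.random(10**6)

# Python-level loop: one multiply-add per iteration, interpreter overhead each time
start = time.time()
total = 0.0
for xi in x:
    total += 3*xi + 2
print('loop:      ', time.time() - start)

# Vectorized: the same arithmetic in one NumPy expression, run in compiled code
start = time.time()
total = (3*x + 2).sum()
print('vectorized:', time.time() - start)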

Can someone show me some code that demonstrates this?

I tried it myself, but I could not show the effect:

I have 1000 examples:

  1. Stochastic gradient descent runs through all the examples 2 times ==> 2000 updates

  2. Each mini-batch has 100 examples, so all the examples fit into 10 mini-batches, and I loop 200 times ==> 2000 updates

import time
import math, numpy as np
from numpy.random import random
from matplotlib import pyplot as plt

#(Mini-)Batch Gradient Descent: one parameter update from a batch of examples
def upd(x, y):
    global a_guess, b_guess
    y_pred = a_guess*x + b_guess
    # average the per-example gradients over the batch, then step against them
    a_guess = a_guess - lr * ((y_pred - y) * x).sum() / len(x)
    b_guess = b_guess - lr * (y_pred - y).sum() / len(x)
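
# For reference (my reading of upd(), not from the course): this is gradient
# descent on the mean squared error over the m examples in the batch,
#   L = 1/(2m) * sum_i (a*x_i + b - y_i)^2
# whose partial derivatives are exactly the averaged sums used above:
#   dL/da = 1/m * sum_i (y_pred_i - y_i) * x_i
#   dL/db = 1/m * sum_i (y_pred_i - y_i)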

#This function takes some inputs (e.g. x) and targets (e.g. y) and a desired batch size, and slices the data into mini-batches
def iterate_minibatches(inputs, targets, batchsize, shuffle=False):
    assert inputs.shape[0] == targets.shape[0]
    if shuffle:
        indices = np.arange(inputs.shape[0])
        np.random.shuffle(indices)
    for start_idx in range(0, inputs.shape[0], batchsize):
        end_idx = min(start_idx + batchsize, inputs.shape[0])
        if shuffle:
            excerpt = indices[start_idx:end_idx]
        else:
            excerpt = slice(start_idx, end_idx)
        yield inputs[excerpt], targets[excerpt]
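
# Quick sanity check of the slicer (my own check, not part of the timing
# experiment): 10 examples with batchsize=4 should yield batches of 4, 4 and 2.
for xb, yb in iterate_minibatches(np.arange(10.0), np.arange(10.0), 4):
    print(xb.shape, yb.shape)  # prints (4,) (4,), then (4,) (4,), then (2,) (2,)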

#Generate some points as X and Y
a=3
b=2
n=int(1e3)
x = random(n)
y = a*x+b
plt.scatter(x,y)

#Stochastic Gradient Descent with Batchsize=1
a_guess=1
b_guess=1
lr=0.01
batchsize=1
start_time = time.time()
for epoch in range(2):
    for batch in iterate_minibatches(x, y, batchsize, shuffle=True):
        x_batch, y_batch = batch
        upd(x_batch,y_batch)
elapsed_time = time.time() - start_time
print('CPU time on one Example = ',elapsed_time)

y_pred=a_guess*x+b_guess
loss=((y-y_pred)**2).sum()
print( "loss: %.10s, a_guess= %.10s, b_guess= %.10s, a=%.10s, b=%.10s" % (loss,a_guess,b_guess,a,b))

#Setting length of Mini-Batches to 100
a_guess=1
b_guess=1
lr=0.01
batchsize=100
start_time = time.time()
for epoch in range(200):
    for batch in iterate_minibatches(x, y, batchsize, shuffle=True):
        x_batch, y_batch = batch
        upd(x_batch,y_batch)
elapsed_time = time.time() - start_time
print('CPU time on batches with 100 examples each= ',elapsed_time)

y_pred=a_guess*x+b_guess
loss=((y-y_pred)**2).sum()
print( "loss: %.10s, a_guess= %.10s, b_guess= %.10s, a=%.10s, b=%.10s" % (loss,a_guess,b_guess,a,b))

But the mini-batch version takes longer. Shouldn't it be the other way around?

CPU time on one Example =  0.03797745704650879

CPU time on batches with 100 examples each=  0.05696702003479004
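
If I understand the vectorization argument correctly, a cleaner comparison would hold the number of update calls fixed and vary only the batch size. Here is a sketch of my own along those lines (reusing upd, x, y and lr from above); I would expect the per-call cost at batchsize 100 to be only slightly higher than at batchsize 1, not 100 times higher:

#Same number of updates (2000), different batch sizes
a_guess, b_guess = 1, 1
start_time = time.time()
for i in range(2000):
    j = i % len(x)
    upd(x[j:j+1], y[j:j+1])
print('2000 updates, batchsize 1:  ', time.time() - start_time)

a_guess, b_guess = 1, 1
start_time = time.time()
for i in range(2000):
    j = (i * 100) % len(x)
    upd(x[j:j+100], y[j:j+100])
print('2000 updates, batchsize 100:', time.time() - start_time)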
