In Python

Date: 2017-05-24 08:57:37

Tags: python machine-learning

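I am currently trying to perform batch gradient descent on a toy dataset obtained with load_boston from the scikit-learn library. The dataset has dimensions 506 x 13 and contains values on the order of hundreds. Below is my Python script, followed by the error I get when running it.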

boston_data_regression.py

import scipy
import numpy

from sklearn.datasets import load_boston

def generateGradient (X, Y, m, alpha, theta, num_iterations) :

    X_transpose = X.transpose()

    for i in range(0, num_iterations) :
        hypothesis = numpy.dot(X, theta)
        delta = hypothesis - Y
        cost = numpy.sum(delta ** 2) / (2 * m)

        print ("No. iteration : %d | Cost : %ld" % ((i + 1), cost))

        gradient = numpy.dot(X_transpose, delta) / m
        theta = theta - alpha * gradient 

    return (theta)

if __name__ == '__main__' :

    boston_data = load_boston()
    X = boston_data.data[:, 0:11]
    Y = boston_data.data[:,12]

    print (boston_data.data)

    print (numpy.shape(X))
    print (numpy.shape(Y))

    num_iterations = 100000
    alpha = 0.0005
    m, n = numpy.shape(X)

    theta = numpy.ones(n)
    theta = generateGradient(X, Y, m, alpha, theta, num_iterations)

    print (theta)

Error:

No. iteration : 75 | Cost : 5107568749643583921695342267251134617186569132604666005559083886757991071451800270203896531093730395389956630990780914914913406418422174358389131741568461360913005557192743665544540413282512755425657295941969706284629047517505070375172805106443882740219842668724638239205198801815953626988648840822784
No. iteration : 76 | Cost : 50304231336916560424319335120140228744355885776376593114754676052001428477104842266241766923801372402675185672996149747402542290566577918714034301765248577735574592772115140169849029676464020678156657455729204985429508262045621361912203426365153327346440580108502094724090338985744326599309593512431845376
boston_data_regression.py:13: RuntimeWarning: overflow encountered in square
  cost = numpy.sum(delta ** 2) / (2 * m)
Traceback (most recent call last):
  File "boston_data_regression.py", line 38, in <module>
    theta = generateGradient(X, Y, m, alpha, theta, num_iterations)
  File "boston_data_regression.py", line 15, in generateGradient
    print ("No. iteration : %d | Cost : %ld" % ((i + 1), cost))
TypeError: %d format: a number is required, not numpy.float64

May I know how to sort out this error, and whether there is a better or more optimized way to perform batch gradient descent?

1 Answer:

Answer 0 (score: 1)

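Your problem stems from the magnitude of your values. The cost grows steadily to about 5e+304, and the error appears on the following iteration, most likely because of an overflow.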
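A minimal sketch of how this kind of divergence could be caught early, assuming the same update rule as in the question. The function name `gradient_descent_checked` and the toy data are hypothetical and only for illustration. The loop stops as soon as the cost is no longer finite, and the cost is printed with `%e`, a floating-point format that can also display `inf`.

```python
import numpy


def gradient_descent_checked(X, Y, alpha, theta, num_iterations):
    """Batch gradient descent that stops once the cost stops being finite.

    Hypothetical helper; returns (theta, iterations_run, diverged).
    """
    m = X.shape[0]
    for i in range(num_iterations):
        delta = X.dot(theta) - Y
        # Silence the overflow warning here; the isfinite check handles it.
        with numpy.errstate(over="ignore", invalid="ignore"):
            cost = numpy.sum(delta ** 2) / (2 * m)
        if not numpy.isfinite(cost):
            print("Iteration %d: cost is %e, stopping (diverged)" % (i + 1, cost))
            return theta, i + 1, True
        theta = theta - alpha * X.T.dot(delta) / m
    return theta, num_iterations, False


# Toy data with values in the hundreds (made up, not the Boston dataset).
rng = numpy.random.RandomState(0)
X_demo = rng.uniform(0, 500, size=(50, 3))
Y_demo = X_demo.dot(numpy.array([1.0, 2.0, 3.0]))

_, steps_run, diverged = gradient_descent_checked(
    X_demo, Y_demo, alpha=0.0005, theta=numpy.ones(3), num_iterations=100000)
```

With unscaled inputs of this size and the question's learning rate, the check is expected to trigger long before the requested number of iterations is reached.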

You can check the limits of numpy.float64 values with:
import numpy
numpy.finfo('d')
finfo(resolution=1e-15, min=-1.7976931348623157e+308, max=1.7976931348623157e+308, dtype=float64)

As you can see, the maximum value is about 1.8e+308. The solution to this problem is to scale your values down.
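One common way to scale the values down is to standardize each feature column (subtract its mean, divide by its standard deviation) before running gradient descent. The sketch below assumes that approach, which the answer does not spell out. The helper names `standardize` and `batch_gradient_descent` and the synthetic data are hypothetical, and a column of ones is added because centered features need an intercept term.

```python
import numpy


def standardize(X):
    """Scale every column to zero mean and unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std, mean, std


def batch_gradient_descent(X, Y, alpha, num_iterations):
    """Plain batch gradient descent; returns theta and the cost per iteration."""
    m, n = X.shape
    theta = numpy.zeros(n)
    costs = []
    for _ in range(num_iterations):
        delta = X.dot(theta) - Y
        costs.append(numpy.sum(delta ** 2) / (2 * m))
        theta = theta - alpha * X.T.dot(delta) / m
    return theta, costs


# Synthetic data with values in the hundreds (made up for this example).
rng = numpy.random.RandomState(0)
X_raw = rng.uniform(0, 500, size=(200, 3))
Y = X_raw.dot(numpy.array([0.5, -1.0, 2.0])) + 10.0

X_scaled, _, _ = standardize(X_raw)
# Prepend a column of ones so theta[0] acts as the intercept.
X_design = numpy.column_stack([numpy.ones(len(X_scaled)), X_scaled])

theta, costs = batch_gradient_descent(X_design, Y, alpha=0.1, num_iterations=2000)

# Cross-check against the closed-form least-squares solution.
theta_exact = numpy.linalg.lstsq(X_design, Y, rcond=None)[0]
print("final cost: %e" % costs[-1])
print("max difference from lstsq: %e" % numpy.max(numpy.abs(theta - theta_exact)))
```

With standardized columns a much larger learning rate (0.1 here, an arbitrary choice) stays stable. To predict for new rows, the mean and standard deviation computed on the training data would have to be applied to them first.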