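I am trying to implement a simple XNOR neural network using Theano, and I get a dimension mismatch error:
ValueError: dimension mismatch in args to gemm (8,1)x(2,1)->(8,1)
The input has shape (4, 2) and the output has shape (4, 1), so I don't understand why the input is being read as (8, 1).
It should be (4,2)x(2,1)->(4,1), but somehow it is treated as (8,1)x(2,1)->(8,1).
Any idea why an input of shape (n, m) is being read as (n*m, 1)?
A simple neural network implementing XNOR: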
print 'Importing Theano Library ...'
import theano
print 'Importing General Libraries ...'
import numpy as np
import theano.tensor as T
from theano import function
from theano import shared
from theano.ifelse import ifelse
import os
from random import random
import time
print(theano.config.device)
print 'Building Neural Network ...'
startTime = time.clock()
rng = np.random
#Define variables:
x = T.matrix('x')
w1 = shared(np.array([rng.random(1).astype(theano.config.floatX), rng.random(1).astype(theano.config.floatX)]))
w2 = shared(np.array([rng.random(1).astype(theano.config.floatX), rng.random(1).astype(theano.config.floatX)]))
w3 = shared(np.array([rng.random(1).astype(theano.config.floatX), rng.random(1).astype(theano.config.floatX)]))
b1 = shared(np.asarray(1., dtype=theano.config.floatX))
b2 = shared(np.asarray(1., dtype=theano.config.floatX))
learning_rate = 0.01
a1 = 1/(1+T.exp(-T.dot(x,w1)-b1))
a2 = 1/(1+T.exp(-T.dot(x,w2)-b1))
x2 = T.stack([a1,a2],axis=1)
a3 = 1/(1+T.exp(-T.dot(x2,w3)-b2))
a_hat = T.vector('a_hat') #Actual output
cost = -(a_hat*T.log(a3) + (1-a_hat)*T.log(1-a3)).sum()
dw1,dw2,dw3,db1,db2 = T.grad(cost,[w1,w2,w3,b1,b2])
train = function(inputs = [x,a_hat], outputs = [a3,cost], updates = [[w1, w1-learning_rate*dw1],[w2, w2-learning_rate*dw2],[w3, w3-learning_rate*dw3],[b1, b1-learning_rate*b1],[b2, b2-learning_rate*b2]])
print 'Neural Network Built'
TimeDelta = time.clock() - startTime
print 'Building Time: %.2f seconds' %TimeDelta
inputs = np.array([[0,0],[0,1],[1,0],[1,1]]).astype(theano.config.floatX)
outputs = np.array([1,0,0,1]).astype(theano.config.floatX)
#Iterate through all inputs and find outputs:
print 'Training the network ...'
startTime = time.clock()
cost = []
print 'input shape', inputs.shape
print 'output shape', outputs.shape
for iteration in range(60000):
    print 'Iteration no. %d \r' %iteration,
    pred, cost_iter = train(inputs, outputs)
    cost.append(cost_iter)
TimeDelta = time.clock() - startTime
print 'Training Time: %.2f seconds' %TimeDelta
#Print the outputs:
print 'The outputs of the NN are: '
for i in range(len(inputs)):
    print 'The output for x1=%d | x2=%d is %.2f' % (inputs[i][0], inputs[i][1], pred[i])
predict = function([x],a3)
print predict([[0,0]])
print predict([[0,1]])
print predict([[1,0]])
print predict([[1,1]])
Terminal output:
Importing Theano Library ...
Using gpu device 0: NVIDIA Tegra X1 (CNMeM is enabled with initial size: 75.0% of memory, cuDNN 5005)
Importing General Libraries ...
gpu
Building Neural Network ...
Neural Network Built
Building Time: 1.78 seconds
Training the network ...
input shape (4, 2)
output shape (4,)
Traceback (most recent call last):
  File "neuron2.py", line 59, in <module>
    pred, cost_iter = train(inputs, outputs)
  File "/home/ubuntu/Theano/theano/compile/function_module.py", line 879, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/home/ubuntu/Theano/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/home/ubuntu/Theano/theano/compile/function_module.py", line 866, in __call__
    self.fn() if output_subset is None else\
ValueError: dimension mismatch in args to gemm (8,1)x(2,1)->(8,1)
Apply node that caused the error: GpuDot22(GpuReshape{2}.0, GpuReshape{2}.0)
Toposort index: 68
Inputs types: [CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix)]
Inputs shapes: [(8, 1), (2, 1)]
Inputs strides: [(1, 0), (1, 0)]
Inputs values: ['not shown', CudaNdarray([[ 0.14762458]
[ 0.12991147]])]
Outputs clients: [[GpuReshape{3}(GpuDot22.0, Join.0)]]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
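Answer 0 (score: 0)
The shared variables w1, w2 and w3 end up as matrices because of how they are constructed and cast; they should be vectors. Each rng.random(1) call returns a length-1 array, so wrapping two of them in np.array gives shape (2, 1) rather than (2,). T.dot(x, w1) is then (4, 1) instead of (4,), T.stack([a1, a2], axis=1) becomes a 3-D tensor of shape (4, 2, 1), and the final dot appears to flatten that to (8, 1) before multiplying by the (2, 1) weights, which matches the shapes in the error.
These lines: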
w1 = shared(np.array([rng.random(1).astype(theano.config.floatX), rng.random(1).astype(theano.config.floatX)]))
w2 = shared(np.array([rng.random(1).astype(theano.config.floatX), rng.random(1).astype(theano.config.floatX)]))
w3 = shared(np.array([rng.random(1).astype(theano.config.floatX), rng.random(1).astype(theano.config.floatX)]))
should be:
from random import random
w1 = shared(np.asarray([random(), random()], dtype=theano.config.floatX))
w2 = shared(np.asarray([random(), random()], dtype=theano.config.floatX))
w3 = shared(np.asarray([random(), random()], dtype=theano.config.floatX))
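The shape difference can be checked without Theano. The following is only a minimal NumPy sketch, not the original network: it reuses one weight array for both hidden units for brevity, the names w_col, w_vec, x2_col, x2_vec and flattened are made up for illustration, and the reshape only imitates what the GpuReshape node in the traceback seems to do.

```python
import numpy as np

rng = np.random.RandomState(0)
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype="float32")

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weights built as in the question: an array of two length-1 arrays is 2-D.
w_col = np.array([rng.random_sample(1), rng.random_sample(1)]).astype("float32")
a_col = sigmoid(x.dot(w_col))                      # (4, 2) . (2, 1) -> (4, 1)
x2_col = np.stack([a_col, a_col], axis=1)          # two (4, 1) arrays -> (4, 2, 1)
flattened = x2_col.reshape(-1, x2_col.shape[-1])   # (8, 1), the shape in the error

# Weights built as in the answer: a flat list of scalars is 1-D.
w_vec = np.asarray([rng.random_sample(), rng.random_sample()], dtype="float32")
a_vec = sigmoid(x.dot(w_vec))                      # (4, 2) . (2,) -> (4,)
x2_vec = np.stack([a_vec, a_vec], axis=1)          # two (4,) arrays -> (4, 2)
a3 = sigmoid(x2_vec.dot(w_vec))                    # (4, 2) . (2,) -> (4,)

print(w_col.shape)
print(flattened.shape)
print(w_vec.shape)
print(a3.shape)
```

With vector weights the output has shape (4,), which also matches the T.vector target a_hat and the (4,) outputs array passed to train.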