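Hello everyone!
I am trying to develop a neural network with Keras and TensorFlow that takes variable-length arrays as input and either outputs a single value (see the toy example below) or classifies them (classification is a problem for later and is not covered in this question).
The idea is fairly simple.
We have variable-length arrays. I am currently using very simple toy data, generated by the following code: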
import numpy as np
import pandas as pd
from keras import models as kem
from keras import activations as kea
from keras import layers as kel
from keras import regularizers as ker
from keras import optimizers as keo
from keras import losses as kelo
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, normalize

n = 100
mms = MinMaxScaler(feature_range=(-1,1))
rows = []
for i in range(n):
    k = np.random.randint(20,100)            # random array length
    ss = np.random.randint(0,100,size=k)     # random integer data
    # target: sum of elements at even indices minus sum at odd indices
    idres = np.sum(ss[np.arange(0,k,2)])-np.sum(ss[np.arange(1,k,2)])
    rows.append({'data': ss, 'res': idres})
x = pd.DataFrame(rows, columns=['data','res'])
# MinMaxScaler expects a 2-D array, so scale a one-column frame and flatten
x['res'] = mms.fit_transform(x[['res']].astype(float)).ravel()
x_train,x_test,y_train, y_test = train_test_split(x.data,x.res,test_size=0.2)
# sliding_window is a helper whose definition is not shown here
x_train = sliding_window(x_train.values,2,2)
x_test = sliding_window(x_test.values,2,2)
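In short, I generate arrays of random length, and the result (output) for each array is the sum of the elements at even positions minus the sum of the elements at odd positions. Obviously, it can be negative or positive. The output is then scaled to the range [-1, 1] to fit the tanh activation function.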
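To make the target and the windowing concrete, here is a minimal NumPy sketch. The `sliding_window` helper called above is not shown in the post, so `sliding_window_sketch` below is only a guess at its behavior: it assumes non-overlapping windows of size 2 taken with step 2, and it drops a trailing element when the array length is odd. Both function names are hypothetical.

```python
import numpy as np

def alternating_sum(ss):
    """Sum of elements at even indices minus sum of elements at odd indices."""
    ss = np.asarray(ss)
    return int(ss[0::2].sum() - ss[1::2].sum())

def sliding_window_sketch(ss, size=2, step=2):
    """Hypothetical stand-in for the post's sliding_window: rows of `size`
    consecutive elements, taken every `step` elements."""
    ss = np.asarray(ss)
    starts = range(0, len(ss) - size + 1, step)
    return np.stack([ss[s:s + size] for s in starts])

sample = np.array([5, 1, 7, 2, 4])
target = alternating_sum(sample)          # (5 + 7 + 4) - (1 + 2)
windows = sliding_window_sketch(sample)   # pairs (5, 1) and (7, 2); the 4 is dropped
print(target, windows.shape)
```

Under this assumed layout each row holds one even-indexed and one odd-indexed element, so for even-length arrays the target equals `windows[:, 0].sum() - windows[:, 1].sum()`.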
The Sequential model is built as follows:
model = kem.Sequential()
model.add(kel.LSTM(20,return_sequences=False,input_shape=(None,2),recurrent_activation='tanh'))
model.add(kel.Dense(20,activation='tanh'))
model.add(kel.Dense(10,activation='tanh'))
model.add(kel.Dense(5,activation='tanh'))
model.add(kel.Dense(1,activation='tanh'))
sgd = keo.SGD(lr=0.1)
mseloss = kelo.mean_squared_error
model.compile(optimizer=sgd,loss=mseloss,metrics=['accuracy'])
The model is trained as follows:
def calcMSE(model,x_test,y_test):
    # mean squared error over the test set, predicting one sequence at a time
    nTest = len(x_test)
    total = 0
    for i in range(nTest):
        restest = model.predict(np.reshape(x_test[i],(1,-1,2)))
        total+=(restest-y_test[0,i])**2
    return total/nTest

i = 1
mse = calcMSE(model,x_test,np.reshape(y_test.values,(1,-1)))
lrPar = 0
lrSteps = 30
while mse>0.04:
    print("Epoch %i" % (i))
    print(mse)
    # one gradient step per training sequence (batch size 1)
    for j in range(len(x_train)):
        ntrain=j
        model.train_on_batch(np.reshape(x_train[ntrain],(1,-1,2)),np.reshape(y_train.values[ntrain],(-1,1)))
    i+=1
    mse = calcMSE(model,x_test,np.reshape(y_test.values,(1,-1)))
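The problem is that the optimizer usually gets stuck at around MSE = 0.05 (on the test set). The last time I tested, it actually got stuck at MSE = 0.12 (on the test data).
Moreover, if you compare what the model outputs on the test data (left column) with the correct outputs (right column):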
[[-0.11888303]] 0.574923547401
[[-0.17038491]] -0.452599388379
[[-0.20098214]] 0.065749235474
[[-0.22307695]] -0.437308868502
[[-0.2218809]] 0.371559633028
[[-0.2218741]] 0.039755351682
[[-0.22247596]] -0.434250764526
[[-0.17094387]] -0.151376146789
[[-0.17089397]] -0.175840978593
[[-0.16988073]] 0.025993883792
[[-0.16984619]] -0.117737003058
[[-0.17087571]] -0.515290519878
[[-0.21933308]] -0.366972477064
[[-0.09379648]] -0.178899082569
[[-0.17016701]] -0.333333333333
[[-0.17022927]] -0.195718654434
[[-0.11681376]] 0.452599388379
[[-0.21438009]] 0.224770642202
[[-0.12475857]] 0.151376146789
[[-0.2225963]] -0.380733944954
The same happens on the training set:
[[-0.22209576]] -0.00764525993884
[[-0.17096499]] -0.247706422018
[[-0.22228305]] 0.276758409786
[[-0.16986915]] 0.340978593272
[[-0.16994311]] -0.233944954128
[[-0.22131597]] -0.345565749235
[[-0.17088912]] -0.145259938838
[[-0.22250554]] -0.792048929664
[[-0.17097935]] 0.119266055046
[[-0.17087702]] -0.2874617737
[[-0.1167363]] -0.0045871559633
[[-0.08695849]] 0.159021406728
[[-0.17082921]] 0.374617737003
[[-0.15422876]] -0.110091743119
[[-0.22185338]] -0.7125382263
[[-0.17069265]] -0.678899082569
[[-0.16963181]] -0.00611620795107
[[-0.17089556]] -0.249235474006
[[-0.17073657]] -0.414373088685
[[-0.17089497]] -0.351681957187
[[-0.17138508]] -0.0917431192661
[[-0.22351067]] 0.11620795107
[[-0.17079701]] -0.0795107033639
[[-0.22246087]] 0.22629969419
[[-0.17044055]] 1.0
[[-0.17090379]] -0.0902140672783
[[-0.23420531]] -0.0366972477064
[[-0.2155242]] 0.0366972477064
[[-0.22192241]] -0.675840978593
[[-0.22220723]] -0.354740061162
[[-0.1671907]] -0.10244648318
[[-0.22705412]] 0.0443425076453
[[-0.22943887]] -0.249235474006
[[-0.21681401]] 0.065749235474
[[-0.12495813]] 0.466360856269
[[-0.17085686]] 0.316513761468
[[-0.17092516]] 0.0275229357798
[[-0.17277785]] -0.325688073394
[[-0.22193027]] 0.139143730887
[[-0.17088208]] 0.422018348624
[[-0.17093034]] -0.0886850152905
[[-0.17091317]] -0.464831804281
[[-0.22241674]] -0.707951070336
[[-0.1735626]] -0.337920489297
[[-0.16984227]] 0.00764525993884
[[-0.16756304]] 0.515290519878
[[-0.22193302]] -0.414373088685
[[-0.22419722]] -0.351681957187
[[-0.11561158]] 0.17125382263
[[-0.16640976]] -0.321100917431
[[-0.21557514]] -0.313455657492
[[-0.22241823]] -0.117737003058
[[-0.22165506]] -0.646788990826
[[-0.22238114]] -0.261467889908
[[-0.1709189]] 0.0902140672783
[[-0.17698884]] -0.626911314985
[[-0.16984172]] 0.587155963303
[[-0.22226149]] -0.590214067278
[[-0.16950315]] -0.469418960245
[[-0.22180589]] -0.133027522936
[[-0.2224243]] -1.0
[[-0.22236891]] 0.152905198777
[[-0.17089345]] 0.435779816514
[[-0.17422611]] -0.233944954128
[[-0.17177556]] -0.324159021407
[[-0.21572633]] -0.347094801223
[[-0.21509495]] -0.646788990826
[[-0.17086846]] -0.34250764526
[[-0.17595944]] -0.496941896024
[[-0.16803505]] -0.382262996942
[[-0.16983894]] -0.348623853211
[[-0.17078683]] 0.363914373089
[[-0.21560851]] -0.186544342508
[[-0.22416025]] -0.374617737003
[[-0.1723443]] -0.186544342508
[[-0.16319042]] -0.0122324159021
[[-0.18837349]] -0.181957186544
[[-0.17371364]] -0.539755351682
[[-0.22232121]] -0.529051987768
[[-0.22187822]] -0.149847094801
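As you can see, the model outputs are all very close to each other, unlike the target values, whose variability is much larger (although I should admit that negative values dominate in both the training and test sets).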
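One way to quantify this observation is to compare the model's error with that of a predictor that always outputs the same value. The sketch below uses the 20 test rows listed above, rounded to four decimals. The comparison itself is an added diagnostic and is not part of the original code.

```python
import numpy as np

# The 20 test-set rows listed in the post, rounded to 4 decimals.
pred = np.array([-0.1189, -0.1704, -0.2010, -0.2231, -0.2219, -0.2219, -0.2225,
                 -0.1709, -0.1709, -0.1699, -0.1698, -0.1709, -0.2193, -0.0938,
                 -0.1702, -0.1702, -0.1168, -0.2144, -0.1248, -0.2226])
true = np.array([0.5749, -0.4526, 0.0657, -0.4373, 0.3716, 0.0398, -0.4343,
                 -0.1514, -0.1758, 0.0260, -0.1177, -0.5153, -0.3670, -0.1789,
                 -0.3333, -0.1957, 0.4526, 0.2248, 0.1514, -0.3807])

model_mse = np.mean((pred - true) ** 2)
# Best constant predictor: always output the mean target.
# Its MSE is exactly the variance of the targets.
baseline_mse = np.mean((true.mean() - true) ** 2)

print("model MSE:   ", model_mse)
print("baseline MSE:", baseline_mse)
print("spread of predictions:", pred.std(), "spread of targets:", true.std())
```

If the two errors turn out to be close, that would suggest the network has learned little beyond an average target value.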
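What am I doing wrong here? Why does training get stuck, or is this normal and I should just let it run longer (I have run several hundred epochs a few times and it still stays stuck)? I have also tried a variable learning rate, for example cosine annealing with restarts (as in I. Loshchilov and F. Hutter. SGDR: Stochastic gradient descent with restarts. arXiv preprint arXiv:1608.03983, 2016).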
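For reference, cosine annealing with restarts can be sketched as below. The post does not show the schedule that was actually used, so the function name, the fixed cycle length (borrowed from `lrSteps = 30` in the training code) and the learning-rate bounds are assumptions for illustration. The cited paper also allows the cycle length to grow after each restart, which this sketch leaves out.

```python
import math

def cosine_restart_lr(step, cycle_len=30, lr_max=0.1, lr_min=0.0):
    """Learning rate at `step` under cosine annealing with restarts.

    Within each cycle of `cycle_len` steps the rate decays from lr_max
    towards lr_min along half a cosine, then jumps back to lr_max.
    """
    t_cur = step % cycle_len  # position inside the current cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / cycle_len))

schedule = [cosine_restart_lr(s) for s in range(90)]  # three cycles
print(schedule[0], schedule[15], schedule[30])
```

The rate starts at `lr_max`, reaches the midpoint halfway through a cycle, and returns to `lr_max` at every restart.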
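I would appreciate any suggestions regarding the network structure, the training approach, and the coding/implementation details.
Thank you very much in advance for your help.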