I need help on how to improve the training accuracy using Keras with TensorFlow as the backend.
First I downloaded the public electrical dataset from here -> http://plaidplug.com/
Then I picked 5 datasets for each category, took 2000 rows of current (I) from each dataset, stacked them, and saved the result as input.h5.
The file has a structure like the one below; I named this stacked matrix currentdata (a sketch of how such a file can be built follows the layout):
[[~2000 of data for AC],
[~2000 of data for AC],
...,
...,
...,
[~2000 of data for CFL],
[~2000 of data for CFL],
...,
...,
...,
[~2000 of data for Fridge],
...,
...,
...,
...,
...,
[~2000 of data for Heater]]
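For reference, here is a minimal sketch of how a stacked matrix like the one above could be written to input.h5 under the currentdata key. The CSV file names, the 'current' column name, and the pandas-based loading are assumptions for illustration, not my actual preprocessing script:

import numpy as np
import h5py
import pandas as pd

# hypothetical list of pre-trimmed CSV files, one per (appliance, dataset) pair;
# each is assumed to hold at least 2000 current (I) readings in a column named 'current'
csv_files = ['ac_1.csv', 'ac_2.csv', 'cfl_1.csv']  # ... 5 files per category, 11 categories

rows = []
for path in csv_files:
    current = pd.read_csv(path)['current'].values[:2000]  # keep the first 2000 samples
    rows.append(current.astype('float32'))

currentdata = np.vstack(rows)  # shape: (number_of_files, 2000)

with h5py.File('input.h5', 'w') as f:
    f.create_dataset('currentdata', data=currentdata)  # same key the training script reads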
Then I created a txt file for the output labels. It consists of the following string, with one entry per dataset (5 per category):
AC,AC,AC,AC,AC,CFL,CFL,CFL,CFL,CFL,Fridge,Fridge,...,Heater,Heater,Heater,Heater,Heater
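Similarly, a minimal sketch of how this label file can be generated (the four category names below are only a subset for illustration; the real file covers all 11 PLAID categories):

# hypothetical subset of the 11 PLAID categories
categories = ['AC', 'CFL', 'Fridge', 'Heater']
labels = []
for name in categories:
    labels.extend([name] * 5)      # 5 datasets per category
with open('output.txt', 'w') as f:
    f.write(','.join(labels))      # "AC,AC,AC,AC,AC,CFL,..."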
Here is my code:
import numpy as np
import h5py
import pandas
from keras import optimizers
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers.normalization import BatchNormalization
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from keras.callbacks import ModelCheckpoint
from sklearn.model_selection import cross_val_score, KFold
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.pipeline import Pipeline
seed = 7
np.random.seed(seed)
### load dataset with 2000 rows per sample ###
input_data = h5py.File('input.h5', 'r')
output_type = open('output.txt', 'r')
X = input_data['currentdata'][:] ## input the stacked matrix
Y = output_type.read().split(',') ## read output
## encode class value as integer
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
## convert integers to dummy variables (one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
weight_path = "/home/fang/workspace/myproject/weight/"
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(800, input_dim=2000, init='normal', activation='relu'))
    model.add(Dense(400, init='normal', activation='relu'))
    model.add(Dense(200, init='normal', activation='relu'))
    model.add(Dense(11, activation='softmax'))  ## 11 output categories, based on the PLAID dataset
    ## Compile model
    #opt = optimizers.SGD(lr=0.02, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    ## save the architecture
    model_json = model.to_json()
    with open('current_data_11type.json', 'w') as json_file:
        json_file.write(model_json)
    ## save the weights
    model.save_weights(weight_path + 'current_data_4type.h5')
    return model
#estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=500,batch_size=64, verbose=0)
estimators = []
estimators.append(('standardized', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=baseline_model, nb_epoch=400, batch_size=32, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
#results = cross_val_score(estimator, X, dummy_y, cv=kfold)
results = cross_val_score(pipeline, X, dummy_y, cv=kfold)
print ("Accuracy: %.2f%% (%.2f%%)" %(results.mean()*100, results.std()*100))
I tried with and without StandardScaler(), but that did not give any good results either. The highest accuracy I got was 57%.
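Concretely, the "without StandardScaler()" run just drops the scaling step from the pipeline; everything else (classifier settings, k-fold) stays as in the code above:

# same KerasClassifier settings as above, but no scaling step in front of it
pipeline_unscaled = Pipeline([('mlp', KerasClassifier(build_fn=baseline_model, nb_epoch=400, batch_size=32, verbose=0))])
results_unscaled = cross_val_score(pipeline_unscaled, X, dummy_y, cv=kfold)
print ("Accuracy without scaling: %.2f%% (%.2f%%)" %(results_unscaled.mean()*100, results_unscaled.std()*100))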
I also tried using the SGD optimizer with learning rates from 0.01~0.06 instead of adam, but that did not give me good results either.
Adding more layers and changing batch_size, from as low as 5 and 10 up to 32, 64 and 128, did not help me at all.
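For clarity, swapping in SGD means replacing the compile step inside baseline_model() with something like this (the lr shown is just one point in the 0.01~0.06 range; decay and momentum follow the commented-out line in the code above):

from keras import optimizers

opt = optimizers.SGD(lr=0.02, decay=1e-6, momentum=0.9, nesterov=True)  # lr varied between 0.01 and 0.06
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])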
My system:
OS: Ubuntu 16.04 LTS
processor: Intel® Core™ i3-5005U CPU @ 2.00GHz × 4
RAM: 4GB
GPU: GeForce 920MX 2GB
I also tried to increase my data size (>5000 rows), but that gives me ...
Does anyone have any ideas or suggestions on how to approach this kind of problem?
Thanks for your help.