I am building a speech-recognition model whose input has shape (56088, 22050, 1). The whole dataset can be loaded into memory from a .npy file (about 5 GB), but I would like to find a better way. I came across Keras's fit_generator() method, but most examples are based on MNIST and use ImageDataGenerator(). I realize I have to write a custom generator function, but I am not sure how. Following the {{3}} thread, I adapted its generator function to do something similar, but I still have to load the entire dataset into memory, which takes a very long time. I am also not sure the program works at all, because it printed nothing during the first 20 minutes it ran.
Is there another way around this?
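One idea I have been considering is to memory-map the saved .npy arrays with np.load(..., mmap_mode='r') and slice batches out of them instead of reading everything at once; roughly something like the sketch below (the file names are placeholders and I have not verified this):

import numpy as np

# Sketch only, not working code: memory-map the saved arrays so they stay on disk
# and yield fixed-size batches to fit_generator(). "train_data.npy" and
# "train_labels.npy" are placeholder names; I assume the data has shape
# (56088, 22050, 1) and the labels are already one-hot encoded.
def npy_batch_generator(x_path="train_data.npy", y_path="train_labels.npy", batch_size=36):
    X = np.load(x_path, mmap_mode="r")   # nothing is read into RAM until a slice is taken
    Y = np.load(y_path, mmap_mode="r")
    n_batches = len(X) // batch_size
    while True:                          # Keras expects the generator to loop forever
        for i in range(n_batches):
            xb = np.asarray(X[i * batch_size:(i + 1) * batch_size])
            yb = np.asarray(Y[i * batch_size:(i + 1) * batch_size])
            yield xb, yb

My current code is below: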
import librosa
import glob
import tensorflow as tf
import os
import numpy as np
class_list, X_train, Y_train = [],[],[]
filename = "D:\\SpeechRecognitionData\\train\\audio\\"
class_names = os.listdir(filename)
print(class_names)
for classes in class_names:
    if classes == '_background_noise_':
        continue
    else:
        class_list.append(''.join(filename+classes))
print(class_list,"\n",len(class_list))
def create_X(address):
    # load one .wav clip and reshape it to (samples, 1)
    wave, sr = librosa.load(address)
    return wave.reshape(-1, 1)
def getLabel(filename):
    # the label for a clip is the name of the directory that contains it
    base_name = os.path.basename(filename)
    return base_name

def onehot(Y_train):
    # one-hot encode the integer class labels
    from sklearn import preprocessing
    enc = preprocessing.OneHotEncoder()
    Y_train = Y_train.reshape(-1, 1)
    enc.fit(Y_train)
    Y_train = enc.transform(Y_train).toarray()
    return Y_train
def execute(X_train, Y_train):
    # first pass: load every .wav clip into memory (this is the slow part)
    loop = 0
    for i in class_list:
        c = 0
        loop += 1
        for file in glob.glob("".join(i + "\\*.wav")):  # iterating through each .wav audio file in the directory to create training data
            wave = create_X(file)
            if wave.shape[0] == 22050:  # keep only clips that are exactly one second long
                c += 1
                Y_train.append(class_names.index(getLabel(i)))
                X_train.append(wave)
            if c % 100 == 0:
                print("{} files processed in loop {}".format(c, loop))
    # second pass: yield batches of 36 samples forever
    while 1:
        for i in range(1558):  # 36*1558 = 56088
            if i % 125 == 0:
                print("i= " + str(i))
            yield np.array(X_train[i*36:(i+1)*36]), onehot(np.array(Y_train[i*36:(i+1)*36]))
input_shape = (22050,1)
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv1D(16,activation='relu',input_shape=input_shape,kernel_size=(10)))
model.add(tf.keras.layers.MaxPool1D())
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Conv1D(32,activation='relu',kernel_size=(10)))
model.add(tf.keras.layers.MaxPool1D())
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Conv1D(16,activation='relu',kernel_size=(10)))
model.add(tf.keras.layers.MaxPool1D())
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128,activation='relu'))
model.add(tf.keras.layers.Dense(64,activation='relu'))
model.add(tf.keras.layers.Dense(30,activation='softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])
generator = execute(X_train,Y_train)
model.fit_generator(generator,steps_per_epoch=56088//36,shuffle=True)
model.save("model.h5")
Answer 0 (score: 0)
So I figured it out by looking at this example here: https://github.com/tjh48/keras_generators/blob/master/keras_generator_example.ipynb
If anyone runs into this problem, they can refer to my notebook: https://github.com/DarshanDeshpande/Speech-Recognition/blob/master/SpeechRecognitionWithGenerators.ipynb
Thanks!
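For anyone who finds this later, the essential pattern is to keep only the list of file paths (and their labels) in memory and decode the audio inside the generator when a batch is actually requested. A rough sketch of that idea, reusing the directory layout and batch size from my question (the helper name and details are made up, not the exact code from the notebook):

import glob
import os
import numpy as np
import librosa

# Sketch of the lazy-loading pattern: build (path, label) pairs up front,
# but only call librosa.load() for the files in the batch being produced.
def wav_batch_generator(root="D:\\SpeechRecognitionData\\train\\audio\\",
                        batch_size=36, num_classes=30):
    classes = sorted(d for d in os.listdir(root) if d != '_background_noise_')
    files = [(path, classes.index(c))
             for c in classes
             for path in glob.glob(os.path.join(root, c, "*.wav"))]
    while True:  # Keras expects the generator to yield forever
        for start in range(0, len(files) - batch_size + 1, batch_size):
            xs, ys = [], []
            for path, label in files[start:start + batch_size]:
                wave, sr = librosa.load(path)          # decoded only now, not up front
                if wave.shape[0] != 22050:             # skip clips that are not exactly one second
                    continue
                xs.append(wave.reshape(-1, 1))
                ys.append(np.eye(num_classes)[label])  # fixed-width one-hot label
            if xs:
                yield np.stack(xs), np.stack(ys)

With this, fit_generator() only ever holds one batch of decoded audio in memory at a time.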