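So I have a sequence-to-sequence problem. The inputs are many multivariate sequences of varying length, and each output is a sequence of binary vectors with the same length as the corresponding input sequence. I grouped the sequences of equal length into separate folders and call the fit function like this: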
import os
import numpy as np

# cwd, model, epochs and tensorboard_cb are defined elsewhere in the script (not shown)
def get_data(i):
    train_x = np.load(os.path.join(cwd, "lab_values", "batches", f"f_{i}", "train_x.npy"), allow_pickle=True)
    train_y = np.load(os.path.join(cwd, "lab_values", "batches", f"f_{i}", "train_y.npy"), allow_pickle=True)
    print(f"batch no {i} Train X size= ", train_x.shape)
    print(f"batch no {i} Train Y size= ", train_y.shape)
    # The whole folder (all sequences of one length) is used as a single batch
    batch_size = train_x.shape[0]
    return train_x, train_y, batch_size

for e in range(epochs):
    print('Epoch', e + 1)
    # Each folder f_i holds sequences of a single length
    for i in range(3, 19):
        train_x_batch, train_y_batch, batch_size = get_data(i)
        history = model.fit(train_x_batch, train_y_batch,
                            batch_size=batch_size,
                            validation_split=0.15,
                            callbacks=[tensorboard_cb])
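A minimal in-memory sketch of the grouping step described above, assuming each input sequence is an array of shape (length, n_features) and each target an array of shape (length, n_labels). The helper name `bucket_by_length` is hypothetical; each resulting bucket corresponds to what ends up in one `f_{i}` folder (for example after `np.save`). Stacking only works because every array in a bucket has the same length.

```python
from collections import defaultdict

import numpy as np


def bucket_by_length(sequences, targets):
    """Group (x, y) pairs by sequence length (hypothetical helper).

    Returns {length: (x_array, y_array)} where x_array has shape
    (n_sequences, length, n_features) and y_array (n_sequences, length, n_labels).
    """
    buckets = defaultdict(lambda: ([], []))
    for x, y in zip(sequences, targets):
        if len(x) != len(y):
            raise ValueError("input and target sequences must have the same length")
        xs, ys = buckets[len(x)]
        xs.append(x)
        ys.append(y)
    # Sequences of equal length can be stacked into one 3-D array per bucket
    return {length: (np.stack(xs), np.stack(ys)) for length, (xs, ys) in buckets.items()}


# Made-up example: 4 features and 2 binary labels per time step
rng = np.random.default_rng(0)
seqs = [rng.normal(size=(n, 4)) for n in (3, 5, 3)]
labels = [rng.integers(0, 2, size=(n, 2)) for n in (3, 5, 3)]
buckets = bucket_by_length(seqs, labels)
for length, (x, y) in sorted(buckets.items()):
    print(length, x.shape, y.shape)
# 3 (2, 3, 4) (2, 3, 2)
# 5 (1, 5, 4) (1, 5, 2)
```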
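So the question is: is there a better way? I have heard that I could use a generator for this, but unfortunately I have not managed to implement one.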
Answer (score: 0)
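You are trying to train on the entire data (a whole .npy file) at once instead of training the model in batches. We can write a generator and train the model batch by batch.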
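A minimal sketch of such a generator for the arrays of one length group, assuming `x` and `y` can be indexed along their first axis. The name `batch_generator` is hypothetical. It shuffles once per pass and visits every sequence exactly once per pass; because it loops forever, `Model.fit` has to be told how many batches make up an epoch through `steps_per_epoch`.

```python
import math

import numpy as np


def batch_generator(x, y, batch_size=32, shuffle=True, seed=None):
    """Yield (x_batch, y_batch) pairs forever, one full pass over the data per round."""
    rng = np.random.default_rng(seed)
    n = len(x)
    while True:
        # New random order for every pass (or the natural order if shuffle=False)
        order = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield x[idx], y[idx]


# Made-up data: 10 samples, batches of 4
x = np.arange(10).reshape(10, 1)
y = x * 2
gen = batch_generator(x, y, batch_size=4, shuffle=False)
for _ in range(3):
    xb, yb = next(gen)
    print(xb.ravel().tolist())
# [0, 1, 2, 3]
# [4, 5, 6, 7]
# [8, 9]

# One epoch = one full pass; this is the value to pass as steps_per_epoch
steps_per_epoch = math.ceil(len(x) / 4)
```

The last batch of a pass can be smaller than `batch_size`; the generator then starts the next pass with a fresh shuffle.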
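We extract a batch of data from the existing NumPy file by memory-mapping the file with
train_x = np.load(os.path.join(cwd, "lab_values", "batches", f"f_{i}", "train_x.npy"), mmap_mode='r', allow_pickle=True)
and then copying out only the rows we need:
x_batch = train_x[start:end].copy()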
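A small self-contained sketch of the memory-mapping step, using made-up data in a temporary directory instead of the real `lab_values` folders; the function name `memmap_batch_demo` is hypothetical. It checks that the copied batch holds the same values but no longer shares memory with the read-only file mapping. Memory-mapping only works for arrays with a plain (non-object) dtype: an object array that needs `allow_pickle=True` to load cannot be memory-mapped.

```python
import os
import tempfile

import numpy as np


def memmap_batch_demo(start=64, batch_size=32):
    """Save a made-up array, memory-map it and copy out one batch."""
    # 1000 sequences of length 7 with 3 features (made-up data)
    data = np.random.rand(1000, 7, 3).astype(np.float32)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "train_x.npy")
        np.save(path, data)

        # mmap_mode='r' maps the file read-only; rows are read from disk on access
        train_x = np.load(path, mmap_mode="r")

        # The slice is a read-only view of the file; .copy() reads those rows
        # into memory as an independent, writable array
        x_batch = train_x[start:start + batch_size].copy()

        independent = not np.shares_memory(x_batch, train_x)
        del train_x  # drop the mapping before the temporary directory is removed
    return data[start:start + batch_size], x_batch, independent


expected, x_batch, independent = memmap_batch_demo()
print(x_batch.shape, independent, x_batch.flags.writeable)
# (32, 7, 3) True True
```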
The complete code for the generator and for training is shown below:
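import os
import numpy as np

# cwd, model, epochs and tensorboard_cb are defined elsewhere in the script (not shown)
def get_data(i, batch_size=32):
    # mmap_mode='r' maps the file instead of reading it fully into RAM
    train_x = np.load(os.path.join(cwd, "lab_values", "batches", f"f_{i}", "train_x.npy"),
                      mmap_mode='r', allow_pickle=True)
    train_y = np.load(os.path.join(cwd, "lab_values", "batches", f"f_{i}", "train_y.npy"),
                      mmap_mode='r', allow_pickle=True)
    print(f"batch no {i} Train X size= ", train_x.shape)
    print(f"batch no {i} Train Y size= ", train_y.shape)
    number_of_rows = train_x.shape[0]
    # fit() keeps pulling batches, so the generator must never run out
    while True:
        # Random start such that start + batch_size does not exceed number_of_rows
        start = np.random.randint(0, max(number_of_rows - batch_size, 0) + 1)
        end = start + batch_size
        # .copy() reads the selected rows from disk into memory
        x_batch = train_x[start:end].copy()
        y_batch = train_y[start:end].copy()
        yield x_batch, y_batch

for e in range(epochs):
    print('Epoch', e + 1)
    for i in range(3, 19):
        # With a generator the batch size is set inside get_data: batch_size and
        # validation_split must not be passed to fit(), and val_steps is not a
        # fit() argument (validation_steps only applies together with validation_data)
        history = model.fit(get_data(i, batch_size=32),
                            epochs=20,
                            steps_per_epoch=500,
                            callbacks=[tensorboard_cb])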
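Instead of calling `fit` once per folder inside an outer loop, a single generator can draw batches from all length groups, so one `fit` call covers every length. The sketch below assumes a hypothetical `buckets` dict `{length: (x, y)}`, for example built from the `f_{i}` folders with `np.load(..., mmap_mode='r')`; the function names are also hypothetical. Every batch comes from a single bucket, so its sequences share one length. This assumes the model accepts variable-length input (an input shape such as `(None, n_features)`); whether your Keras version accepts generator batches whose length differs from batch to batch is worth verifying.

```python
import numpy as np


def multi_length_generator(buckets, batch_size=32, seed=None):
    """Yield (x_batch, y_batch) forever, interleaving batches from every length bucket."""
    rng = np.random.default_rng(seed)
    # All (length, start) batches that make up one full pass over every bucket
    plan = [(length, start)
            for length, (x, _) in buckets.items()
            for start in range(0, len(x), batch_size)]
    while True:
        # Shuffle the batch order each pass so lengths are mixed during training
        for k in rng.permutation(len(plan)):
            length, start = plan[k]
            x, y = buckets[length]
            # np.array(...) copies the rows into memory (also works for memmaps)
            yield np.array(x[start:start + batch_size]), np.array(y[start:start + batch_size])


def steps_for_one_pass(buckets, batch_size=32):
    """Batches per full pass over all buckets, i.e. a natural steps_per_epoch."""
    return sum(-(-len(x) // batch_size) for x, _ in buckets.values())


# Made-up buckets: 50 sequences of length 3 and 20 of length 5
buckets = {3: (np.zeros((50, 3, 4)), np.zeros((50, 3, 2))),
           5: (np.ones((20, 5, 4)), np.ones((20, 5, 2)))}
gen = multi_length_generator(buckets, batch_size=32, seed=0)
steps = steps_for_one_pass(buckets, batch_size=32)  # 2 batches + 1 batch = 3
# model.fit(gen, steps_per_epoch=steps, epochs=epochs, callbacks=[tensorboard_cb])
```

For validation, a second generator built the same way from held-out data can be passed as `validation_data` together with `validation_steps`, since `validation_split` is not available for generators.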
For more information, please also see this SO question and this SO question.