I am trying to train a classifier for audio files. I read in my WAV files and convert them into a sequence of spectrogram images for training in a custom Python function. The function is called with tf.py_func and returns an array of images, all with the same shape. In other words, the image shape is well defined, but the number of images is dynamic (e.g. 3 spectrograms for a short audio snippet, 15 for a long one).
Is there a way to unpack the resulting list for further processing / enqueuing in tf.train.batch_join()? The undefined sequence length seems to be a problem for many TF operations. Can the length somehow be inferred?
...
# Read the audio file name and label from a CSV file
audio_file, label = tf.decode_csv(csv_content)
def read_audio(audio_file):
    signal = read_wav(audio_file)
    # This output is of varying length depending on the length of the audio file.
    images = [generate_image(segment) for segment in split_audio(signal)]
    return images
# Convert the audio file to a variable-length sequence of images
# Shape: <unknown>, which is to be expected from tf.py_func
image_sequence = tf.py_func(read_audio, [audio_file], [tf.float32])[0]
# Auxiliary function to set a shape for the images produced by tf.py_func
def process_image(in_image):
    image = tf.image.convert_image_dtype(in_image, dtype=tf.float32)
    image.set_shape([600, 39, 1])
    return (image, label)
# Shape: (?, 600, 39, 1)
images_and_labels = tf.map_fn(process_image, image_sequence, dtype=(tf.float32, tf.int32))

# This will not work: 'images_and_labels' needs to be a list of (image, label) tuples
images, label_index_batch = tf.train.batch_join(
    images_and_labels,
    batch_size=batch_size,
    capacity=2 * num_preprocess_threads * batch_size,
    shapes=[data_shape, []],
)
Answer 0 (score: 7)
You can use a variable-size Tensor as input and use enqueue_many to treat this tensor as a variable-size batch of inputs. Below is an example where a py_func generates variable-size batches, and batching with enqueue_many converts them into constant-size batches.
import numpy as np
import tensorflow as tf

tf.reset_default_graph()

# start with a time-out to prevent hangs when experimenting
config = tf.ConfigProto()
config.operation_timeout_in_ms = 2000
sess = tf.InteractiveSession(config=config)

# initialize the first queue with 1, 2, 1, 2
queue1 = tf.FIFOQueue(capacity=4, dtypes=[tf.int32])
queue1_input = tf.placeholder(tf.int32)
queue1_enqueue = queue1.enqueue(queue1_input)
sess.run(queue1_enqueue, feed_dict={queue1_input: 1})
sess.run(queue1_enqueue, feed_dict={queue1_input: 2})
sess.run(queue1_enqueue, feed_dict={queue1_input: 1})
sess.run(queue1_enqueue, feed_dict={queue1_input: 2})
sess.run(queue1.close())

# call_func will produce variable-size tensors
def range_func(x):
    return np.array(range(x), dtype=np.int32)

[call_func] = tf.py_func(range_func, [queue1.dequeue()], [tf.int32])
queue2_dequeue = tf.train.batch([call_func], batch_size=3, shapes=[[]], enqueue_many=True)

coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord)
try:
    while True:
        print(sess.run(queue2_dequeue))
except tf.errors.OutOfRangeError:
    pass
finally:
    coord.request_stop()
    coord.join(threads)
sess.close()
You should see:
[0 0 1]
[0 0 1]
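Applied to the spectrogram pipeline in the question, the same idea would look roughly like the sketch below. This is untested and makes assumptions: batch_size, num_preprocess_threads, audio_file and label are defined as in the question, and read_audio returns a NumPy array of shape (num_segments, 600, 39, 1).

# Sketch (untested): per-file image sequence of unknown length, unpacked by
# tf.train.batch with enqueue_many=True into fixed-size batches of single images.
images = tf.py_func(read_audio, [audio_file], [tf.float32])[0]
images.set_shape([None, 600, 39, 1])             # number of spectrograms stays dynamic
labels = tf.fill([tf.shape(images)[0]], label)   # repeat the file's label once per spectrogram

image_batch, label_batch = tf.train.batch(
    [images, labels],
    batch_size=batch_size,
    capacity=2 * num_preprocess_threads * batch_size,
    shapes=[[600, 39, 1], []],   # per-example shapes after unpacking
    enqueue_many=True)           # treat the leading dimension as a batch of examples

With enqueue_many=True, each spectrogram becomes its own example in the output queue, so spectrograms from files of different lengths end up mixed in the same fixed-size batch.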