I am trying to train a classifier for audio files. I read in my WAV files and convert them into a sequence of spectrogram images for training in a custom Python function. The function is called with tf.py_func and returns an array of images, all with the same shape. In other words, the image shape is well defined, but the number of images is dynamic (e.g. 3 spectrograms for a short audio snippet, 15 for a long one).
Is there a way to unpack the resulting list for further processing / enqueuing in tf.train.batch_join()? The undefined sequence length seems to be a problem for many TF operations. Can the length somehow be inferred?
...
# Read the audio file name and label from a CSV file
audio_file, label = tf.decode_csv(csv_content)
def read_audio(audio_file):
    signal = read_wav(audio_file)
    # This output is of varying length depending on the length of the audio file.
    images = [generate_image(segment) for segment in split_audio(signal)]
    return images
# Convert the audio file to a variable-length sequence of images
# Shape: <unknown>, which is to be expected from tf.py_func
image_sequence = tf.py_func(read_audio, [audio_file], [tf.float32])[0]
# Auxiliary function to set a shape for the images produced by tf.py_func
def process_image(in_image):
    image = tf.image.convert_image_dtype(in_image, dtype=tf.float32)
    image.set_shape([600, 39, 1])
    return (image, label)
# Shape: (?, 600, 39, 1)
images_and_labels = tf.map_fn(process_image, image_sequence, dtype=(tf.float32, tf.int32))

# This will not work: 'images_and_labels' needs to be a list of (image, label) tuples
images, label_index_batch = tf.train.batch_join(
    images_and_labels,
    batch_size=batch_size,
    capacity=2 * num_preprocess_threads * batch_size,
    shapes=[data_shape, []],
)
Answer 0 (score: 7)
You can use a variable-size Tensor as input and use enqueue_many to treat this tensor as a variable-size batch of inputs. Below is an example where a py_func generates variable-size batches, and batching with enqueue_many converts them into constant-size batches.
import numpy as np
import tensorflow as tf

tf.reset_default_graph()

# start with a time-out to prevent hangs when experimenting
config = tf.ConfigProto()
config.operation_timeout_in_ms = 2000
sess = tf.InteractiveSession(config=config)

# initialize the first queue with 1, 2, 1, 2
queue1 = tf.FIFOQueue(capacity=4, dtypes=[tf.int32])
queue1_input = tf.placeholder(tf.int32)
queue1_enqueue = queue1.enqueue(queue1_input)
sess.run(queue1_enqueue, feed_dict={queue1_input: 1})
sess.run(queue1_enqueue, feed_dict={queue1_input: 2})
sess.run(queue1_enqueue, feed_dict={queue1_input: 1})
sess.run(queue1_enqueue, feed_dict={queue1_input: 2})
sess.run(queue1.close())

# call_func will produce variable-size tensors
def range_func(x):
    return np.array(range(x), dtype=np.int32)

[call_func] = tf.py_func(range_func, [queue1.dequeue()], [tf.int32])
queue2_dequeue = tf.train.batch([call_func], batch_size=3, shapes=[[]], enqueue_many=True)

coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord)
try:
    while True:
        print(sess.run(queue2_dequeue))
except tf.errors.OutOfRangeError:
    pass
finally:
    coord.request_stop()
    coord.join(threads)
sess.close()
You should see:
[0 0 1]
[0 0 1]
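Applied to the spectrogram pipeline in the question, the same idea would look roughly like the sketch below. This is untested and makes assumptions: batch_size, num_preprocess_threads, audio_file and label are defined as in the question, and read_audio returns a NumPy array of shape (num_segments, 600, 39, 1).

# Sketch (untested): per-file image sequence of unknown length, unpacked by
# tf.train.batch with enqueue_many=True into fixed-size batches of single images.
images = tf.py_func(read_audio, [audio_file], [tf.float32])[0]
images.set_shape([None, 600, 39, 1])             # number of spectrograms stays dynamic
labels = tf.fill([tf.shape(images)[0]], label)   # repeat the file's label once per spectrogram

image_batch, label_batch = tf.train.batch(
    [images, labels],
    batch_size=batch_size,
    capacity=2 * num_preprocess_threads * batch_size,
    shapes=[[600, 39, 1], []],   # per-example shapes after unpacking
    enqueue_many=True)           # treat the leading dimension as a batch of examples

With enqueue_many=True, each spectrogram becomes its own example in the output queue, so spectrograms from files of different lengths end up mixed in the same fixed-size batch.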