Question

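Is there a way to load an audio file (WAV) directly into a tensor in TensorFlow, and afterwards convert the tensor back into an audio file? I have seen people convert audio into spectrograms, but I could not find anyone converting back to audio.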

Answer 1

The tf.contrib.ffmpeg.decode_audio() op can load audio data (including the WAV format) into a tensor, and tf.contrib.ffmpeg.encode_audio() can convert a tensor back into audio data.

import tensorflow as tf  # TensorFlow 1.x; tf.contrib was removed in 2.0

input_filename = tf.placeholder(tf.string, shape=[])
output_filename = tf.placeholder(tf.string, shape=[])

input_signal = tf.contrib.ffmpeg.decode_audio(
    tf.read_file(input_filename), file_format="wav",
    samples_per_second=44100, channel_count=2)

# ...

output_signal = ...  # A 2-D tensor, [samples x channels]
encoded_audio_data = tf.contrib.ffmpeg.encode_audio(
    output_signal, file_format="wav", samples_per_second=44100)

write_file_op = tf.write_file(output_filename, encoded_audio_data)

with tf.Session() as sess:
  sess.run(write_file_op, {input_filename: "input.wav",
                           output_filename: "output.wav"})
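
The snippet above relies on tf.contrib, which is no longer available in TensorFlow 2.x. For WAV files specifically, a roughly equivalent round trip can be sketched with tf.audio.decode_wav and tf.audio.encode_wav in eager mode. This is a minimal sketch, not the answer's original method: the helper names save_wav and load_wav, the temporary file path, and the synthetic sine-wave input are illustrative assumptions, and it assumes 16-bit PCM WAV data.

```python
import math
import os
import tempfile

import tensorflow as tf  # assumes TensorFlow 2.x (eager execution)

SAMPLE_RATE = 44100  # same rate as in the answer above


def save_wav(path, signal, sample_rate=SAMPLE_RATE):
    """Encode a float32 [samples, channels] tensor in [-1, 1] as 16-bit PCM WAV."""
    tf.io.write_file(path, tf.audio.encode_wav(signal, sample_rate))


def load_wav(path):
    """Return (signal, sample_rate); signal is float32 [samples, channels]."""
    signal, sample_rate = tf.audio.decode_wav(tf.io.read_file(path))
    return signal, sample_rate


# Hypothetical input: one second of a 440 Hz sine wave on two channels.
t = tf.range(SAMPLE_RATE, dtype=tf.float32) / SAMPLE_RATE
mono = 0.5 * tf.sin(2.0 * math.pi * 440.0 * t)
stereo = tf.stack([mono, mono], axis=1)  # shape [samples, channels]

# Round trip: tensor -> WAV file -> tensor.
wav_path = os.path.join(tempfile.mkdtemp(), "roundtrip.wav")
save_wav(wav_path, stereo)
restored, restored_rate = load_wav(wav_path)
```

The restored tensor has the same [samples x channels] layout as output_signal in the answer. Because the data passes through 16-bit PCM, the values should match the input only up to quantization error.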

From audio to tensor, back to audio in TensorFlow
