I am new to deep learning, and I am building a basic end-to-end speech recognizer using the TensorFlow API, an LSTM model, and the CTC loss function. I have already extracted the audio features as MFCCs. What I really don't know is how to map the audio to its transcription: I know CTC is used for this, and I understand how CTC works, but not the code to implement it.
Here is my feature extraction code:
import os
import numpy as np
import glob
import scipy.io.wavfile as wav
from python_speech_features import mfcc, logfbank
# Read each input audio file
for f in glob.glob('Downloads/DataVoices/Training/**/*.wav', recursive=True):
    (rate, sig) = wav.read(f)
    sig = sig.astype(np.float64)
    # Take the first 10,000 samples for analysis
    #sig = sig[:10000]
    mfcc_feat = mfcc(sig, rate, winlen=0.025, winstep=0.01,
                     numcep=13, nfilt=26, nfft=512, lowfreq=0, highfreq=None,
                     preemph=0.97, ceplifter=22, appendEnergy=True)
    fbank_feat = logfbank(sig, rate)
    acoustic_features = np.concatenate((mfcc_feat, fbank_feat), axis=1)  # time_steps x n_features
    print(acoustic_features)
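Note that the loop above only prints each utterance's features and then overwrites them on the next iteration. A minimal way to persist them for training, assuming a .npy file saved next to each .wav (the naming convention here is my own, not required):

    # Inside the loop, after computing acoustic_features:
    out_path = os.path.splitext(f)[0] + '.npy'  # e.g. .../001/001.npy (hypothetical)
    np.save(out_path, acoustic_features)        # reload later with np.load(out_path)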
I have also made a training list.txt file that gives the transcription along with each audio path, for example:

this is example /001/001.wav
this is example /001/001(1).wav

where 001 is the folder, and 001.wav and 001(1).wav are the wave files of two utterances.
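For CTC you will eventually need each transcript as a sequence of integer label IDs rather than raw text. Here is a minimal sketch of parsing such a list and encoding the transcripts; the character set, and the assumption that the path is the last whitespace-separated token on each line, are mine rather than part of the question's setup:

import numpy as np

# Hypothetical character set; tf.nn.ctc_loss adds the blank symbol itself,
# so only the real symbols are indexed here.
charset = "abcdefghijklmnopqrstuvwxyz '"
char_to_id = {c: i for i, c in enumerate(charset)}

def encode_transcript(text):
    # Map a transcript string to the integer IDs that CTC expects.
    return np.array([char_to_id[c] for c in text.lower() if c in char_to_id],
                    dtype=np.int32)

with open('training list.txt') as fh:
    for line in fh:
        transcript, wav_path = line.strip().rsplit(' ', 1)
        print(wav_path, encode_transcript(transcript))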
Answer 0 (score: 0)
I am posting this as a contrived example, in the hope that it gives you an idea of how to read a CSV file and the filenames it contains. You can modify it to suit your needs.
Say I have this CSV file. The first column is your transcript; the file path points to your audio file. In my case it is just a text file with random text:
Script1,D:/PycharmProjects/TensorFlow/script1.txt
Script2,D:/PycharmProjects/TensorFlow/script2.txt
Here is the code I used to test it. Keep in mind that this is just an example:
import tensorflow as tf

batch_size = 1
record_defaults = [['Test'], ['D:/PycharmProjects/TensorFlow/script1.txt']]

def readbatch(data_queue):
    # Read up to batch_size CSV rows and split each into transcript and file path.
    reader = tf.TextLineReader()
    _, rows = reader.read_up_to(data_queue, batch_size)
    transcript, wav_filename = tf.decode_csv(rows, record_defaults, field_delim=",")
    # Read the raw contents of the file named in the CSV row.
    audioreader = tf.WholeFileReader()
    print(wav_filename)
    _, audio = audioreader.read(tf.train.string_input_producer(wav_filename))
    return [audio, transcript]

data_queue = tf.train.string_input_producer(
    ['D:\\PycharmProjects\\TensorFlow\\script.csv'], shuffle=False)
batch_data = readbatch(data_queue)
batch_values = tf.train.batch(batch_data,
                              shapes=[tf.TensorShape(()), tf.TensorShape((batch_size,))],
                              batch_size=batch_size, enqueue_many=False)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    sess.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    try:
        step = 0
        while not coord.should_stop():
            step += 1
            feat = sess.run([batch_values])
            audio = feat[0][0]
            print(audio)
            script = feat[0][1]
            print(script)
    except tf.errors.OutOfRangeError:
        print('trained for 1 epoch, %d steps' % step)
    finally:
        coord.request_stop()
        coord.join(threads)
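To tie this back to the original question: once you have a batch of acoustic features and the corresponding integer label sequences, the CTC part is a graph roughly like the one below. This is a sketch in the same TF 1.x style as the answer above, not a complete trainer; num_hidden and the class count are assumptions you would adjust:

import tensorflow as tf

num_features = 39   # 13 MFCCs + 26 log filterbanks, as in the extraction code
num_classes = 29    # e.g. 26 letters + space + apostrophe; CTC adds the blank itself
num_hidden = 100    # assumed LSTM size

# Features: [batch, max_time, num_features]; labels: sparse integer IDs per utterance.
inputs = tf.placeholder(tf.float32, [None, None, num_features])
labels = tf.sparse_placeholder(tf.int32)
seq_len = tf.placeholder(tf.int32, [None])  # true frame count of each utterance

cell = tf.nn.rnn_cell.LSTMCell(num_hidden)
outputs, _ = tf.nn.dynamic_rnn(cell, inputs, sequence_length=seq_len, dtype=tf.float32)

# Project each frame's LSTM output to class logits, then switch to time-major,
# because tf.nn.ctc_loss expects [max_time, batch, num_classes] by default.
logits = tf.layers.dense(outputs, num_classes)
logits = tf.transpose(logits, [1, 0, 2])

loss = tf.reduce_mean(tf.nn.ctc_loss(labels, logits, seq_len))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

# Greedy decoding recovers label IDs, which map back to characters.
decoded, _ = tf.nn.ctc_greedy_decoder(logits, seq_len)

Feeding labels means building a tf.SparseTensorValue (indices, values, dense_shape) from the per-utterance label IDs. CTC then learns the audio-to-text alignment on its own, which is exactly the "mapping" step the question asks about, so no frame-level alignment of the transcripts is needed.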