How to generate sound timestamps from a song?

Date: 2020-09-18 18:32:27

Tags: python tensorflow machine-learning audio keras

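I want to build a model that predicts the timestamps at which certain sounds are played, so that in the end the song can be reproduced using only those sounds.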
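The goal can be stated as a forward model: given a few short sound clips and the sample at which each one starts, mix them into a track as long as the song. The NumPy sketch below assumes a hypothetical event format of (start_sample, sound_index) pairs; neither the format nor render_song is part of the original code.

```python
import numpy as np


def render_song(events, sounds, length):
    """Mix sound clips into a silent track.

    events: iterable of (start_sample, sound_index) pairs (hypothetical format)
    sounds: list of 1-D float arrays, one per distinct sound
    length: number of samples in the output track
    """
    song = np.zeros(length, dtype=np.float32)
    for start, k in events:
        if start >= length:
            continue  # event starts after the end of the track
        clip = sounds[k]
        end = min(start + len(clip), length)  # cut the clip off at the end of the track
        song[start:end] += clip[: end - start]  # overlapping sounds add up
    return song


# Two toy sounds and three events
sounds = [np.array([1.0, 0.5], dtype=np.float32),
          np.array([0.25, 0.25, 0.25], dtype=np.float32)]
events = [(0, 0), (3, 1), (4, 0)]
song = render_song(events, sounds, length=6)
print(song)
```

Predicting timestamps then means finding events whose rendering is close to the real song; the rendering itself is simple, but a discrete list of events is not something gradient descent can adjust directly.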

Here is my code:

import tensorflow as tf
import tensorflow.keras.backend as K
import tensorflow_io as tfio
import tensorflow.keras as keras
import numpy as np
import matplotlib.pyplot as plt  # needed for plt.figure() below

num_epochs = 1

sound_tensors = []  # audio tensors of the individual sounds; get_loss indexes into this list
audiotensor = tfio.audio.AudioIOTensor('audio_files/audio.mp3').to_tensor()

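tensor = tf.cast(audiotensor, tf.float32) / 44100.0  # cast samples to float and scale them
# keep only the first channel, drop the channel axis, and shift every sample by 0.1
tensor = tf.squeeze(tf.split(tensor, [1, -1], axis=1)[0]) + 0.1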


def get_loss(y_true, y_pred):  # y_true = the song, y_pred = the timestamps
    l = y_pred.shape[0]
    idx = K.argmax(y_pred, axis=0)  # index of the highest-scoring prediction
    y_comp = K.zeros(l)  # song reconstructed from the individual sounds
    for i in range(l):
        if idx > 0:  # index 0 means "no sound"; otherwise add sound idx-1
            sound_tensor = sound_tensors[idx - 1]
            y_comp = y_comp + sound_tensor

    return K.mean(K.square(y_true - y_comp), axis=-1)



model = keras.Sequential([ # Model to train
    keras.layers.Dense(50, activation=tf.nn.relu, input_shape=(1,)),
    keras.layers.Dense(10, activation=tf.nn.relu),
    keras.layers.Dense(1, activation=tf.nn.relu)
])


model.compile(
    optimizer='adam', 
    loss=get_loss, 
    metrics=['accuracy']
)

plt.figure()

for epoch in range(num_epochs):
    n2 = 200  # window length in samples
    n1 = np.random.randint(tensor.shape[0] - n2)  # random start that leaves room for n2 samples
    x = tf.slice(tensor, [n1], [n2])  # pick random part of song for training
    y = np.copy(x)

    model.fit(x=x, y=y, batch_size=None, epochs=1000) # (y = y_true in get_loss)
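Apart from the indexing details, a loss that selects sounds with K.argmax cannot train the model: argmax is not differentiable, so no gradient flows from the loss back to y_pred, and Keras typically refuses to train with an error about missing gradients. A small check with tf.GradientTape (the scores and the target value 2.0 are made up for illustration) shows this:

```python
import tensorflow as tf

# Scores for three hypothetical sounds at one time step
scores = tf.Variable([0.1, 0.7, 0.2])

with tf.GradientTape() as tape:
    idx = tf.argmax(scores)            # integer index of the "chosen" sound
    picked = tf.cast(idx, tf.float32)  # anything computed from idx ...
    loss = tf.square(picked - 2.0)     # ... including a squared-error loss

grad = tape.gradient(loss, scores)
print(grad)  # None: the integer index blocks the gradient
```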

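In the "get_loss" function, I want to use the timestamps (in y_pred) together with the other audio tensors to generate the complete song and compare it with the actual song (in y_true) to compute the loss, but this does not work yet in my code example.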
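One way to keep the idea of get_loss while staying differentiable is to replace the hard argmax choice with continuous, non-negative activations (one per sample and per sound) and rebuild the song by convolving each activation track with its sound clip, a convolutive reconstruction. The sketch below is an assumption, not the asker's method: render_soft, make_reconstruction_loss, the (batch, T, K) activation layout and the zero-padded (L, K) sound matrix are all hypothetical names and shapes.

```python
import tensorflow as tf


def render_soft(activations, sounds):
    """Differentiable mix of K sound clips into one waveform.

    activations: (batch, T, K) non-negative weights; activations[b, t, k] is how
                 strongly sound k starts at sample t (hypothetical model output).
    sounds:      (L, K) tensor; column k is sound k, zero-padded to length L.
    returns:     (batch, T) reconstructed waveform.
    """
    L = sounds.shape[0]  # static clip length
    # conv1d computes a cross-correlation, so flip the clips in time ...
    kernel = tf.reverse(sounds, axis=[0])[:, :, tf.newaxis]  # (L, K, 1)
    # ... and left-pad with L-1 zeros so that out[t] = sum_k sum_j a_k[t-j] * s_k[j]
    padded = tf.pad(activations, [[0, 0], [L - 1, 0], [0, 0]])
    out = tf.nn.conv1d(padded, kernel, stride=1, padding="VALID")  # (batch, T, 1)
    return tf.squeeze(out, axis=-1)


def make_reconstruction_loss(sounds):
    """Build a Keras-style loss(y_true, y_pred) that renders y_pred with `sounds`."""
    sounds = tf.convert_to_tensor(sounds, dtype=tf.float32)

    def loss(y_true, y_pred):
        y_comp = render_soft(y_pred, sounds)  # song rebuilt from the activations
        return tf.reduce_mean(tf.square(y_true - y_comp), axis=-1)

    return loss


# Toy check: two clips (L = 3, K = 2) and a 6-sample track
sounds = tf.constant([[1.0, 0.25],
                      [0.5, 0.25],
                      [0.0, 0.25]])
target = tf.constant([[0.0, 0.0, 1.0, 0.75, 0.25, 0.25]])  # clip 0 at t=2, clip 1 at t=3
acts = tf.Variable(tf.zeros([1, 6, 2]))
loss_fn = make_reconstruction_loss(sounds)

with tf.GradientTape() as tape:
    loss_value = tf.reduce_mean(loss_fn(target, acts))
grad = tape.gradient(loss_value, acts)
print(grad is not None)  # gradients reach the activations
```

To use this with Keras, the model would have to output one activation per sample and per sound instead of the per-sample Dense(1) output above, for example a final keras.layers.Conv1D layer with K filters, padding="same" and activation="relu" applied to windows of shape (T, 1); that architecture is untested here. Approximate timestamps could then be read off the trained activations, for instance by peak-picking.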

0 answers