Question

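I have audio files and I want to build a tf.data.Dataset from their audio content (i.e. each audio file in the dataset should be represented as a vector of float values).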

Here is my code:

def convert_audio_file_to_numpy_array(filepath):
  sample_rate = sox.file_info.sample_rate(filepath)
  audio, sr = librosa.load(filepath, sr=sample_rate)
  array = np.asarray(audio)
  return array

filenames_ds = tf.data.Dataset.from_tensor_slices(input_filepaths)
waveforms_ds = filenames_ds.map(convert_audio_file_to_numpy_array, num_parallel_calls=tf.data.AUTOTUNE)

This produces the following error: TypeError: stat: path should be string, bytes, os.PathLike or integer, not Tensor

I am using the Dataset's map function following the pattern in this official tutorial (see the call to files_ds.map). There, the function passed to map takes a file path.

What am I doing differently from the official tutorial?

Answer 1

The problem is that the function def sample_rate(input_filepath: Union[str, Path]) -> float: expects a str or a pathlib.Path, while you are passing it a Tensor. (The elements of your filenames_ds are Tensors of type string.)
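To see this for yourself, here is a minimal sketch that inspects what a dataset built from path strings actually hands to the mapped function. The helper name describe_path_elements is made up for illustration; the exact Tensor class you see depends on your TensorFlow version.

```python
import tensorflow as tf

def describe_path_elements(filepaths):
    """Return the dataset's element spec and the Python types seen by map."""
    ds = tf.data.Dataset.from_tensor_slices(filepaths)
    seen_types = []

    def record_type(path):
        # map traces this function in graph mode, so `path` is a symbolic
        # string Tensor here, never a Python str.
        seen_types.append(type(path))
        return path

    ds.map(record_type)
    return ds.element_spec, seen_types

spec, seen_types = describe_path_elements(["a.wav", "b.wav"])
print(spec.dtype)   # the element dtype is tf.string
print(seen_types)   # a Tensor class, not str
```

Any library that calls os.stat or open on that argument (as sox and librosa do) will therefore fail with the TypeError from the question.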

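In the TensorFlow tutorial, the data is loaded with TensorFlow functions that accept a Tensor of type string. You should check whether you can load your files with the native tf.audio functions.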
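As a rough sketch of the native route, assuming your files are 16-bit PCM WAV (the only format tf.audio.decode_wav handles) and that mono output is acceptable. The names load_wav and make_waveform_dataset are placeholders, not part of any tutorial:

```python
import tensorflow as tf

def load_wav(filepath):
    # `filepath` is a scalar tf.string Tensor; tf.io.read_file accepts it directly.
    audio_binary = tf.io.read_file(filepath)
    # decode_wav returns float32 samples scaled to [-1.0, 1.0],
    # with shape [num_samples, num_channels].
    waveform, sample_rate = tf.audio.decode_wav(audio_binary, desired_channels=1)
    # Drop the channel axis so each element is a flat vector of floats.
    return tf.squeeze(waveform, axis=-1)

def make_waveform_dataset(filepaths):
    filenames_ds = tf.data.Dataset.from_tensor_slices(filepaths)
    return filenames_ds.map(load_wav, num_parallel_calls=tf.data.AUTOTUNE)
```

With this, waveforms_ds = make_waveform_dataset(input_filepaths) keeps the map pattern from the question. Other formats such as MP3 or FLAC would need a different decoder.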

Otherwise, a common workaround is to use a generator with tf.data.Dataset.from_generator, along the lines of the following:

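import librosa
import numpy as np
import sox
import tensorflow as tf

def generator_func(list_of_path):

  def convert_audio_file_to_numpy_array(filepath):
    # Load at the file's own sample rate so librosa does not resample.
    sample_rate = sox.file_info.sample_rate(filepath)
    audio, sr = librosa.load(filepath, sr=sample_rate)
    return np.asarray(audio, dtype=np.float32)

  # This loop runs in plain Python, so each path is an ordinary str, not a Tensor.
  for path in list_of_path:
    yield convert_audio_file_to_numpy_array(path)

# from_generator needs a callable that takes no arguments, so bind the path list with a lambda.
ds = tf.data.Dataset.from_generator(
    lambda: generator_func(input_filepaths),
    output_signature=tf.TensorSpec(shape=(None,), dtype=tf.float32))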
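If you would rather keep Dataset.map, another option (not part of the answer above, just a sketch) is to wrap the Python loader in tf.numpy_function, which hands the path to your function as bytes. To stay self-contained, the loader below uses the standard-library wave module as a stand-in and assumes mono 16-bit PCM files; you would put the sox/librosa calls in its place. The names load_audio_py, load_audio_tf and make_dataset are made up.

```python
import wave

import numpy as np
import tensorflow as tf

def load_audio_py(filepath_bytes):
    # Inside tf.numpy_function a scalar string arrives as bytes, so decode it.
    path = filepath_bytes.decode("utf-8")
    # Stand-in loader (assumes mono 16-bit PCM); swap in librosa.load(path) here.
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.readframes(wav_file.getnframes())
    samples = np.frombuffer(frames, dtype=np.int16)
    # Scale to float32 in [-1.0, 1.0); the dtype must match Tout below.
    return samples.astype(np.float32) / 32768.0

def load_audio_tf(filepath):
    waveform = tf.numpy_function(load_audio_py, [filepath], tf.float32)
    # numpy_function loses static shape information, so restore the rank.
    waveform.set_shape([None])
    return waveform

def make_dataset(filepaths):
    filenames_ds = tf.data.Dataset.from_tensor_slices(filepaths)
    return filenames_ds.map(load_audio_tf, num_parallel_calls=tf.data.AUTOTUNE)
```

Keep in mind that the wrapped function still runs as ordinary Python, so it is subject to the GIL and cannot be serialized with the graph.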
