Question

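This is based on the Tacotron TTS project; the question has also been raised here.

I have been trying to build my own training data, but there do not seem to be many resources on how to format it. I compared against the LJ001 data and tried to mimic it, including the split wavs and transcript.csv.

I have tested train.py with the LJ001 data and it trains, but when I try to use my own data it fails with this error: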

Traceback (most recent call last):
  File "train.py", line 96, in <module>
    g = Graph(); print("Training Graph loaded")
  File "train.py", line 33, in __init__
    self.x, self.y, self.z, self.fnames, self.num_batch = get_batch()
  File "C:\Users\...\tacotron-master\data_load.py", line 116, in get_batch
    dynamic_pad=True)
  File "C:\anaconda3\envs\...\training\bucket_ops.py", line 374, in bucket_by_sequence_length
    raise ValueError("bucket_boundaries must not be empty")
ValueError: bucket_boundaries must not be empty
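
The message comes from the raise in bucket_by_sequence_length shown in the traceback, so the list passed as bucket_boundaries was empty. One way that could happen is sketched below. It assumes the boundaries are built with a range between the shortest and longest text length, which is an assumption about data_load.py and not confirmed here; the helper make_boundaries and the step of 20 are hypothetical.

```python
def make_boundaries(text_lengths, step=20):
    """Hypothetical helper: bucket boundaries between the min and max length.

    This mirrors an assumed range-based construction; it is not taken
    from the project's data_load.py.
    """
    if not text_lengths:
        return []  # no transcript rows were loaded at all
    minlen, maxlen = min(text_lengths), max(text_lengths)
    return list(range(minlen + 1, maxlen - 1, step))


# Lengths that are spread out give a non-empty list of boundaries.
print(make_boundaries([31, 66, 13]))
# Lengths that are nearly equal leave the range empty.
print(make_boundaries([30, 31]))
# So does a loader that returned no rows.
print(make_boundaries([]))
```

If something like this is what happens, printing the number of loaded texts and their minimum and maximum length just before the get_batch call would show whether the loader read any rows from my CSV.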

Here is a sample of the CSV file, where I tried to match the ID, TEXT, LENGTH format.

# Text Data Pulled from Youtube Subtitles and Formatted Manually
SM001-0001|Oh happy fourth of July America|00:00:02
SM001-0002|Ready to fire up the grill and celebrate our victory over the Brits|00:00:03
SM001-0003|Well, I'm not|00:00:01
. . .
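
For reference, a minimal sketch of how a file in this pipe-delimited layout could be read and checked. The names parse_transcript and duration_to_seconds are hypothetical and not part of the Tacotron project. The sketch assumes that lines starting with # are comments and that the third field is an HH:MM:SS duration, as in the sample above; whether the project's loader expects the same is not confirmed here.

```python
def duration_to_seconds(stamp):
    """Convert an HH:MM:SS string to a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_transcript(lines):
    """Return (id, text, seconds) tuples from pipe-delimited lines.

    Blank lines and lines starting with '#' are skipped (an assumption).
    Any other line without exactly three fields raises ValueError.
    """
    rows = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("|")
        if len(fields) != 3:
            raise ValueError(
                "line %d: expected 3 fields, got %d" % (number, len(fields))
            )
        clip_id, text, length = fields
        rows.append((clip_id, text.strip(), duration_to_seconds(length)))
    return rows


sample = [
    "# Text Data Pulled from Youtube Subtitles and Formatted Manually",
    "SM001-0001|Oh happy fourth of July America|00:00:02",
    "SM001-0002|Ready to fire up the grill and celebrate our victory over the Brits|00:00:03",
    "SM001-0003|Well, I'm not|00:00:01",
]
rows = parse_transcript(sample)
print(len(rows), "usable rows")
```

A check like this only confirms that each row has three fields; it says nothing about whether the loader treats the third field as a duration.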

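So, two questions:

1) Why do I get the "bucket_boundaries must not be empty" error when Python finds the CSV and can read it?

2) Depending on the answer to 1, how do I correctly format the data for use with the neural network?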

Formatting custom training data & ValueError: bucket_boundaries must not be empty

0 answers: