
Time: 2019-01-04 20:04:08

Tags: text-to-speech microsoft-cognitive

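Just checking that this is supposed to be supported. The linked documentation page says you should be able to use any PCM file of at least 16 kHz. I am using NAudio to split longer wav files into utterances, and the files are generated fine, but all the training data I submit comes back with the processing error "Only RIFF (WAV) format is accepted. Please check the format of your audio files." The audio files are 16-bit PCM, mono, 44 kHz wav files, and all are under 60 s. Is there some other file format restriction I might be missing? The wav files do have a valid RIFF header (I verified that the bytes are present).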

1 answer:

Answer 0 (score: 0)


//requires: using System; using System.IO; using System.Text.RegularExpressions; using NAudio.Wave;
string rawResult = ea.Result.ToString();  //can get access to raw value this way.
Regex r = new Regex(@".*Offset"":(\d*),.*");
UInt64 offset = Convert.ToUInt64(r?.Match(rawResult)?.Groups[1]?.Value);
r = new Regex(@".*Duration"":(\d*),.*");
UInt64 duration = Convert.ToUInt64(r?.Match(rawResult)?.Groups[1]?.Value);

//create segment files
File.AppendAllText($@"{path}\{fileName}\{fileName}.txt", $"{segmentNumber}\t{ea.Result.Text}\r\n");

string wavFileName = $@"{path}\{fileName}\{segmentNumber}.wav";
string tempFileName = wavFileName + ".tmp";

//offset and duration are in 100ns units (10,000,000 per second)
using (WaveFileReader w = new WaveFileReader(v))
{
    long totalDurationInMs = w.SampleCount * 1000 / w.WaveFormat.SampleRate;  //total length of the file
    int blockAlign = w.WaveFormat.BlockAlign;
    //multiply before dividing so fractional bytes per millisecond are not truncated
    long startByte = (long)offset * w.WaveFormat.AverageBytesPerSecond / 10000000;
    long bytesToRead = (long)duration * w.WaveFormat.AverageBytesPerSecond / 10000000;
    //the reader works in whole blocks, so round both values down to a block boundary
    w.Position = startByte - startByte % blockAlign;
    bytesToRead -= bytesToRead % blockAlign;
    byte[] buffer = new byte[bytesToRead];
    int bytesRead = w.Read(buffer, 0, (int)bytesToRead);

    //close the writer before reopening the temp file; the header sizes are written on Dispose
    using (WaveFileWriter wr = new WaveFileWriter(tempFileName, w.WaveFormat))
        wr.Write(buffer, 0, bytesRead);
}

//this is probably really inefficient, but it's also the simplest way to get things in the right format.  It's a prototype, deal with it...
//from other project: resample to 16 kHz, 16-bit, mono
var desiredOutputFormat = new WaveFormat(16000, 16, 1);
using (var r2 = new WaveFileReader(tempFileName))
using (var converter = new WaveFormatConversionStream(desiredOutputFormat, r2))
    WaveFileWriter.CreateWaveFile(wavFileName, converter);
File.Delete(tempFileName);  //remove the intermediate file
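
The conversion from 100 ns units to a byte position in the code above can be isolated into a small helper. This is only a sketch in plain C# without NAudio; the names `SegmentMath` and `TicksToAlignedBytes` are hypothetical and not part of any library, and the two integer parameters are assumed to be the values a wave format reports for average bytes per second and block align.

```csharp
using System;

public static class SegmentMath
{
    // Offsets and durations arrive in 100 ns units, so there are 10,000,000 per second.
    private const ulong TicksPerSecond = 10000000UL;

    // Converts a count of 100 ns units into a byte count for the given format,
    // rounded down to a whole block (one sample across all channels).
    public static long TicksToAlignedBytes(ulong ticks, int averageBytesPerSecond, int blockAlign)
    {
        if (averageBytesPerSecond <= 0 || blockAlign <= 0)
            throw new ArgumentOutOfRangeException("Format values must be positive.");

        // Multiply first, then divide, so fractional bytes per millisecond are kept.
        ulong bytes = checked(ticks * (ulong)averageBytesPerSecond) / TicksPerSecond;

        // Drop any partial block at the end.
        return (long)(bytes - bytes % (ulong)blockAlign);
    }
}
```

For 44.1 kHz, 16-bit, mono audio the rate is 88,200 bytes per second, which is 88.2 bytes per millisecond. Multiplying before dividing keeps that fraction, and rounding down to the block size keeps the position on a sample boundary.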




In addition, it appears that Custom Voice accepts both 16 kHz and 44 kHz PCM, so if you have higher-quality audio available, that is a plus.
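
To check what a generated segment actually contains, beyond the presence of the RIFF bytes, the header fields can be read directly. The following is a minimal sketch that assumes a seekable stream and a standard RIFF/WAVE layout; `WavHeader` and `ReadFormat` are hypothetical names, and this is not the validation the service itself performs.

```csharp
using System;
using System.IO;
using System.Text;

public static class WavHeader
{
    // Reads the "fmt " chunk of a RIFF/WAVE stream.
    // Throws InvalidDataException when the RIFF/WAVE tags or the "fmt " chunk are missing.
    public static (ushort FormatTag, ushort Channels, uint SampleRate, ushort BitsPerSample) ReadFormat(Stream stream)
    {
        var reader = new BinaryReader(stream, Encoding.ASCII, true); // leave the stream open
        if (ReadTag(reader) != "RIFF") throw new InvalidDataException("Missing RIFF tag.");
        reader.ReadUInt32(); // overall RIFF size, not needed here
        if (ReadTag(reader) != "WAVE") throw new InvalidDataException("Missing WAVE tag.");

        // Other chunks (for example LIST) may precede "fmt ", so walk the chunk list.
        while (stream.Position + 8 <= stream.Length)
        {
            string id = ReadTag(reader);
            uint size = reader.ReadUInt32();
            if (id == "fmt ")
            {
                ushort formatTag = reader.ReadUInt16(); // 1 means PCM
                ushort channels = reader.ReadUInt16();
                uint sampleRate = reader.ReadUInt32();
                reader.ReadUInt32();                    // average bytes per second
                reader.ReadUInt16();                    // block align
                ushort bitsPerSample = reader.ReadUInt16();
                return (formatTag, channels, sampleRate, bitsPerSample);
            }
            // Skip this chunk; chunk bodies are padded to an even length.
            stream.Seek(size + (size & 1), SeekOrigin.Current);
        }
        throw new InvalidDataException("No fmt chunk found.");
    }

    private static string ReadTag(BinaryReader reader)
    {
        return Encoding.ASCII.GetString(reader.ReadBytes(4));
    }
}
```

Calling `WavHeader.ReadFormat(File.OpenRead(wavFileName))` on a segment produced by the code above should report format tag 1 (PCM), one channel, 16000 Hz, and 16 bits per sample if the conversion step worked as intended.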