Question

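I'm writing something that uses SpeechSynthesizer to generate wave files on request, but I'm having problems with crackling noise. The odd thing is that output sent directly to the sound card is fine.

This short PowerShell script demonstrates the problem, although my actual program is written in C#.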

Add-Type -AssemblyName System.Speech
$speech = New-Object System.Speech.Synthesis.SpeechSynthesizer
$speech.Speak('Guybrush Threepwood, mighty pirate!')
$speech.SetOutputToWaveFile("${PSScriptRoot}\foo.wav")
$speech.Speak('Guybrush Threepwood, mighty pirate!')

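What this should do is output to the speakers, then save the same sound as "foo.wav" next to the script.

What it actually does is output to the speakers, then save a crackling version that sounds like an old record player to the wave file. I have tested it on three different machines. They pick different voices by default (all of them default voices supplied by Microsoft), but in the wave file every one of them sounds like garbage falling down the stairs.

Why?

Edit: I'm testing on Windows 10 Pro, with the latest updates, the ones that added the annoying "People" button to the taskbar.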

Edit 2: Here's a link to an example sound generated with the above script. Notice the crackling in the voice, which is not there when the script outputs directly to the speakers.

Edit 3: It's even more noticeable with a female voice.

Edit 4: The same voice as above, saved to file with TextAloud 3: no crackling, no vertical spikes.

Answer 1

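This is a problem with the SpeechSynthesizer API, which simply delivers poor-quality, crackling audio, as the samples above show. The solution is to do what TextAloud does, which is to use the SpeechLib COM objects directly.

This is done by adding a COM reference to "Microsoft Speech Object Library (5.4)". Here is the snippet I ended up with, which produces audio clips of the same quality as TextAloud: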

public new static byte[] GetSound(Order o)
{
    const SpeechVoiceSpeakFlags speechFlags = SpeechVoiceSpeakFlags.SVSFlagsAsync;
    var synth = new SpVoice();
    var wave = new SpMemoryStream();
    var voices = synth.GetVoices();
    try
    {
        // synth setup
        synth.Volume = Math.Max(1, Math.Min(100, o.Volume ?? 100));
        synth.Rate = Math.Max(-10, Math.Min(10, o.Rate ?? 0));
        foreach (SpObjectToken voice in voices)
        {
            if (voice.GetAttribute("Name") == o.Voice.Name)
            {
                synth.Voice = voice;
            }
        }
        wave.Format.Type = SpeechAudioFormatType.SAFT22kHz16BitMono;
        synth.AudioOutputStream = wave;
        synth.Speak(o.Text, speechFlags);
        synth.WaitUntilDone(Timeout.Infinite);

        var waveFormat = new WaveFormat(22050, 16, 1);
        using (var ms = new MemoryStream((byte[])wave.GetData()))
        using (var reader = new RawSourceWaveStream(ms, waveFormat))
        using (var outStream = new MemoryStream())
        using (var writer = new WaveFileWriter(outStream, waveFormat))
        {
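            reader.CopyTo(writer);
            // Flush so the RIFF header sizes are written before the stream is read back,
            // and use ToArray(): GetBuffer() also returns the unused capacity of the buffer.
            writer.Flush();
            return o.Mp3 ? ConvertToMp3(outStream) : outStream.ToArray();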
        }
    }
    finally
    {
        Marshal.ReleaseComObject(voices);
        Marshal.ReleaseComObject(wave);
        Marshal.ReleaseComObject(synth);
    }
}
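
For comparison with the script in the question, the same COM route can be sketched in PowerShell, which needs no project reference. This is only a minimal sketch, not the code I run in production: the function name Save-SpeechToWave is made up for illustration, the numeric constants are the SAPI enum values for SAFT22kHz16BitMono and SSFMCreateForWrite as I understand them, and it assumes a Windows machine with SAPI 5 and at least one voice installed.

```powershell
# Hypothetical helper: speak text into a WAV file via the SAPI COM objects
# instead of System.Speech.Synthesis.SpeechSynthesizer.
function Save-SpeechToWave {
    param(
        [Parameter(Mandatory)] [string] $Text,
        [Parameter(Mandatory)] [string] $Path   # use an absolute path
    )

    # Assumed SAPI enum values (SpeechAudioFormatType / SpeechStreamFileMode).
    $SAFT22kHz16BitMono = 22
    $SSFMCreateForWrite = 3
    $SVSFDefault        = 0   # synchronous Speak call

    $voice  = New-Object -ComObject SAPI.SpVoice
    $stream = New-Object -ComObject SAPI.SpFileStream
    try {
        # Pick the output format before opening the file, then route the voice to it.
        $stream.Format.Type = $SAFT22kHz16BitMono
        $stream.Open($Path, $SSFMCreateForWrite, $false)
        $voice.AudioOutputStream = $stream
        [void]$voice.Speak($Text, $SVSFDefault)
        $stream.Close()
    }
    finally {
        # Release the COM objects, mirroring the finally block in the C# version.
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($stream)
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($voice)
    }
}
```

Called as Save-SpeechToWave -Text 'Guybrush Threepwood, mighty pirate!' -Path "$PSScriptRoot\foo.wav", this should write the file next to the script, which makes it easy to compare by ear with the output of SetOutputToWaveFile.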

This is the code that converts the wave file to mp3. It uses NAudio.Lame from NuGet.

internal static byte[] ConvertToMp3(Stream wave)
{
    wave.Position = 0;
    using (var mp3 = new MemoryStream())
    using (var reader = new WaveFileReader(wave))
    using (var writer = new LameMP3FileWriter(mp3, reader.WaveFormat, 128))
    {
        reader.CopyTo(writer);
        // Flush the encoder so its last buffered frames reach the stream before it is copied.
        writer.Flush();
        return mp3.ToArray();
    }
}

Microsoft SpeechSynthesizer has problems when outputting to files and streams
