Playing a sine wave through XAudio2

Time: 2012-09-04 07:15:36

Tags: c# audio slimdx xaudio2

I am building an audio player with XAudio2. We stream data in 640-byte packets, at a sample rate of 8000 Hz and a sample depth of 16 bits. We access XAudio2 through SlimDX.

However, when playing sound we noticed that the quality is poor. For example, this is a 3 kHz sine curve, captured with Audacity. (Image: 3 kHz sine curve)

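I have stripped the audio player down to the bare basics, but the audio quality is still poor. Is this a bug in XAudio2, in SlimDX, or in my code, or is it simply an artifact of the conversion from 8 kHz to 44.1 kHz? The last option seems unlikely, because we also generate PCM WAV files that Windows Media Player plays back perfectly.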

Below is the basic implementation, which produces the broken sine.

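using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.Threading;
using System.Windows;
// Assumed SlimDX namespaces for the types used below
using SlimDX;
using SlimDX.Multimedia;
using SlimDX.XAudio2;

public partial class MainWindow : Window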
{
    private XAudio2 device = new XAudio2();
    private WaveFormatExtensible format = new WaveFormatExtensible();
    private SourceVoice sourceVoice = null;
    private MasteringVoice masteringVoice = null;
    private Guid KSDATAFORMAT_SUBTYPE_PCM = new Guid("00000001-0000-0010-8000-00aa00389b71");
    private AutoResetEvent BufferReady = new AutoResetEvent(false);

    private PlayBufferPool PlayBuffers = new PlayBufferPool();

    public MainWindow()
    {
        InitializeComponent();

        Closing += OnClosing;

        format.Channels = 1;
        format.BitsPerSample = 16;
        format.FormatTag = WaveFormatTag.Extensible;
        format.BlockAlignment = (short)(format.Channels * (format.BitsPerSample / 8));
        format.SamplesPerSecond = 8000;
        format.AverageBytesPerSecond = format.SamplesPerSecond * format.BlockAlignment;
        format.SubFormat = KSDATAFORMAT_SUBTYPE_PCM;
    }

    private void OnClosing(object sender, CancelEventArgs cancelEventArgs)
    {
        sourceVoice.Stop();
        sourceVoice.Dispose();
        masteringVoice.Dispose();

        PlayBuffers.Dispose();
    }

    private void button_Click(object sender, RoutedEventArgs e)
    {
        masteringVoice = new MasteringVoice(device);

        PlayBuffer buffer = PlayBuffers.NextBuffer();

        GenerateSine(buffer.Buffer);
        buffer.AudioBuffer.AudioBytes = 640;

        sourceVoice = new SourceVoice(device, format, VoiceFlags.None, 8);
        sourceVoice.BufferStart += new EventHandler<ContextEventArgs>(sourceVoice_BufferStart);
        sourceVoice.BufferEnd += new EventHandler<ContextEventArgs>(sourceVoice_BufferEnd);

        sourceVoice.SubmitSourceBuffer(buffer.AudioBuffer);

        sourceVoice.Start();
    }

    private void sourceVoice_BufferEnd(object sender, ContextEventArgs e)
    {
        BufferReady.Set();
    }

    private void sourceVoice_BufferStart(object sender, ContextEventArgs e)
    {
        BufferReady.WaitOne(1000);

        PlayBuffer nextBuffer = PlayBuffers.NextBuffer();
        nextBuffer.DataStream.Position = 0;
        nextBuffer.AudioBuffer.AudioBytes = 640;
        GenerateSine(nextBuffer.Buffer);

        Result r = sourceVoice.SubmitSourceBuffer(nextBuffer.AudioBuffer);
    }

    private void GenerateSine(byte[] buffer)
    {
        double sampleRate = 8000.0;
        double amplitude = 0.25 * short.MaxValue;
        double frequency = 3000.0;
        for (int n = 0; n < buffer.Length / 2; n++)
        {
            short[] s = { (short)(amplitude * Math.Sin((2 * Math.PI * n * frequency) / sampleRate)) };
            Buffer.BlockCopy(s, 0, buffer, n * 2, 2);
        }
    }
}

public class PlayBuffer : IDisposable
{
    #region Private variables
    private IntPtr BufferPtr;
    private GCHandle BufferHandle;
    #endregion

    #region Constructors
    public PlayBuffer()
    {
        Index = 0;
        Buffer = new byte[640 * 4]; // 640 bytes = 320 samples = 40 ms at 8 kHz, 16-bit mono
        BufferHandle = GCHandle.Alloc(this.Buffer, GCHandleType.Pinned);
        // Keep the full pointer width; ToInt32() overflows in a 64-bit process.
        BufferPtr = BufferHandle.AddrOfPinnedObject();

        DataStream = new DataStream(BufferPtr, 640 * 4, true, false);
        AudioBuffer = new AudioBuffer();
        AudioBuffer.AudioData = DataStream;
    }

    public PlayBuffer(int index)
        : this()
    {
        Index = index;
    }
    #endregion

    #region Destructor
    ~PlayBuffer()
    {
        Dispose();
    }
    #endregion

    #region Properties
    protected int Index { get; private set; }
    public byte[] Buffer { get; private set; }
    public DataStream DataStream { get; private set; }
    public AudioBuffer AudioBuffer { get; private set; }
    #endregion

    #region Public functions
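    public void Dispose()
    {
        if (AudioBuffer != null)
        {
            AudioBuffer.Dispose();
            AudioBuffer = null;
        }

        if (DataStream != null)
        {
            DataStream.Dispose();
            DataStream = null;
        }

        // Release the pinned byte array so the GC can move and collect it again.
        if (BufferHandle.IsAllocated)
            BufferHandle.Free();
    }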
    #endregion
}

public class PlayBufferPool : IDisposable
{
    #region Private variables
    private int _currentIndex = -1;
    private PlayBuffer[] _buffers = new PlayBuffer[2];
    #endregion

    #region Constructors
    public PlayBufferPool()
    {
        for (int i = 0; i < 2; i++)
            Buffers[i] = new PlayBuffer(i);
    }
    #endregion

    #region Destructor
    ~PlayBufferPool()
    {
        Dispose();
    }
    #endregion

    #region Properties
    protected int CurrentIndex
    {
        get { return _currentIndex; }
        set { _currentIndex = value; }
    }

    protected PlayBuffer[] Buffers
    {
        get { return _buffers; }
        set { _buffers = value; }
    }
    #endregion

    #region Public functions
    public void Dispose()
    {
        for (int i = 0; i < Buffers.Length; i++)
        {
            if (Buffers[i] == null)
                continue;

            Buffers[i].Dispose();
            Buffers[i] = null;
        }
    }

    public PlayBuffer NextBuffer()
    {
        CurrentIndex = (CurrentIndex + 1) % Buffers.Length;
        return Buffers[CurrentIndex];
    }
    #endregion
}

Some additional details:

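This is used to replay recorded voice in various compression formats such as A-law, μ-law, or TrueSpeech. The data arrives in small packets, is decoded, and is passed to this player. That is why we use such a low sample rate and such small buffers. There is nothing wrong with our data, however, because WAV files generated from it play back perfectly in WMP or VLC.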

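Edit: We have now "solved" this by rewriting the player with NAudio. I am still interested in any input on what is happening here. Is it our approach with the PlayBuffers, or is it simply a bug or limitation in DirectX or the wrapper? I tried SharpDX instead of SlimDX, but that did not change the result.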

2 answers:

Answer 0: (score: 2)

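It looks as if the upsampling is being done without a proper anti-aliasing (reconstruction) filter. The cutoff frequency is too high (above the original Nyquist frequency), so many of the aliases are preserved, and the output resembles a piecewise-linear interpolation between the samples taken at 8000 Hz.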

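All of your different options perform an up-conversion from 8 kHz to 44.1 kHz, but the way they do it matters. The fact that one library does it well does not prove that the up-conversion is not the source of the error in the other.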
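To make the aliasing argument concrete, here is a minimal sketch that lists where the spectral images of a sampled tone fall. It relies only on the standard sampling result that a tone at f sampled at rate fs has images at k·fs ± f. The class and method names (`SpectralImages`, `Below`) are made up for this illustration, and nothing here shows what the XAudio2 resampler actually does internally.

```csharp
using System;
using System.Collections.Generic;

public static class SpectralImages
{
    // Returns the image frequencies k*sourceRate - tone and k*sourceRate + tone
    // (k = 1, 2, ...) that lie below the Nyquist frequency of the target rate.
    // A reconstruction filter with its cutoff at sourceRate / 2 should remove all of them.
    public static List<double> Below(double tone, double sourceRate, double targetRate)
    {
        double targetNyquist = targetRate / 2.0;
        var images = new List<double>();

        for (int k = 1; k * sourceRate - tone < targetNyquist; k++)
        {
            images.Add(k * sourceRate - tone);

            double upper = k * sourceRate + tone;
            if (upper < targetNyquist)
                images.Add(upper);
        }

        return images;
    }
}
```

Calling `SpectralImages.Below(3000.0, 8000.0, 44100.0)` returns 5000, 11000, 13000, 19000 and 21000 Hz. All five lie inside the 22.05 kHz band of the 44.1 kHz output, so any of them that the resampler leaves unattenuated will distort the recorded waveform.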

Answer 1: (score: 0)

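It has been a while since I worked with sound and frequencies, but this is what I remember: you have a sample rate of 8000 Hz and want a sine frequency of 3000 Hz. So in 1 second you have 8000 samples, and in that second you want your sine to oscillate 3000 times. That is below the Nyquist frequency (half the sample rate), but only barely (see the Nyquist–Shannon sampling theorem). So I would not expect good quality here.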

In fact, if you step through the GenerateSine method, you will see that s[0] takes the values 0, 5792, -8191, 5792, 0, -5792, 8191, -5792, 0, 5792, ...
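Those values can be reproduced without a debugger. The sketch below repeats the arithmetic of GenerateSine from the question (8000 Hz sample rate, 3000 Hz tone, amplitude 0.25 * short.MaxValue) and returns the first few samples. The `SineCheck` class and its `FirstSamples` method are hypothetical helpers added for illustration only.

```csharp
using System;

public static class SineCheck
{
    // Same formula as GenerateSine in the question, returning the first `count` samples.
    public static short[] FirstSamples(int count)
    {
        const double sampleRate = 8000.0;
        const double frequency = 3000.0;
        double amplitude = 0.25 * short.MaxValue; // 8191.75

        var samples = new short[count];
        for (int n = 0; n < count; n++)
        {
            // The cast truncates toward zero, exactly as in the original code.
            samples[n] = (short)(amplitude * Math.Sin((2 * Math.PI * n * frequency) / sampleRate));
        }

        return samples;
    }
}
```

Because 3000 / 8000 = 3 / 8, the sequence repeats every 8 samples, and those 8 samples span 3 full cycles of the sine. A 640-byte packet holds 320 samples, which is exactly 40 such periods, so every packet starts at the same phase as the previous one ended.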

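Still, this does not explain the odd sine you recorded, and I am not sure how many samples the human ear needs in order to hear a "good" sine wave.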