Voice chat in Python

Date: 2015-03-22 07:22:59

Tags: python python-3.x numpy pyaudio

I have created a very simple voice chat with pyaudio, but the sound is a bit rough. You often hear noise, like in an old movie. This is probably caused by voice CHUNKs getting lost when I send them over UDP. Is it possible to reduce the noise somehow? Also, I want to play a sound effect when the user presses a button, but for some reason I cannot merge the two tracks (the sound effect and the voice)!
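For the merging problem, here is a minimal sketch of mixing two equally long 16-bit PCM chunks with numpy. The function name, buffer names, and the 0.5 effect volume are my own illustrative choices, not the asker's code; the key point is accumulating in float and clipping back to the int16 range so overflow does not wrap around as crackle:

```python
import numpy as np

def mix_chunks(voice: bytes, effect: bytes, effect_volume: float = 0.5) -> bytes:
    """Mix two raw 16-bit PCM buffers of the same length into one."""
    v = np.frombuffer(voice, dtype=np.int16).astype(np.float64)
    e = np.frombuffer(effect, dtype=np.int16).astype(np.float64)
    mixed = v + e * effect_volume
    # Clip to the int16 range so overflow does not wrap around as noise
    mixed = np.clip(mixed, -32768, 32767)
    return mixed.astype(np.int16).tobytes()
```

The resulting bytes can be written straight to the PyAudio stream, since they have the same width and length as either input chunk.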

This is the most important class, "Sound". It runs in a thread, so it can loop forever.

import time
import threading

import numpy as np
import pyaudio
import wave      # I play the button sound effects from wav files

class Sound(threading.Thread):
    WIDTH = 2
    CHANNELS = 2
    RATE = 44100

    def __init__(self):
        super().__init__(daemon=True)
        self.voiceStreams= []
        self.effectStreams= []
        self.vVolume= 1
        self.eVolume= 0.5
        self.voip = None

        self.p = pyaudio.PyAudio()
        self.stream = self.p.open(format = self.p.get_format_from_width(Sound.WIDTH),
                        channels = Sound.CHANNELS,
                        rate = Sound.RATE,
                        input = True,
                        output = True,
                        #stream_callback = self.callback
                        )


        self.nextSample = b""
        self.lastSample = b""
        self.stream.start_stream()

    def run(self):
        while True:
            self.myCallback()


    def myCallback(self):
        _time = time.perf_counter()

        if self.nextSample:
            self.stream.write(self.nextSample) 
            self.lastSample = self.nextSample

        elif self.lastSample:   # Crazy idea: when there is no data (because UDP did not deliver it), I play the last chunk again, so nobody hears the short burst of silence
            self.stream.write(self.lastSample)
            self.lastSample = b""

        _time = time.perf_counter()
        #print ("{0:d}  ---- {1:d} --- timeWrite: {2:.1f}".format(len(self.voiceStreams), self.stream.get_read_available(), (time.perf_counter() - _time)* 1000)   , end = "   ")

        if self.stream.get_read_available() > 1023:
            mic = self.stream.read(1024)
        else:
            mic = b""

        #print ("timeRead: {0:.1f}".format(  (time.perf_counter() - _time)* 1000)  , end = "   ")

        if mic and self.voip: self.voip.sendDatagram(mic)             # This sends the CHUNK of sound to my UDP client
        _time = time.perf_counter()


        data = np.zeros(2048, np.float64)   # float accumulator, so the scaled chunks can be added without integer casting errors

        length = len(self.voiceStreams)         # I read voice data
        l1 = length

        for i in range(length):
            s = self.voiceStreams.pop(0)
            data += s / length * self.vVolume * 0.4 # Here I merge multiple voices with numpy. I also reduce the volume of each voice based on how many voices I have...
        length = len(self.effectStreams)        
        toPop= []                               # Here i hold indexes of effects which ended playing

        for i in range(length):
            s = self.effectStreams[i].readframes(1024)
            if not s:                       # If there is no data left to play (readframes returns bytes, so compare against empty bytes)
                toPop.append(i - len(toPop))
            else:
                d = np.frombuffer(s, np.int16)
                # Sadly each numpy array must have the same length, so when I get to the end of a track that has e.g. only 1500 frames left, I must throw it away, because numpy does not let me merge it with an array of length 2048
                if len(d) > 2047:           # And again I merge the sounds with numpy and reduce the volume
                    data += d / length * self.eVolume * 0.3
        for i in toPop:     # If I am at the end of a track, I delete it
            del self.effectStreams[i]


        if np.any(data):        # If there are any data to read
            self.nextSample = data.astype(np.int16).tobytes()  # I prepare the next CHUNK (should be 20 ms, but I am not sure)
        else:
            self.nextSample = b""
        #print ("timeRest: {0:.1f}".format(  (time.perf_counter() - _time)* 1000), end = "    ||  ")
        print("HOW MANY CHUNKS OF VOICE I GOT: ", l1)
        # It is weird that when I print the read/write times for the stream, it usually prints something like this: (20ms, 20ms, 30ms, 20ms, 20ms, 30ms, 20ms ...)


    def close(self):
        self.stream.stop_stream()
        self.stream.close()
        self.p.terminate()
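The "replay the last chunk" idea in myCallback can be pushed one step further. A common trick for hiding lost packets is to fade each repeat out, so that a long gap decays to silence instead of stuttering at full volume. This is a hedged sketch of that idea, not part of the class above; the function name, the repeat_count parameter, and the 0.5 decay factor are all my own choices:

```python
import numpy as np

def conceal(last_chunk: bytes, repeat_count: int, decay: float = 0.5) -> bytes:
    """Return a progressively quieter copy of the last received 16-bit chunk."""
    d = np.frombuffer(last_chunk, dtype=np.int16).astype(np.float64)
    gain = decay ** repeat_count          # halve the volume on every repeat
    return (d * gain).astype(np.int16).tobytes()
```

The caller would track how many consecutive chunks have been missing and pass that as repeat_count, writing the result to the stream in place of the lost data.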

The UDP server and client are very simple (they work well, so I am not posting them here). The client just sends all of its data to the server, and the server sends all of it on to every client. I do not tell anyone who sent the data. That means that if a datagram is delivered too late, I play two CHUNKS from the same client at once (because I assume they came from multiple clients)!
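Since the server itself is not posted, the description above amounts to something like the following sketch (run_relay and forward are hypothetical names, and the real code may differ; the forwarding rule is the one described: send every datagram to every other known client, without naming the sender):

```python
import socket

def forward(sender, clients):
    """Register the sender and return every other known client address."""
    clients.add(sender)
    return [c for c in clients if c != sender]

def run_relay(host="0.0.0.0", port=9999):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    clients = set()
    while True:
        data, addr = sock.recvfrom(4096)      # one voice CHUNK per datagram
        for client in forward(addr, clients):
            sock.sendto(data, client)         # relayed without naming the sender
```

Prefixing each datagram with a short sender id would let the receiver keep one buffer per sender, which avoids the late-packet double-playback described above.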

Here are the wav files: Dropbox repository. I did not create them; I downloaded them from http://www.freesound.org/people/ERH/sounds/31135/ and they are licensed under an attribution license.

I have also added an "OUTPUT.txt" file to the Dropbox folder, which shows what Python printed while running this example between two people (I only received voice data from one user).

Thanks for any suggestions.

0 answers:

There are no answers yet