Question

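Good morning. I am trying to send audio data from a microphone connected to an ESP32 board over WiFi to a desktop running some Java code. If I play the audio data through Java's AudioSystem library, it has some static but is clear and intelligible. When I switch to the Sphinx-4 library, which converts audio to text, it only recognizes words some of the time.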

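This is my first time dealing with raw audio data, so this may not even be possible: the board can only read a signal of at most 12 bits, which means that when converting to 16 bits, each 12-bit value covers roughly 15 to 16 of the 16-bit values. It could also be due to the delay of about 115 microseconds used to bring the sampling rate down to 16kHz.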
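To make the 12-bit to 16-bit arithmetic concrete, here is a minimal sketch of one common way to widen an unsigned 12-bit reading to signed 16-bit PCM: center it on zero and multiply by 16. The class and method names (`AdcScale`, `toPcm16`) are made up for illustration, and this is not the `map()` call used in the ESP32 code, which targets -32000..32000 instead.

```java
/** Sketch: widen a 12-bit unsigned ADC reading (0..4095) to signed 16-bit PCM. */
final class AdcScale {
    /** Centers the reading on zero, then shifts left by 4 bits (multiplies by 16). */
    static short toPcm16(int adc12) {
        if (adc12 < 0 || adc12 > 4095) {
            throw new IllegalArgumentException("expected a 12-bit value, got " + adc12);
        }
        return (short) ((adc12 - 2048) << 4);
    }

    public static void main(String[] args) {
        System.out.println(toPcm16(0));     // -32768
        System.out.println(toPcm16(2048));  // 0
        System.out.println(toPcm16(4095));  // 32752
    }
}
```

Neighbouring ADC readings end up 16 apart in the 16-bit range. That lowers the resolution compared with a true 16-bit source, but it does not by itself introduce gaps or timing errors.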

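How can I make the audio smooth enough for the Sphinx-4 library to recognize it easily? The current implementation has small gaps, and I think the timing is slightly off.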

ESP32 code:

const int BUFFERMAX = 8000;
const long ONE_SECOND = 1000000;
int writeBuffer[BUFFERMAX];

// sensorPin, sensorValue, avg, delayMicro and client are declared elsewhere in the sketch
void writeAudio() {
  for (int i = 0; i < BUFFERMAX; i = i + 1) {
    // data read in is 12 bits, so map the value to 16 bits (2 bytes)
    sensorValue = map(analogRead(sensorPin), 0, 4096, -32000, 32000);

    // none to minimal sound is around -7000, so try to zero out additional noise with an average
    int prevAvg = avg;
    avg = (avg + sensorValue) / 2;
    sensorValue = abs(prevAvg) + sensorValue;
    if (abs(sensorValue) < 1000) { sensorValue = 0; }

    writeBuffer[i] = sensorValue;
    // delay so that the 8000 samples in the buffer take one second to record
    delayMicroseconds(delayMicro);
  }
  client.write((byte*)writeBuffer, sizeof(writeBuffer));
}
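One thing worth checking on the receiving side: on the ESP32 an `int` is 32 bits wide, so an array of 8000 `int` values is sent as 32000 bytes, not 16000. If the desktop reads that stream as 16-bit samples, every second "sample" is just the upper half of an `int` (0x0000 or 0xFFFF). The sketch below assumes the sender stays as posted and that the bytes arrive little-endian (the ESP32's native order); `Int32ToPcm16` is a hypothetical helper name, not part of any library.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Sketch: repack little-endian 32-bit ints as little-endian 16-bit PCM. */
final class Int32ToPcm16 {
    /** Converts the first {@code length} bytes of {@code raw}; a trailing partial int is ignored. */
    static byte[] convert(byte[] raw, int length) {
        int samples = length / 4;
        ByteBuffer in = ByteBuffer.wrap(raw, 0, samples * 4).order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer out = ByteBuffer.allocate(samples * 2).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < samples; i++) {
            int v = in.getInt();
            // clamp in case a value falls outside the 16-bit range
            v = Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, v));
            out.putShort((short) v);
        }
        return out.array();
    }
}
```

If the loop really records 8000 samples per second, the converted stream is 8 kHz audio, so the sample rate given to `AudioFormat` on the playback side would need to be 8000 rather than 16000 to match. Declaring the buffer as `int16_t` on the ESP32 would avoid the repacking altogether.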

Java Sphinx:

StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
// Start recognition process pruning previously cached data.

recognizer.startRecognition(socket.getInputStream() );

System.out.print("awaiting command...");
SpeechResult result = recognizer.getResult();
System.out.println(result.getHypothesis().toLowerCase());
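Because the recognizer reads straight from the socket, any constant offset in the samples (the ESP32 comments mention silence sitting near -7000) reaches it unchanged. One common way to remove such an offset is a one-pole DC-blocking filter, sketched below. It is offered as an alternative technique, not as what the posted averaging code does; the class name `DcBlocker` and the coefficient 0.995 are illustrative assumptions.

```java
/** Sketch: one-pole DC blocker, y[n] = x[n] - x[n-1] + r * y[n-1]. */
final class DcBlocker {
    private final double r;   // pole position; closer to 1.0 means a lower cutoff
    private double prevIn;
    private double prevOut;

    DcBlocker(double r) {
        this.r = r;
    }

    /** Filters one 16-bit sample and returns the result clamped to the short range. */
    short process(short x) {
        double y = x - prevIn + r * prevOut;
        prevIn = x;
        prevOut = y;
        long rounded = Math.round(y);
        return (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, rounded));
    }
}
```

Each decoded sample would pass through `process` before being written to whatever stream the recognizer reads. A steady input such as -7000 decays toward 0 at the output, while faster changes such as speech pass through largely unchanged.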

Java audio playback:

private static void init() throws LineUnavailableException {
    //  specifying the audio format
    AudioFormat _format = new AudioFormat(16000.F,// Sample Rate
            16,     // Size of SampleBits
            1,      // Number of Channels
            true,   // Is Signed?
            false   // Is Big Endian?
    );

    //  creating the DataLine Info for the speaker format
    DataLine.Info speakerInfo = new DataLine.Info(SourceDataLine.class, _format);

    //  getting the mixer for the speaker
    _speaker = (SourceDataLine) AudioSystem.getLine(speakerInfo);
    _speaker.open(_format);
}


_streamIn = socket.getInputStream();

_speaker.start();

byte[] data = new byte[16000];
System.out.println("Waiting for data...");
while (_running) {
    long start = new Date().getTime();

    //  checking if the data is available to speak
    if (_streamIn.available() <= 0)
        continue;   //  data not available so continue back to start of loop

    //  count of the data bytes read
    int readCount= _streamIn.read(data, 0, data.length);

    if(readCount > 0 && (readCount%2) == 0){
        System.out.println(readCount);
        _speaker.write(data, 0, readCount);
        readCount=0;
    }
    System.out.println("Time: " + (new Date().getTime() - start));
}
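The loop above polls `available()` in a tight loop and silently drops any read whose byte count is odd, which would shift every later sample by one byte. A possible alternative, sketched under the assumption of 2-byte mono frames, is to rely on the blocking `read()` and carry a leftover byte into the next read. `FramePump` is a hypothetical helper, and the `OutputStream` sink stands in for `_speaker.write(...)` so the sketch can run without an audio device.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Sketch: forward only whole 2-byte frames, keeping an odd trailing byte for the next read. */
final class FramePump {
    /** Copies frames from in to sink until end of stream; returns the number of bytes written. */
    static long pump(InputStream in, OutputStream sink, int bufferSize) throws IOException {
        if (bufferSize < 2) {
            throw new IllegalArgumentException("bufferSize must be at least 2");
        }
        byte[] buf = new byte[bufferSize];
        int held = 0;       // bytes carried over from the previous read (0 or 1)
        long written = 0;
        int n;
        // read() blocks until data arrives, so no available() polling is needed
        while ((n = in.read(buf, held, buf.length - held)) != -1) {
            int total = held + n;
            int whole = total - (total % 2);   // largest even byte count
            sink.write(buf, 0, whole);
            written += whole;
            held = total - whole;
            if (held == 1) {
                buf[0] = buf[whole];           // keep the odd byte for the next frame
            }
        }
        return written;
    }
}
```

With a `SourceDataLine`, the `sink.write(buf, 0, whole)` call would become `_speaker.write(buf, 0, whole)`, which itself blocks while the line's buffer is full and so paces the loop.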

Smoothing microcontroller audio for the Sphinx library

0 answers: