I'm trying to get AudioKit to stream the microphone to Google's Speech-to-Text API, as shown here, but I'm not entirely sure how best to go about it.
To prepare the audio for the speech-to-text engine, you need to set up the encoding and pass it across in chunks. In the example Google uses, they use Apple's AVFoundation, but I would like to use AudioKit so I can do some pre-processing, such as cutting off low amplitudes, etc.
I believe the right way to do this is with a Tap:

First, I should match the format:
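// Target format: 16 kHz, mono, 16-bit signed linear PCM (LINEAR16), which is what Google's streaming recognizer recommends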
var asbd = AudioStreamBasicDescription()
asbd.mSampleRate = 16000.0
asbd.mFormatID = kAudioFormatLinearPCM
asbd.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked
asbd.mBytesPerPacket = 2
asbd.mFramesPerPacket = 1
asbd.mBytesPerFrame = 2
asbd.mChannelsPerFrame = 1
asbd.mBitsPerChannel = 16
AudioKit.format = AVAudioFormat(streamDescription: &asbd)!
Then create a tap, something like:
open class TestTap {
    internal let bufferSize: UInt32 = 1_024

    @objc public init(_ input: AKNode?) {
        input?.avAudioNode.installTap(onBus: 0, bufferSize: bufferSize, format: AudioKit.format) { buffer, _ in
            // do work here
        }
    }
}
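As for the pre-processing I mentioned, here is roughly what I have in mind: chain an AudioKit processing node in front of the node the tap is installed on. The AKHighPassFilter below is just a stand-in, since I have not settled on an actual amplitude-cutting node; treat this as a sketch rather than working code:

let mic = AKMicrophone()
let processed = AKHighPassFilter(mic, cutoffFrequency: 80) // stand-in for the real pre-processing
let tap = TestTap(processed)                               // tap the processed signal instead of the raw mic
AudioKit.output = AKBooster(processed, gain: 0)            // keep the graph running without monitoring the mic
try? AudioKit.start()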
However, I can't figure out the right way to send this data to the Google Speech-to-Text API via the streamAudioData method while processing in real time with AudioKit. Or maybe I'm going about this the wrong way?
UPDATE
I've created the Tap:
open class TestTap {
    internal var audioData = NSMutableData()
    internal let bufferSize: UInt32 = 1_024

    func toData(buffer: AVAudioPCMBuffer) -> NSData {
        let channelCount = 2 // given PCMBuffer channel count is
        let channels = UnsafeBufferPointer(start: buffer.floatChannelData, count: channelCount)
        return NSData(bytes: channels[0], length: Int(buffer.frameCapacity * buffer.format.streamDescription.pointee.mBytesPerFrame))
    }

    @objc public init(_ input: AKNode?) {
        input?.avAudioNode.installTap(onBus: 0, bufferSize: bufferSize, format: AudioKit.format) { buffer, _ in
            self.audioData.append(self.toData(buffer: buffer) as Data)

            // We recommend sending samples in 100ms chunks (from Google)
            let chunkSize: Int /* bytes/chunk */ = Int(0.1 /* seconds/chunk */
                * AudioKit.format.sampleRate /* samples/second */
                * 2 /* bytes/sample */ )

            if self.audioData.length > chunkSize {
                SpeechRecognitionService
                    .sharedInstance
                    .streamAudioData(self.audioData,
                                     completion: { response, error in
                                        if let error = error {
                                            print("ERROR: \(error.localizedDescription)")
                                            SpeechRecognitionService.sharedInstance.stopStreaming()
                                        } else if let response = response {
                                            print(response)
                                        }
                    })
                self.audioData = NSMutableData()
            }
        }
    }
}
And in viewDidLoad: I'm setting up AudioKit with:
AKSettings.sampleRate = 16_000
AKSettings.bufferLength = .shortest
However, Google complains:
ERROR: Audio data is being streamed too fast. Please stream audio data approximately at real time.
I've tried changing several parameters, such as the chunk size, to no avail.
Answer 0 (score: 4)
I found the solution here.

The final code for my Tap is:
open class GoogleSpeechToTextStreamingTap {

    internal var converter: AVAudioConverter!

    @objc public init(_ input: AKNode?, sampleRate: Double = 16000.0) {
        let format = AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatInt16, sampleRate: sampleRate, channels: 1, interleaved: false)!

        self.converter = AVAudioConverter(from: AudioKit.format, to: format)
        self.converter?.sampleRateConverterAlgorithm = AVSampleRateConverterAlgorithm_Normal
        self.converter?.sampleRateConverterQuality = .max

        let sampleRateRatio = AKSettings.sampleRate / sampleRate
        let inputBufferSize = 4410 // 100ms of 44.1K = 4410 samples.

        input?.avAudioNode.installTap(onBus: 0, bufferSize: AVAudioFrameCount(inputBufferSize), format: nil) { buffer, time in

            let capacity = Int(Double(buffer.frameCapacity) / sampleRateRatio)
            let bufferPCM16 = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: AVAudioFrameCount(capacity))!

            var error: NSError? = nil
            self.converter?.convert(to: bufferPCM16, error: &error) { inNumPackets, outStatus in
                outStatus.pointee = AVAudioConverterInputStatus.haveData
                return buffer
            }

            let channel = UnsafeBufferPointer(start: bufferPCM16.int16ChannelData!, count: 1)
            let data = Data(bytes: channel[0], count: capacity * 2)

            SpeechRecognitionService
                .sharedInstance
                .streamAudioData(data,
                                 completion: { response, error in
                                    if let error = error {
                                        print("ERROR: \(error.localizedDescription)")
                                        SpeechRecognitionService.sharedInstance.stopStreaming()
                                    } else if let response = response {
                                        print(response)
                                    }
                })
        }
    }
}
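For reference, here is a rough sketch of how I wire this tap up; the microphone and output setup are my own assumptions, and SpeechRecognitionService is the singleton from Google's streaming sample, as above:

let mic = AKMicrophone()
let tap = GoogleSpeechToTextStreamingTap(mic, sampleRate: 16_000)
AudioKit.output = AKBooster(mic, gain: 0)   // keep the signal chain alive without monitoring the microphone
do {
    try AudioKit.start()                    // throws in newer AudioKit 4.x releases
} catch {
    print("AudioKit failed to start: \(error)")
}

Note that with this approach I leave AKSettings.sampleRate at the hardware default and let the AVAudioConverter in the tap handle the downsampling to 16 kHz.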
Answer 1 (score: 3)
You could likely record with AKNodeRecorder and pass the buffer from the resulting AKAudioFile to the API. If you wanted something more real-time, you could try installing a tap on the avAudioNode property of the AKNode you want to record and continuously pass the buffers to the API.
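A loose sketch of that first route, assuming AudioKit 4.x (the format conversion is left as a comment, since it depends on your setup):

do {
    let mic = AKMicrophone()
    let recorder = try AKNodeRecorder(node: mic)
    try recorder.record()

    // ... later, once enough audio has been captured:
    recorder.stop()
    if let file = recorder.audioFile {
        // The file is recorded at the session rate; convert it to 16 kHz mono Int16
        // (e.g. with AVAudioConverter) before handing the bytes to the API.
        print("Recorded \(file.duration) seconds")
    }
} catch {
    print("Recording failed: \(error)")
}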
However, I'm curious why you feel you need the pre-processing; I'm sure the Google API is well optimized for recordings like the ones produced by the sample code you mentioned.
I've had a lot of success/fun with the iOS Speech API. Not sure whether you have a reason to use the Google API, but if you haven't already, I'd consider checking it out to see whether it serves your needs better.
Hope this helps!