我正在努力编码从AVCaptureSession
接收的音频缓冲区
AudioConverter
然后将其追加到AVAssetWriter
。
我没有收到任何错误(包括OSStatus响应),以及
生成的CMSampleBuffer
似乎具有有效数据,但是生成的文件
根本就没有任何可播放的音频。与视频一起写视频时
帧停止在(appendSampleBuffer()
中添加几帧
返回false,但没有AVAssetWriter.error
),可能是因为资产
作家正在等待音频赶上来。我怀疑它与方式有关
我正在为AAC做准备。
该应用程序使用RxSwift,但我删除了RxSwift部件,因此更容易 了解更广泛的受众。
请查看以下代码中的评论,了解更多...评论
给定设置结构:
import Foundation
import AVFoundation
import CleanroomLogger
public struct AVSettings {
let orientation: AVCaptureVideoOrientation = .Portrait
let sessionPreset = AVCaptureSessionPreset1280x720
let videoBitrate: Int = 2_000_000
let videoExpectedFrameRate: Int = 30
let videoMaxKeyFrameInterval: Int = 60
let audioBitrate: Int = 32 * 1024
/// Settings that are `0` means variable rate.
/// The `mSampleRate` and `mChennelsPerFrame` is overwritten at run-time
/// to values based on the input stream.
let audioOutputABSD = AudioStreamBasicDescription(
mSampleRate: AVAudioSession.sharedInstance().sampleRate,
mFormatID: kAudioFormatMPEG4AAC,
mFormatFlags: UInt32(MPEG4ObjectID.AAC_Main.rawValue),
mBytesPerPacket: 0,
mFramesPerPacket: 1024,
mBytesPerFrame: 0,
mChannelsPerFrame: 1,
mBitsPerChannel: 0,
mReserved: 0)
let audioEncoderClassDescriptions = [
AudioClassDescription(
mType: kAudioEncoderComponentType,
mSubType: kAudioFormatMPEG4AAC,
mManufacturer: kAppleSoftwareAudioCodecManufacturer) ]
}
一些辅助函数:
public func getVideoDimensions(fromSettings settings: AVSettings) -> (Int, Int) {
switch (settings.sessionPreset, settings.orientation) {
case (AVCaptureSessionPreset1920x1080, .Portrait): return (1080, 1920)
case (AVCaptureSessionPreset1280x720, .Portrait): return (720, 1280)
default: fatalError("Unsupported session preset and orientation")
}
}
public func createAudioFormatDescription(fromSettings settings: AVSettings) -> CMAudioFormatDescription {
var result = noErr
var absd = settings.audioOutputABSD
var description: CMAudioFormatDescription?
withUnsafePointer(&absd) { absdPtr in
result = CMAudioFormatDescriptionCreate(nil,
absdPtr,
0, nil,
0, nil,
nil,
&description)
}
if result != noErr {
Log.error?.message("Could not create audio format description")
}
return description!
}
public func createVideoFormatDescription(fromSettings settings: AVSettings) -> CMVideoFormatDescription {
var result = noErr
var description: CMVideoFormatDescription?
let (width, height) = getVideoDimensions(fromSettings: settings)
result = CMVideoFormatDescriptionCreate(nil,
kCMVideoCodecType_H264,
Int32(width),
Int32(height),
[:],
&description)
if result != noErr {
Log.error?.message("Could not create video format description")
}
return description!
}
这是资产编写者的初始化方式:
guard let audioDevice = defaultAudioDevice() else
{ throw RecordError.MissingDeviceFeature("Microphone") }
guard let videoDevice = defaultVideoDevice(.Back) else
{ throw RecordError.MissingDeviceFeature("Camera") }
let videoInput = try AVCaptureDeviceInput(device: videoDevice)
let audioInput = try AVCaptureDeviceInput(device: audioDevice)
let videoFormatHint = createVideoFormatDescription(fromSettings: settings)
let audioFormatHint = createAudioFormatDescription(fromSettings: settings)
let writerVideoInput = AVAssetWriterInput(mediaType: AVMediaTypeVideo,
outputSettings: nil,
sourceFormatHint: videoFormatHint)
let writerAudioInput = AVAssetWriterInput(mediaType: AVMediaTypeAudio,
outputSettings: nil,
sourceFormatHint: audioFormatHint)
writerVideoInput.expectsMediaDataInRealTime = true
writerAudioInput.expectsMediaDataInRealTime = true
let url = NSURL(fileURLWithPath: NSTemporaryDirectory(), isDirectory: true)
.URLByAppendingPathComponent(NSProcessInfo.processInfo().globallyUniqueString)
.URLByAppendingPathExtension("mp4")
let assetWriter = try AVAssetWriter(URL: url, fileType: AVFileTypeMPEG4)
if !assetWriter.canAddInput(writerVideoInput) {
throw RecordError.Unknown("Could not add video input") }
if !assetWriter.canAddInput(writerAudioInput) {
throw RecordError.Unknown("Could not add audio input") }
assetWriter.addInput(writerVideoInput)
assetWriter.addInput(writerAudioInput)
这就是音频样本的编码方式,问题区域最有可能 在这附近。我已经重写了这个,所以它不使用任何Rx-isms。
var outputABSD = settings.audioOutputABSD
var outputFormatDescription: CMAudioFormatDescription! = nil
CMAudioFormatDescriptionCreate(nil, &outputABSD, 0, nil, 0, nil, nil, &formatDescription)
var converter: AudioConverter?
// Indicates whether priming information has been attached to the first buffer
var primed = false
func encodeAudioBuffer(settings: AVSettings, buffer: CMSampleBuffer) throws -> CMSampleBuffer? {
// Create the audio converter if it's not available
if converter == nil {
var classDescriptions = settings.audioEncoderClassDescriptions
var inputABSD = CMAudioFormatDescriptionGetStreamBasicDescription(CMSampleBufferGetFormatDescription(buffer)!).memory
var outputABSD = settings.audioOutputABSD
outputABSD.mSampleRate = inputABSD.mSampleRate
outputABSD.mChannelsPerFrame = inputABSD.mChannelsPerFrame
var converter: AudioConverterRef = nil
var result = noErr
result = withUnsafePointer(&outputABSD) { outputABSDPtr in
return withUnsafePointer(&inputABSD) { inputABSDPtr in
return AudioConverterNewSpecific(inputABSDPtr,
outputABSDPtr,
UInt32(classDescriptions.count),
&classDescriptions,
&converter)
}
}
if result != noErr { throw RecordError.Unknown }
// At this point I made an attempt to retrieve priming info from
// the audio converter assuming that it will give me back default values
// I can use, but ended up with `nil`
var primeInfo: AudioConverterPrimeInfo? = nil
var primeInfoSize = UInt32(sizeof(AudioConverterPrimeInfo))
// The following returns a `noErr` but `primeInfo` is still `nil``
AudioConverterGetProperty(converter,
kAudioConverterPrimeInfo,
&primeInfoSize,
&primeInfo)
// I've also tried to set `kAudioConverterPrimeInfo` so that it knows
// the leading frames that are being primed, but the set didn't seem to work
// (`noErr` but getting the property afterwards still returned `nil`)
}
let converter = converter!
// Need to give a big enough output buffer.
// The assumption is that it will always be <= to the input size
let numSamples = CMSampleBufferGetNumSamples(buffer)
// This becomes 1024 * 2 = 2048
let outputBufferSize = numSamples * Int(inputABSD.mBytesPerPacket)
let outputBufferPtr = UnsafeMutablePointer<Void>.alloc(outputBufferSize)
defer {
outputBufferPtr.destroy()
outputBufferPtr.dealloc(1)
}
var result = noErr
var outputPacketCount = UInt32(1)
var outputData = AudioBufferList(
mNumberBuffers: 1,
mBuffers: AudioBuffer(
mNumberChannels: outputABSD.mChannelsPerFrame,
mDataByteSize: UInt32(outputBufferSize),
mData: outputBufferPtr))
// See below for `EncodeAudioUserData`
var userData = EncodeAudioUserData(inputSampleBuffer: buffer,
inputBytesPerPacket: inputABSD.mBytesPerPacket)
withUnsafeMutablePointer(&userData) { userDataPtr in
// See below for `fetchAudioProc`
result = AudioConverterFillComplexBuffer(
converter,
fetchAudioProc,
userDataPtr,
&outputPacketCount,
&outputData,
nil)
}
if result != noErr {
Log.error?.message("Error while trying to encode audio buffer, code: \(result)")
return nil
}
// See below for `CMSampleBufferCreateCopy`
guard let newBuffer = CMSampleBufferCreateCopy(buffer,
fromAudioBufferList: &outputData,
newFromatDescription: outputFormatDescription) else {
Log.error?.message("Could not create sample buffer from audio buffer list")
return nil
}
if !primed {
primed = true
// Simply picked 2112 samples based on convention, is there a better way to determine this?
let samplesToPrime: Int64 = 2112
let samplesPerSecond = Int32(settings.audioOutputABSD.mSampleRate)
let primingDuration = CMTimeMake(samplesToPrime, samplesPerSecond)
// Without setting the attachment the asset writer will complain about the
// first buffer missing the `TrimDurationAtStart` attachment, is there are way
// to infer the value from the given `AudioBufferList`?
CMSetAttachment(newBuffer,
kCMSampleBufferAttachmentKey_TrimDurationAtStart,
CMTimeCopyAsDictionary(primingDuration, nil),
kCMAttachmentMode_ShouldNotPropagate)
}
return newBuffer
}
下面是为音频转换器和数据提取样本的proc 传递给它的结构:
private class EncodeAudioUserData {
var inputSampleBuffer: CMSampleBuffer?
var inputBytesPerPacket: UInt32
init(inputSampleBuffer: CMSampleBuffer,
inputBytesPerPacket: UInt32) {
self.inputSampleBuffer = inputSampleBuffer
self.inputBytesPerPacket = inputBytesPerPacket
}
}
private let fetchAudioProc: AudioConverterComplexInputDataProc = {
(inAudioConverter,
ioDataPacketCount,
ioData,
outDataPacketDescriptionPtrPtr,
inUserData) in
var result = noErr
if ioDataPacketCount.memory == 0 { return noErr }
let userData = UnsafeMutablePointer<EncodeAudioUserData>(inUserData).memory
// If its already been processed
guard let buffer = userData.inputSampleBuffer else {
ioDataPacketCount.memory = 0
return -1
}
var inputBlockBuffer: CMBlockBuffer?
var inputBufferList = AudioBufferList()
result = CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
buffer,
nil,
&inputBufferList,
sizeof(AudioBufferList),
nil,
nil,
0,
&inputBlockBuffer)
if result != noErr {
Log.error?.message("Error while trying to retrieve buffer list, code: \(result)")
ioDataPacketCount.memory = 0
return result
}
let packetsCount = inputBufferList.mBuffers.mDataByteSize / userData.inputBytesPerPacket
ioDataPacketCount.memory = packetsCount
ioData.memory.mBuffers.mNumberChannels = inputBufferList.mBuffers.mNumberChannels
ioData.memory.mBuffers.mDataByteSize = inputBufferList.mBuffers.mDataByteSize
ioData.memory.mBuffers.mData = inputBufferList.mBuffers.mData
if outDataPacketDescriptionPtrPtr != nil {
outDataPacketDescriptionPtrPtr.memory = nil
}
return noErr
}
这就是我将AudioBufferList
转换为CMSampleBuffer
的方式:
public func CMSampleBufferCreateCopy(
buffer: CMSampleBuffer,
inout fromAudioBufferList bufferList: AudioBufferList,
newFromatDescription formatDescription: CMFormatDescription? = nil)
-> CMSampleBuffer? {
var result = noErr
var sizeArray: [Int] = [Int(bufferList.mBuffers.mDataByteSize)]
// Copy timing info from the previous buffer
var timingInfo = CMSampleTimingInfo()
result = CMSampleBufferGetSampleTimingInfo(buffer, 0, &timingInfo)
if result != noErr { return nil }
var newBuffer: CMSampleBuffer?
result = CMSampleBufferCreateReady(
kCFAllocatorDefault,
nil,
formatDescription ?? CMSampleBufferGetFormatDescription(buffer),
Int(bufferList.mNumberBuffers),
1, &timingInfo,
1, &sizeArray,
&newBuffer)
if result != noErr { return nil }
guard let b = newBuffer else { return nil }
CMSampleBufferSetDataBufferFromAudioBufferList(b, nil, nil, 0, &bufferList)
return newBuffer
}
有什么我显然做错了吗?有没有正确的方法
从CMSampleBuffer
构建AudioBufferList
?你如何转移启动
从转换器到您创建的CMSampleBuffer
的信息?
对于我的用例,我需要手动进行编码,因为缓冲区将是 在管道中进一步操纵(虽然我已经禁用了所有 编码后的转换,以确保它有效。)
任何帮助都会 赞赏。对不起,有这么多代码 摘要,但我想提供尽可能多的上下文。
提前致谢:)
一些相关问题:
我使用的一些参考文献:
答案 0 :(得分:6)
事实证明,我做错了很多事情。我没有张贴大量的代码,而是尝试将其组织成我发现的一小部分东西。
这对我来说是一个巨大的混乱来源:
CMSampleBuffer
可以有一个或多个样本缓冲区(通过CMSampleBufferGetNumSamples
发现)CMSampleBuffer
代表一个音频数据包。CMSampleBufferGetNumSamples(sample)
将返回给定缓冲区中包含的数据包数。mFramesPerPacket
的{{1}}属性控制。对于线性PCM缓冲区,每个样本缓冲区的总大小为AudioStreamBasicDescription
。对于压缩缓冲区(如AAC),总大小和帧数之间存在 no 关系。frames * bytes per frame
此回调用于检索更多线性PCM音频数据以进行编码。您必须必须至少提供AudioConverterComplexInputDataProc
指定的数据包数。由于我一直在使用转换器进行实时推送式编码,因此我需要确保每次数据推送都包含最少量的数据包。像这样的东西(伪代码):
ioNumberDataPackets
let minimumPackets = outputFramesPerPacket / inputFramesPerPacket
var buffers: [CMSampleBuffer] = []
while getTotalSize(buffers) < minimumPackets {
buffers = buffers + [getNextBuffer()]
}
AudioConverterFillComplexBuffer(...)
&#39; 如果它们包含多个缓冲区,您实际上可以对CMSampleBuffer
进行切片。执行此操作的工具是CMSampleBuffer
。这很好,因此您可以为CMSampleBufferCopySampleBufferForRange
提供它要求的精确数据包,这样可以更轻松地处理生成的编码缓冲区的计时信息。因为如果您在预期AudioConverterComplexInputDataProc
时提供转换器1500
数据帧,则结果样本缓冲区的持续时间为1024
而不是1024/sampleRate
。
进行AAC编码时,必须设置修剪持续时间,如下所示:
1500/sampleRate
我做错的一件事是我在编码时间添加了修剪持续时间。这应该由您的编写器处理,以便它可以保证信息被添加到您的主要音频帧。
此外,CMSetAttachment(buffer,
kCMSampleBufferAttachmentKey_TrimDurationAtStart,
CMTimeCopyAsDictionary(primingDuration, kCFAllocatorDefault),
kCMAttachmentMode_ShouldNotPropagate)
的值永远不会大于样本缓冲区的持续时间。启动的一个例子:
kCMSampleBufferAttachmentKey_TrimDurationAtStart
2112
44100
2112 / 44100 = ~0.0479s
,启动持续时间:1024
1024 / 44100
,启动持续时间:1024
1088 / 41100
CMSampleBuffer
有一个可选的AudioConverterFillComplexBuffer
。 你应该使用它。它将指向包含样本大小信息的新数据包描述数组。您需要此样本大小信息来构造新的压缩样本缓冲区:
outputPacketDescriptionsPtr