I was having a lot of trouble figuring out how to use Apple's hardware accelerated video framework to decompress an H.264 video stream. After a few weeks I figured it out and wanted to share an extensive example since I couldn't find one.
My goal is to give a thorough, instructive example of Video Toolbox introduced in WWDC '14 session 513. My code will not compile or run as-is, since it needs to be integrated with an elementary H.264 stream (like a video read from a file, streamed from online, etc.) and needs to be tweaked depending on the specific case.
I should mention that I have very little experience with video en/decoding except what I learned while googling the subject. I don't know all the details about video formats, parameter structure etc., so I've only included what I think you need to know.
I am using Xcode 6.2 and have deployed to iOS devices running iOS 8.1 and 8.2.
Answer 0 (Score: 159)
NALUs: A NALU is simply a chunk of data of varying length that has a NALU start code header 0x00 00 00 01 YY, where the first 5 bits of YY tell you what type of NALU this is and therefore what type of data follows the header. (Since you only need the first 5 bits, I use YY & 0x1F to get just the relevant bits.) For example, if the byte after the start code is 0x65, then 0x65 & 0x1F = 5, an IDR frame. I list all of these types in the array NSString * const naluTypesStrings[] below, but you don't need to know what they all are.
Parameters: Your decoder needs parameters so it knows how the H.264 video data is stored. The 2 you need to set are the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS), and they each have their own NALU type number. You don't need to know what the parameters mean; the decoder knows what to do with them.
H.264 Stream Format: In most H.264 streams, you will receive an initial set of PPS and SPS parameters followed by an i frame (aka IDR frame or flush frame) NALU. Then you will receive several P frame NALUs (maybe a few dozen or so), then another set of parameters (which may be the same as the initial parameters) and an i frame, more P frames, etc. i frames are much bigger than P frames. Conceptually you can think of the i frame as an entire image of the video, and the P frames as just the changes made to that i frame, until you receive the next i frame. (A hedged illustration of this layout follows below.)
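As an illustration only (not from a real capture; the byte values after each start code vary per stream), a typical Annex B elementary stream might be laid out like this:
// [00 00 00 01] 67 ...   SPS     (0x67 & 0x1F = 7)
// [00 00 00 01] 68 ...   PPS     (0x68 & 0x1F = 8)
// [00 00 00 01] 65 ...   IDR     (0x65 & 0x1F = 5)  <- a full picture
// [00 00 00 01] 41 ...   non-IDR (0x41 & 0x1F = 1)  <- delta on top of earlier frames
// [00 00 00 01] 41 ...   non-IDR (0x41 & 0x1F = 1)
// ... parameters resent, next IDR, more P frames, and so on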
1. Generate individual NALUs from your H.264 stream. I cannot show code for this step since it depends a lot on what video source you're using. I made a graphic to show what I was working with ("data" in the graphic is "frame" in my following code), but your case may and probably will differ. My method receivedRawVideoFrame: is called every time I receive a frame (uint8_t *frame) which is one of 2 types. In the diagram, those 2 frame types are the 2 big purple boxes. (A minimal sketch of splitting a raw buffer on start codes follows below.)
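In case your source hands you one big Annex B buffer instead of pre-chunked frames, here is a minimal sketch of splitting it on 4-byte start codes. This is an assumption-laden illustration and not part of my original code; enumerateNALUs is a made-up helper, and you'd adapt the scan if your stream uses 3-byte start codes.
// Sketch: walk a raw Annex B buffer and report each NALU's offset, length and type.
// Assumes every NALU begins with a 4-byte start code (0x00 00 00 01).
static void enumerateNALUs(const uint8_t *buf, size_t size,
                           void (^handler)(size_t offset, size_t length, int naluType))
{
    size_t i = 0;
    size_t naluStart = SIZE_MAX; // SIZE_MAX means "no start code seen yet"
    while (i + 4 <= size)
    {
        if (buf[i] == 0x00 && buf[i+1] == 0x00 && buf[i+2] == 0x00 && buf[i+3] == 0x01)
        {
            if (naluStart != SIZE_MAX)
                handler(naluStart, i - naluStart, buf[naluStart + 4] & 0x1F);
            naluStart = i;
            i += 4;
        }
        else
        {
            i++;
        }
    }
    if (naluStart != SIZE_MAX) // the last NALU runs to the end of the buffer
        handler(naluStart, size - naluStart, buf[naluStart + 4] & 0x1F);
}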
2. Create a CMVideoFormatDescriptionRef from your SPS and PPS NALUs with CMVideoFormatDescriptionCreateFromH264ParameterSets(). You cannot display any frames without doing this first. The SPS and PPS may look like a jumble of numbers, but VTD knows what to do with them. All you need to know is that a CMVideoFormatDescriptionRef is a description of video data, like width/height, format type (kCMPixelFormat_32BGRA, kCMVideoCodecType_H264 etc.), aspect ratio, color space etc. Your decoder will hold onto the parameters until a new set arrives (sometimes parameters are resent regularly even when they haven't changed).
3. Re-package your IDR and non-IDR frame NALUs according to the "AVCC" format. This means removing the NALU start codes and replacing them with a 4-byte header that states the length of the NALU. You don't need to do this for the SPS and PPS NALUs. (Note that the 4-byte NALU length header is in big-endian, so if you have a UInt32 value it must be byte-swapped with CFSwapInt32 before copying it into the CMBlockBuffer. I do this in my code with the htonl function call.)
4. Package the IDR and non-IDR NALU frames into a CMBlockBuffer. Do not do this with the SPS/PPS parameter NALUs. All you need to know about CMBlockBuffers is that they are a way to wrap arbitrary blocks of data in Core Media. (Any compressed video data in a video pipeline is wrapped in one of these.)
5. Package the CMBlockBuffer into a CMSampleBuffer. All you need to know about CMSampleBuffers is that they wrap up our CMBlockBuffers with other information (here that would be the CMVideoFormatDescription and the CMTime, if CMTime is used).
6. Create a VTDecompressionSessionRef and feed the sample buffer into VTDecompressionSessionDecodeFrame(). Alternatively, you can use AVSampleBufferDisplayLayer and its enqueueSampleBuffer: method, and then you won't need to use a VTDecompSession at all. It's simpler to set up, but it will not throw errors if something goes wrong the way VTD will.
7. In the VTDecompSession callback, use the resulting CVImageBufferRef to display the video frame. If you need to convert your CVImageBuffer to a UIImage, see my StackOverflow answer here. (A hedged sketch of one such conversion follows below.)
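For reference, here is a minimal sketch of that conversion using Core Image. It is one of several possible approaches and not necessarily the one from the linked answer; imageFromImageBuffer is a made-up helper name.
// Sketch: convert a decoded CVImageBufferRef into a UIImage via Core Image.
static UIImage * imageFromImageBuffer(CVImageBufferRef imageBuffer)
{
    CVPixelBufferRef pixelBuffer = (CVPixelBufferRef)imageBuffer;
    CIImage *ciImage = [CIImage imageWithCVPixelBuffer:pixelBuffer];
    CIContext *context = [CIContext contextWithOptions:nil];
    CGRect rect = CGRectMake(0, 0,
                             CVPixelBufferGetWidth(pixelBuffer),
                             CVPixelBufferGetHeight(pixelBuffer));
    CGImageRef cgImage = [context createCGImage:ciImage fromRect:rect];
    if (cgImage == NULL) return nil;
    UIImage *image = [UIImage imageWithCGImage:cgImage];
    CGImageRelease(cgImage); // createCGImage: follows the Create rule, so release it
    return image;
}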
H.264 streams can vary a lot. From what I've learned, NALU start code headers are sometimes 3 bytes (0x00 00 01) and sometimes 4 (0x00 00 00 01). My code works with 4 bytes; you will need to change a few things around if you're working with 3. (A hedged sketch of normalizing 3-byte start codes follows below.)
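One hedged way to handle that, assuming each whole NALU is in memory, is to normalize 3-byte start codes to 4 bytes up front so the rest of the code can stay unchanged. This is an illustration, not part of my original code:
// Sketch: copy a NALU that begins with a 3-byte start code (0x00 00 01)
// into a new buffer that begins with a 4-byte start code (0x00 00 00 01).
// The caller owns (and must free) the returned buffer.
static uint8_t * copyWithFourByteStartCode(const uint8_t *nalu, size_t size, size_t *outSize)
{
    uint8_t *out = malloc(size + 1);
    out[0] = 0x00;                // prepending one zero turns 00 00 01 into 00 00 00 01
    memcpy(out + 1, nalu, size);
    *outSize = size + 1;
    return out;
}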
If you want to know more about NALUs, I found this answer to be very helpful. In my case, I found that I didn't need to ignore the "emulation prevention" bytes as described, so I personally skipped that step, but you may need to know about it. (A hedged sketch of stripping those bytes follows below.)
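If you do need to strip them: the encoder inserts a 0x03 byte after any two consecutive zero bytes inside a NALU payload, and removing it looks roughly like this. Again, I skipped this step myself, so treat it as an unverified sketch:
// Sketch: remove H.264 emulation prevention bytes in place.
// Wherever the byte sequence 0x00 0x00 0x03 appears in the payload,
// the 0x03 was inserted by the encoder and should be dropped.
// Returns the new (possibly smaller) size.
static size_t stripEmulationPrevention(uint8_t *data, size_t size)
{
    size_t out = 0;
    int zeroCount = 0;
    for (size_t i = 0; i < size; i++)
    {
        if (zeroCount == 2 && data[i] == 0x03)
        {
            zeroCount = 0; // drop the emulation prevention byte
            continue;
        }
        zeroCount = (data[i] == 0x00) ? zeroCount + 1 : 0;
        data[out++] = data[i];
    }
    return out;
}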
If your VTDecompressionSession outputs an error number (like -12909), look up the error code in your Xcode project: find the VideoToolbox framework in the project navigator, open it, and find the header VTErrors.h. If you can't find it, I've also included all the error codes below in another answer.
So let's start by declaring some global variables and including the VT framework (VT = Video Toolbox).
#import <VideoToolbox/VideoToolbox.h>
@property (nonatomic, assign) CMVideoFormatDescriptionRef formatDesc;
@property (nonatomic, assign) VTDecompressionSessionRef decompressionSession;
@property (nonatomic, retain) AVSampleBufferDisplayLayer *videoLayer;
@property (nonatomic, assign) int spsSize;
@property (nonatomic, assign) int ppsSize;
The following array is only used so that you can print out what type of NALU frame you're receiving. If you know what all these types mean, good for you; you know more about H.264 than I do :) My code only handles types 1, 5, 7 and 8.
NSString * const naluTypesStrings[] =
{
@"0: Unspecified (non-VCL)",
@"1: Coded slice of a non-IDR picture (VCL)", // P frame
@"2: Coded slice data partition A (VCL)",
@"3: Coded slice data partition B (VCL)",
@"4: Coded slice data partition C (VCL)",
@"5: Coded slice of an IDR picture (VCL)", // I frame
@"6: Supplemental enhancement information (SEI) (non-VCL)",
@"7: Sequence parameter set (non-VCL)", // SPS parameter
@"8: Picture parameter set (non-VCL)", // PPS parameter
@"9: Access unit delimiter (non-VCL)",
@"10: End of sequence (non-VCL)",
@"11: End of stream (non-VCL)",
@"12: Filler data (non-VCL)",
@"13: Sequence parameter set extension (non-VCL)",
@"14: Prefix NAL unit (non-VCL)",
@"15: Subset sequence parameter set (non-VCL)",
@"16: Reserved (non-VCL)",
@"17: Reserved (non-VCL)",
@"18: Reserved (non-VCL)",
@"19: Coded slice of an auxiliary coded picture without partitioning (non-VCL)",
@"20: Coded slice extension (non-VCL)",
@"21: Coded slice extension for depth view components (non-VCL)",
@"22: Reserved (non-VCL)",
@"23: Reserved (non-VCL)",
@"24: STAP-A Single-time aggregation packet (non-VCL)",
@"25: STAP-B Single-time aggregation packet (non-VCL)",
@"26: MTAP16 Multi-time aggregation packet (non-VCL)",
@"27: MTAP24 Multi-time aggregation packet (non-VCL)",
@"28: FU-A Fragmentation unit (non-VCL)",
@"29: FU-B Fragmentation unit (non-VCL)",
@"30: Unspecified (non-VCL)",
@"31: Unspecified (non-VCL)",
};
Now this is where all the magic happens.
-(void) receivedRawVideoFrame:(uint8_t *)frame withSize:(uint32_t)frameSize isIFrame:(int)isIFrame
{
OSStatus status = noErr; // initialize so the checks below are defined even for frames that skip the parameter blocks
uint8_t *data = NULL;
uint8_t *pps = NULL;
uint8_t *sps = NULL;
// I know what my H.264 data source's NALUs look like so I know start code index is always 0.
// if you don't know where it starts, you can use a for loop similar to how I find the 2nd and 3rd start codes
int startCodeIndex = 0;
int secondStartCodeIndex = 0;
int thirdStartCodeIndex = 0;
long blockLength = 0;
CMSampleBufferRef sampleBuffer = NULL;
CMBlockBufferRef blockBuffer = NULL;
int nalu_type = (frame[startCodeIndex + 4] & 0x1F);
NSLog(@"~~~~~~~ Received NALU Type \"%@\" ~~~~~~~~", naluTypesStrings[nalu_type]);
// if we haven't already set up our format description with our SPS PPS parameters, we
// can't process any frames except type 7 that has our parameters
if (nalu_type != 7 && _formatDesc == NULL)
{
NSLog(@"Video error: Frame is not an I Frame and format description is null");
return;
}
// NALU type 7 is the SPS parameter NALU
if (nalu_type == 7)
{
// find where the second PPS start code begins, (the 0x00 00 00 01 code)
// from which we also get the length of the first SPS code
for (int i = startCodeIndex + 4; i < startCodeIndex + 40; i++)
{
if (frame[i] == 0x00 && frame[i+1] == 0x00 && frame[i+2] == 0x00 && frame[i+3] == 0x01)
{
secondStartCodeIndex = i;
_spsSize = secondStartCodeIndex; // includes the header in the size
break;
}
}
// find what the second NALU type is
nalu_type = (frame[secondStartCodeIndex + 4] & 0x1F);
NSLog(@"~~~~~~~ Received NALU Type \"%@\" ~~~~~~~~", naluTypesStrings[nalu_type]);
}
// type 8 is the PPS parameter NALU
if(nalu_type == 8)
{
// find where the NALU after this one starts so we know how long the PPS parameter is
for (int i = _spsSize + 4; i < _spsSize + 30; i++)
{
if (frame[i] == 0x00 && frame[i+1] == 0x00 && frame[i+2] == 0x00 && frame[i+3] == 0x01)
{
thirdStartCodeIndex = i;
_ppsSize = thirdStartCodeIndex - _spsSize;
break;
}
}
// allocate enough data to fit the SPS and PPS parameters into our data objects.
// VTD doesn't want you to include the start code header (4 bytes long) so we subtract 4 here
sps = malloc(_spsSize - 4);
pps = malloc(_ppsSize - 4);
// copy in the actual sps and pps values, again ignoring the 4 byte header
memcpy (sps, &frame[4], _spsSize-4);
memcpy (pps, &frame[_spsSize+4], _ppsSize-4);
// now we set our H264 parameters
uint8_t* parameterSetPointers[2] = {sps, pps};
size_t parameterSetSizes[2] = {_spsSize-4, _ppsSize-4};
// suggestion from @Kris Dude's answer below
if (_formatDesc)
{
CFRelease(_formatDesc);
_formatDesc = NULL;
}
status = CMVideoFormatDescriptionCreateFromH264ParameterSets(kCFAllocatorDefault, 2,
(const uint8_t *const*)parameterSetPointers,
parameterSetSizes, 4,
&_formatDesc);
NSLog(@"\t\t Creation of CMVideoFormatDescription: %@", (status == noErr) ? @"successful!" : @"failed...");
if(status != noErr) NSLog(@"\t\t Format Description ERROR type: %d", (int)status);
// See if decomp session can convert from previous format description
// to the new one, if not we need to remake the decomp session.
// This snippet was not necessary for my applications but it could be for yours
/*BOOL needNewDecompSession = (VTDecompressionSessionCanAcceptFormatDescription(_decompressionSession, _formatDesc) == NO);
if(needNewDecompSession)
{
[self createDecompSession];
}*/
// now lets handle the IDR frame that (should) come after the parameter sets
// I say "should" because that's how I expect my H264 stream to work, YMMV
nalu_type = (frame[thirdStartCodeIndex + 4] & 0x1F);
NSLog(@"~~~~~~~ Received NALU Type \"%@\" ~~~~~~~~", naluTypesStrings[nalu_type]);
}
// create our VTDecompressionSession. This isn't necessary if you choose to use AVSampleBufferDisplayLayer
if((status == noErr) && (_decompressionSession == NULL))
{
[self createDecompSession];
}
// type 5 is an IDR frame NALU. The SPS and PPS NALUs should always be followed by an IDR (or IFrame) NALU, as far as I know
if(nalu_type == 5)
{
// find the offset, or where the SPS and PPS NALUs end and the IDR frame NALU begins
int offset = _spsSize + _ppsSize;
blockLength = frameSize - offset;
data = malloc(blockLength);
data = memcpy(data, &frame[offset], blockLength);
// replace the start code header on this NALU with its size.
// AVCC format requires that you do this.
// htonl converts the unsigned int from host to network byte order
uint32_t dataLength32 = htonl (blockLength - 4);
memcpy (data, &dataLength32, sizeof (uint32_t));
// create a block buffer from the IDR NALU
status = CMBlockBufferCreateWithMemoryBlock(NULL, data, // memoryBlock to hold buffered data
blockLength, // block length of the mem block in bytes.
kCFAllocatorNull, NULL,
0, // offsetToData
blockLength, // dataLength of relevant bytes, starting at offsetToData
0, &blockBuffer);
NSLog(@"\t\t BlockBufferCreation: \t %@", (status == kCMBlockBufferNoErr) ? @"successful!" : @"failed...");
}
// NALU type 1 is non-IDR (or PFrame) picture
if (nalu_type == 1)
{
// non-IDR frames do not have an offset due to SPS and PPS, so the approach
// is similar to the IDR frames just without the offset
blockLength = frameSize;
data = malloc(blockLength);
data = memcpy(data, &frame[0], blockLength);
// again, replace the start header with the size of the NALU
uint32_t dataLength32 = htonl (blockLength - 4);
memcpy (data, &dataLength32, sizeof (uint32_t));
status = CMBlockBufferCreateWithMemoryBlock(NULL, data, // memoryBlock to hold data. If NULL, block will be alloc when needed
blockLength, // overall length of the mem block in bytes
kCFAllocatorNull, NULL,
0, // offsetToData
blockLength, // dataLength of relevant data bytes, starting at offsetToData
0, &blockBuffer);
NSLog(@"\t\t BlockBufferCreation: \t %@", (status == kCMBlockBufferNoErr) ? @"successful!" : @"failed...");
}
// now create our sample buffer from the block buffer,
if(status == noErr)
{
// here I'm not bothering with any timing specifics since in my case we displayed all frames immediately
const size_t sampleSize = blockLength;
status = CMSampleBufferCreate(kCFAllocatorDefault,
blockBuffer, true, NULL, NULL,
_formatDesc, 1, 0, NULL, 1,
&sampleSize, &sampleBuffer);
NSLog(@"\t\t SampleBufferCreate: \t %@", (status == noErr) ? @"successful!" : @"failed...");
}
if(status == noErr)
{
// set some values of the sample buffer's attachments
CFArrayRef attachments = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, YES);
CFMutableDictionaryRef dict = (CFMutableDictionaryRef)CFArrayGetValueAtIndex(attachments, 0);
CFDictionarySetValue(dict, kCMSampleAttachmentKey_DisplayImmediately, kCFBooleanTrue);
// either send the samplebuffer to a VTDecompressionSession or to an AVSampleBufferDisplayLayer
[self render:sampleBuffer];
}
// free memory to avoid a memory leak, do the same for sps, pps and blockbuffer
if (NULL != data)
{
free (data);
data = NULL;
}
}
The following method creates your VTD session. Recreate it whenever you receive new parameters. (You don't have to recreate it every single time you receive parameters, pretty sure.)
If you want to set attributes for the destination CVPixelBuffer, read up on CoreVideo PixelBufferAttributes values and put them in NSDictionary *destinationImageBufferAttributes.
-(void) createDecompSession
{
// make sure to destroy the old VTD session
_decompressionSession = NULL;
VTDecompressionOutputCallbackRecord callBackRecord;
callBackRecord.decompressionOutputCallback = decompressionSessionDecodeFrameCallback;
// this is necessary if you need to make calls to Objective-C "self" from within the callback method.
callBackRecord.decompressionOutputRefCon = (__bridge void *)self;
// you can set some desired attributes for the destination pixel buffer. I didn't use this but you may
// if you need to set some attributes, be sure to uncomment the dictionary in VTDecompressionSessionCreate
NSDictionary *destinationImageBufferAttributes = [NSDictionary dictionaryWithObjectsAndKeys:
[NSNumber numberWithBool:YES],
(id)kCVPixelBufferOpenGLESCompatibilityKey,
nil];
OSStatus status = VTDecompressionSessionCreate(NULL, _formatDesc, NULL,
NULL, // (__bridge CFDictionaryRef)(destinationImageBufferAttributes)
&callBackRecord, &_decompressionSession);
NSLog(@"Video Decompression Session Create: \t %@", (status == noErr) ? @"successful!" : @"failed...");
if(status != noErr) NSLog(@"\t\t VTD ERROR type: %d", (int)status);
}
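A hedged aside that wasn't in the original code: assigning NULL to _decompressionSession alone doesn't release the previous session. If you recreate sessions repeatedly, the usual teardown is something along these lines (unverified in this example):
// Sketch: tear down an existing session before creating a new one.
if (_decompressionSession != NULL)
{
    VTDecompressionSessionInvalidate(_decompressionSession);
    CFRelease(_decompressionSession);
    _decompressionSession = NULL;
}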
Now this method gets called every time the VTD has decompressed any frame you sent to it. It gets called even if there's an error or if the frame is dropped.
void decompressionSessionDecodeFrameCallback(void *decompressionOutputRefCon,
void *sourceFrameRefCon,
OSStatus status,
VTDecodeInfoFlags infoFlags,
CVImageBufferRef imageBuffer,
CMTime presentationTimeStamp,
CMTime presentationDuration)
{
THISCLASSNAME *streamManager = (__bridge THISCLASSNAME *)decompressionOutputRefCon;
if (status != noErr)
{
NSError *error = [NSError errorWithDomain:NSOSStatusErrorDomain code:status userInfo:nil];
NSLog(@"Decompressed error: %@", error);
}
else
{
NSLog(@"Decompressed successfully");
// do something with your resulting CVImageBufferRef that is your decompressed frame
[streamManager displayDecodedFrame:imageBuffer];
}
}
This is where we actually send the sampleBuffer off to the VTD to be decoded.
- (void) render:(CMSampleBufferRef)sampleBuffer
{
VTDecodeFrameFlags flags = kVTDecodeFrame_EnableAsynchronousDecompression;
VTDecodeInfoFlags flagOut;
NSDate* currentTime = [NSDate date];
VTDecompressionSessionDecodeFrame(_decompressionSession, sampleBuffer, flags,
(void*)CFBridgingRetain(currentTime), &flagOut);
CFRelease(sampleBuffer);
// if you're using AVSampleBufferDisplayLayer, you only need to use this line of code
// [videoLayer enqueueSampleBuffer:sampleBuffer];
}
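One more hedged note that wasn't in the original: the CFBridgingRetain(currentTime) above hands ownership of the NSDate into the callback's sourceFrameRefCon, so to avoid leaking one NSDate per frame you could balance it inside the callback, for example:
// Inside decompressionSessionDecodeFrameCallback, balance the retain from render: like so
// NSDate *frameTime = CFBridgingRelease(sourceFrameRefCon);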
If you're using AVSampleBufferDisplayLayer, be sure to initialize the layer like this, in viewDidLoad or inside some other init method.
-(void) viewDidLoad
{
// create our AVSampleBufferDisplayLayer and add it to the view
videoLayer = [[AVSampleBufferDisplayLayer alloc] init];
videoLayer.frame = self.view.frame;
videoLayer.bounds = self.view.bounds;
videoLayer.videoGravity = AVLayerVideoGravityResizeAspect;
// set Timebase, you may need this if you need to display frames at specific times
// I didn't need it so I haven't verified that the timebase is working
CMTimebaseRef controlTimebase;
CMTimebaseCreateWithMasterClock(CFAllocatorGetDefault(), CMClockGetHostTimeClock(), &controlTimebase);
//videoLayer.controlTimebase = controlTimebase;
CMTimebaseSetTime(self.videoLayer.controlTimebase, kCMTimeZero);
CMTimebaseSetRate(self.videoLayer.controlTimebase, 1.0);
[[self.view layer] addSublayer:videoLayer];
}
Answer 1 (Score: 18)
In case you can't find the VTD error codes in the framework, I decided to include them here. (Again, all these errors and more can be found inside VTErrors.h in VideoToolbox.framework in the project navigator.)
You will get one of these error codes in the VTD decode frame callback, or when you create your VTD session, if you did something incorrectly.
kVTPropertyNotSupportedErr = -12900,
kVTPropertyReadOnlyErr = -12901,
kVTParameterErr = -12902,
kVTInvalidSessionErr = -12903,
kVTAllocationFailedErr = -12904,
kVTPixelTransferNotSupportedErr = -12905, // c.f. -8961
kVTCouldNotFindVideoDecoderErr = -12906,
kVTCouldNotCreateInstanceErr = -12907,
kVTCouldNotFindVideoEncoderErr = -12908,
kVTVideoDecoderBadDataErr = -12909, // c.f. -8969
kVTVideoDecoderUnsupportedDataFormatErr = -12910, // c.f. -8970
kVTVideoDecoderMalfunctionErr = -12911, // c.f. -8960
kVTVideoEncoderMalfunctionErr = -12912,
kVTVideoDecoderNotAvailableNowErr = -12913,
kVTImageRotationNotSupportedErr = -12914,
kVTVideoEncoderNotAvailableNowErr = -12915,
kVTFormatDescriptionChangeNotSupportedErr = -12916,
kVTInsufficientSourceColorDataErr = -12917,
kVTCouldNotCreateColorCorrectionDataErr = -12918,
kVTColorSyncTransformConvertFailedErr = -12919,
kVTVideoDecoderAuthorizationErr = -12210,
kVTVideoEncoderAuthorizationErr = -12211,
kVTColorCorrectionPixelTransferFailedErr = -12212,
kVTMultiPassStorageIdentifierMismatchErr = -12213,
kVTMultiPassStorageInvalidErr = -12214,
kVTFrameSiloInvalidTimeStampErr = -12215,
kVTFrameSiloInvalidTimeRangeErr = -12216,
kVTCouldNotFindTemporalFilterErr = -12217,
kVTPixelTransferNotPermittedErr = -12218,
Answer 2 (Score: 10)
A good Swift example of much of this can be found in Josh Baker's Avios library: https://github.com/tidwall/Avios
Note that Avios currently expects the user to handle chunking data at NAL start codes, but it does handle decoding the data from that point forward.
Also worth a look is the Swift-based RTMP library HaishinKit (formerly "LF"), which has its own decoding implementation, including more robust NALU parsing: https://github.com/shogo4405/lf.swift
Answer 3 (Score: 4)
In addition to the VTErrors above, I thought it worth adding the CMFormatDescription, CMBlockBuffer, and CMSampleBuffer errors that you may encounter while trying Livy's example.
kCMFormatDescriptionError_InvalidParameter = -12710,
kCMFormatDescriptionError_AllocationFailed = -12711,
kCMFormatDescriptionError_ValueNotAvailable = -12718,
kCMBlockBufferNoErr = 0,
kCMBlockBufferStructureAllocationFailedErr = -12700,
kCMBlockBufferBlockAllocationFailedErr = -12701,
kCMBlockBufferBadCustomBlockSourceErr = -12702,
kCMBlockBufferBadOffsetParameterErr = -12703,
kCMBlockBufferBadLengthParameterErr = -12704,
kCMBlockBufferBadPointerParameterErr = -12705,
kCMBlockBufferEmptyBBufErr = -12706,
kCMBlockBufferUnallocatedBlockErr = -12707,
kCMBlockBufferInsufficientSpaceErr = -12708,
kCMSampleBufferError_AllocationFailed = -12730,
kCMSampleBufferError_RequiredParameterMissing = -12731,
kCMSampleBufferError_AlreadyHasDataBuffer = -12732,
kCMSampleBufferError_BufferNotReady = -12733,
kCMSampleBufferError_SampleIndexOutOfRange = -12734,
kCMSampleBufferError_BufferHasNoSampleSizes = -12735,
kCMSampleBufferError_BufferHasNoSampleTimingInfo = -12736,
kCMSampleBufferError_ArrayTooSmall = -12737,
kCMSampleBufferError_InvalidEntryCount = -12738,
kCMSampleBufferError_CannotSubdivide = -12739,
kCMSampleBufferError_SampleTimingInfoInvalid = -12740,
kCMSampleBufferError_InvalidMediaTypeForOperation = -12741,
kCMSampleBufferError_InvalidSampleData = -12742,
kCMSampleBufferError_InvalidMediaFormat = -12743,
kCMSampleBufferError_Invalidated = -12744,
kCMSampleBufferError_DataFailed = -16750,
kCMSampleBufferError_DataCanceled = -16751,
Answer 4 (Score: 2)
Thanks Olivia for this great and detailed post! I recently started programming a streaming app on iPad Pro with Xamarin Forms, and this article helped a lot; I found many references to it throughout the web.
I suppose many people have already re-written Olivia's example in Xamarin, and I don't claim to be the best programmer in the world. But since nobody has posted a C#/Xamarin version here yet, and I would like to give something back to the community for the great post above, here is my C#/Xamarin version. Maybe it helps someone to speed up progress in her or his project.
I stayed close to Olivia's example; I even kept most of her comments.
First, since I prefer dealing with enums rather than numbers, I declared this NALU enum. For completeness' sake I also added some "exotic" NALU types I found on the internet:
public enum NALUnitType : byte
{
NALU_TYPE_UNKNOWN = 0,
NALU_TYPE_SLICE = 1,
NALU_TYPE_DPA = 2,
NALU_TYPE_DPB = 3,
NALU_TYPE_DPC = 4,
NALU_TYPE_IDR = 5,
NALU_TYPE_SEI = 6,
NALU_TYPE_SPS = 7,
NALU_TYPE_PPS = 8,
NALU_TYPE_AUD = 9,
NALU_TYPE_EOSEQ = 10,
NALU_TYPE_EOSTREAM = 11,
NALU_TYPE_FILL = 12,
NALU_TYPE_13 = 13,
NALU_TYPE_14 = 14,
NALU_TYPE_15 = 15,
NALU_TYPE_16 = 16,
NALU_TYPE_17 = 17,
NALU_TYPE_18 = 18,
NALU_TYPE_19 = 19,
NALU_TYPE_20 = 20,
NALU_TYPE_21 = 21,
NALU_TYPE_22 = 22,
NALU_TYPE_23 = 23,
NALU_TYPE_STAP_A = 24,
NALU_TYPE_STAP_B = 25,
NALU_TYPE_MTAP16 = 26,
NALU_TYPE_MTAP24 = 27,
NALU_TYPE_FU_A = 28,
NALU_TYPE_FU_B = 29,
}
More or less for convenience, I also defined an additional dictionary for the NALU descriptions:
public static Dictionary<NALUnitType, string> GetDescription { get; } =
new Dictionary<NALUnitType, string>()
{
{ NALUnitType.NALU_TYPE_UNKNOWN, "Unspecified (non-VCL)" },
{ NALUnitType.NALU_TYPE_SLICE, "Coded slice of a non-IDR picture (VCL) [P-frame]" },
{ NALUnitType.NALU_TYPE_DPA, "Coded slice data partition A (VCL)" },
{ NALUnitType.NALU_TYPE_DPB, "Coded slice data partition B (VCL)" },
{ NALUnitType.NALU_TYPE_DPC, "Coded slice data partition C (VCL)" },
{ NALUnitType.NALU_TYPE_IDR, "Coded slice of an IDR picture (VCL) [I-frame]" },
{ NALUnitType.NALU_TYPE_SEI, "Supplemental Enhancement Information [SEI] (non-VCL)" },
{ NALUnitType.NALU_TYPE_SPS, "Sequence Parameter Set [SPS] (non-VCL)" },
{ NALUnitType.NALU_TYPE_PPS, "Picture Parameter Set [PPS] (non-VCL)" },
{ NALUnitType.NALU_TYPE_AUD, "Access Unit Delimiter [AUD] (non-VCL)" },
{ NALUnitType.NALU_TYPE_EOSEQ, "End of Sequence (non-VCL)" },
{ NALUnitType.NALU_TYPE_EOSTREAM, "End of Stream (non-VCL)" },
{ NALUnitType.NALU_TYPE_FILL, "Filler data (non-VCL)" },
{ NALUnitType.NALU_TYPE_13, "Sequence Parameter Set Extension (non-VCL)" },
{ NALUnitType.NALU_TYPE_14, "Prefix NAL Unit (non-VCL)" },
{ NALUnitType.NALU_TYPE_15, "Subset Sequence Parameter Set (non-VCL)" },
{ NALUnitType.NALU_TYPE_16, "Reserved (non-VCL)" },
{ NALUnitType.NALU_TYPE_17, "Reserved (non-VCL)" },
{ NALUnitType.NALU_TYPE_18, "Reserved (non-VCL)" },
{ NALUnitType.NALU_TYPE_19, "Coded slice of an auxiliary coded picture without partitioning (non-VCL)" },
{ NALUnitType.NALU_TYPE_20, "Coded Slice Extension (non-VCL)" },
{ NALUnitType.NALU_TYPE_21, "Coded Slice Extension for Depth View Components (non-VCL)" },
{ NALUnitType.NALU_TYPE_22, "Reserved (non-VCL)" },
{ NALUnitType.NALU_TYPE_23, "Reserved (non-VCL)" },
{ NALUnitType.NALU_TYPE_STAP_A, "STAP-A Single-time Aggregation Packet (non-VCL)" },
{ NALUnitType.NALU_TYPE_STAP_B, "STAP-B Single-time Aggregation Packet (non-VCL)" },
{ NALUnitType.NALU_TYPE_MTAP16, "MTAP16 Multi-time Aggregation Packet (non-VCL)" },
{ NALUnitType.NALU_TYPE_MTAP24, "MTAP24 Multi-time Aggregation Packet (non-VCL)" },
{ NALUnitType.NALU_TYPE_FU_A, "FU-A Fragmentation Unit (non-VCL)" },
{ NALUnitType.NALU_TYPE_FU_B, "FU-B Fragmentation Unit (non-VCL)" }
};
Here comes my main decoding routine. I take the received frame as a raw byte array:
public void Decode(byte[] frame)
{
uint frameSize = (uint)frame.Length;
SendDebugMessage($"Received frame of {frameSize} bytes.");
// I know what my H.264 data source's NALUs look like so I know the start code index is always 0.
// if you don't know where it starts, you can use a for loop similar to how I find the 2nd and 3rd start codes
uint firstStartCodeIndex = 0;
uint secondStartCodeIndex = 0;
uint thirdStartCodeIndex = 0;
// length of NALU start code in bytes.
// for h.264 the start code is 4 bytes and looks like this: 0x00 00 00 01
const uint naluHeaderLength = 4;
// check the first 8 bits after the NALU start code; mask out bits 0-2, the NALU type ID is in bits 3-7
uint startNaluIndex = firstStartCodeIndex + naluHeaderLength;
byte startByte = frame[startNaluIndex];
int naluTypeId = startByte & 0x1F; // 0001 1111
NALUnitType naluType = (NALUnitType)naluTypeId;
SendDebugMessage($"1st Start Code Index: {firstStartCodeIndex}");
SendDebugMessage($"1st NALU Type: '{NALUnit.GetDescription[naluType]}' ({(int)naluType})");
// bits 1 and 2 are the NRI
int nalRefIdc = startByte & 0x60; // 0110 0000
SendDebugMessage($"1st NRI (NAL Ref Idc): {nalRefIdc}");
// IF the very first NALU type is an IDR -> handle it like a slice frame (-> re-cast it to type 1 [Slice])
if (naluType == NALUnitType.NALU_TYPE_IDR)
{
naluType = NALUnitType.NALU_TYPE_SLICE;
}
// if we haven't already set up our format description with our SPS PPS parameters,
// we can't process any frames except type 7 that has our parameters
if (naluType != NALUnitType.NALU_TYPE_SPS && this.FormatDescription == null)
{
SendDebugMessage("Video Error: Frame is not an I-Frame and format description is null.");
return;
}
// NALU type 7 is the SPS parameter NALU
if (naluType == NALUnitType.NALU_TYPE_SPS)
{
// find where the second PPS 4byte start code begins (0x00 00 00 01)
// from which we also get the length of the first SPS code
for (uint i = firstStartCodeIndex + naluHeaderLength; i < firstStartCodeIndex + 40; i++)
{
if (frame[i] == 0x00 && frame[i + 1] == 0x00 && frame[i + 2] == 0x00 && frame[i + 3] == 0x01)
{
secondStartCodeIndex = i;
this.SpsSize = secondStartCodeIndex; // includes the header in the size
SendDebugMessage($"2nd Start Code Index: {secondStartCodeIndex} -> SPS Size: {this.SpsSize}");
break;
}
}
// find what the second NALU type is
startByte = frame[secondStartCodeIndex + naluHeaderLength];
naluType = (NALUnitType)(startByte & 0x1F);
SendDebugMessage($"2nd NALU Type: '{NALUnit.GetDescription[naluType]}' ({(int)naluType})");
// bits 1 and 2 are the NRI
nalRefIdc = startByte & 0x60; // 0110 0000
SendDebugMessage($"2nd NRI (NAL Ref Idc): {nalRefIdc}");
}
// type 8 is the PPS parameter NALU
if (naluType == NALUnitType.NALU_TYPE_PPS)
{
// find where the NALU after this one starts so we know how long the PPS parameter is
for (uint i = this.SpsSize + naluHeaderLength; i < this.SpsSize + 30; i++)
{
if (frame[i] == 0x00 && frame[i + 1] == 0x00 && frame[i + 2] == 0x00 && frame[i + 3] == 0x01)
{
thirdStartCodeIndex = i;
this.PpsSize = thirdStartCodeIndex - this.SpsSize;
SendDebugMessage($"3rd Start Code Index: {thirdStartCodeIndex} -> PPS Size: {this.PpsSize}");
break;
}
}
// allocate enough data to fit the SPS and PPS parameters into our data objects.
// VTD doesn't want you to include the start code header (4 bytes long) so we subtract 4 here
byte[] sps = new byte[this.SpsSize - naluHeaderLength];
byte[] pps = new byte[this.PpsSize - naluHeaderLength];
// copy in the actual sps and pps values, again ignoring the 4 byte header
Array.Copy(frame, naluHeaderLength, sps, 0, sps.Length);
Array.Copy(frame, this.SpsSize + naluHeaderLength, pps,0, pps.Length);
// create video format description
List<byte[]> parameterSets = new List<byte[]> { sps, pps };
this.FormatDescription = CMVideoFormatDescription.FromH264ParameterSets(parameterSets, (int)naluHeaderLength, out CMFormatDescriptionError formatDescriptionError);
SendDebugMessage($"Creation of CMVideoFormatDescription: {((formatDescriptionError == CMFormatDescriptionError.None)? $"Successful! (Video Codec = {this.FormatDescription.VideoCodecType}, Dimension = {this.FormatDescription.Dimensions.Height} x {this.FormatDescription.Dimensions.Width}px, Type = {this.FormatDescription.MediaType})" : $"Failed ({formatDescriptionError})")}");
// re-create the decompression session whenever new PPS data was received
this.DecompressionSession = this.CreateDecompressionSession(this.FormatDescription);
// now lets handle the IDR frame that (should) come after the parameter sets
// I say "should" because that's how I expect my H264 stream to work, YMMV
startByte = frame[thirdStartCodeIndex + naluHeaderLength];
naluType = (NALUnitType)(startByte & 0x1F);
SendDebugMessage($"3rd NALU Type: '{NALUnit.GetDescription[naluType]}' ({(int)naluType})");
// bits 1 and 2 are the NRI
nalRefIdc = startByte & 0x60; // 0110 0000
SendDebugMessage($"3rd NRI (NAL Ref Idc): {nalRefIdc}");
}
// type 5 is an IDR frame NALU.
// The SPS and PPS NALUs should always be followed by an IDR (or IFrame) NALU, as far as I know.
if (naluType == NALUnitType.NALU_TYPE_IDR || naluType == NALUnitType.NALU_TYPE_SLICE)
{
// find the offset or where IDR frame NALU begins (after the SPS and PPS NALUs end)
uint offset = (naluType == NALUnitType.NALU_TYPE_SLICE)? 0 : this.SpsSize + this.PpsSize;
uint blockLength = frameSize - offset;
SendDebugMessage($"Block Length (NALU type '{naluType}'): {blockLength}");
var blockData = new byte[blockLength];
Array.Copy(frame, offset, blockData, 0, blockLength);
// write the size of the block length (IDR picture data) at the beginning of the IDR block.
// this means we replace the start code header (0x00 00 00 01) of the IDR NALU with the block size.
// AVCC format requires that you do this.
// This next block is very specific to my application and wasn't in Olivia's example:
// Because my stream is encoded by NVIDIA NVENC, I had to deal with additional 3-byte start codes within my IDR/SLICE frames.
// These start codes must be replaced by 4-byte length fields holding the block length in big-endian.
// ======================================================================================================================================================
// find all 3 byte start code indices (0x00 00 01) within the block data (including the first 4 bytes of NALU header)
uint startCodeLength = 3;
List<uint> foundStartCodeIndices = new List<uint>();
for (uint i = 0; i < blockData.Length - 2; i++) // stop 2 bytes early so blockData[i + 2] stays in bounds
{
if (blockData[i] == 0x00 && blockData[i + 1] == 0x00 && blockData[i + 2] == 0x01)
{
foundStartCodeIndices.Add(i);
byte naluByte = blockData[i + startCodeLength];
var tmpNaluType = (NALUnitType)(naluByte & 0x1F);
SendDebugMessage($"3-Byte Start Code (0x000001) found at index: {i} (NALU type {(int)tmpNaluType} '{NALUnit.GetDescription[tmpNaluType]}'");
}
}
// determine the byte length of each slice
uint totalLength = 0;
List<uint> sliceLengths = new List<uint>();
for (int i = 0; i < foundStartCodeIndices.Count; i++)
{
// for convenience only
bool isLastValue = (i == foundStartCodeIndices.Count-1);
// set start index to the byte right after the start code
uint startIndex = foundStartCodeIndices[i] + startCodeLength;
// set end index to the byte right before the beginning of the next start code, or to the end of the frame
uint endIndex = isLastValue ? (uint) blockData.Length : foundStartCodeIndices[i + 1];
// now determine slice length including NALU header
uint sliceLength = (endIndex - startIndex) + naluHeaderLength;
// add length to list
sliceLengths.Add(sliceLength);
// sum up total length of all slices (including NALU header)
totalLength += sliceLength;
}
// Arrange slices like this:
// [4byte slice1 size][slice1 data][4byte slice2 size][slice2 data]...[4byte slice4 size][slice4 data]
// Replace 3-Byte Start Code with 4-Byte start code, then replace the 4-Byte start codes with the length of the following data block (big endian).
// https://stackoverflow.com/questions/65576349/nvidia-nvenc-media-foundation-encoded-h-264-frames-not-decoded-properly-using
byte[] finalBuffer = new byte[totalLength];
uint destinationIndex = 0;
// create a buffer for each slice and append it to the final block buffer
for (int i = 0; i < sliceLengths.Count; i++)
{
// create byte vector of size of current slice, add additional bytes for NALU start code length
byte[] sliceData = new byte[sliceLengths[i]];
// now copy the data of current slice into the byte vector,
// start reading data after the 3-byte start code
// start writing data after NALU start code,
uint sourceIndex = foundStartCodeIndices[i] + startCodeLength;
long dataLength = sliceLengths[i] - naluHeaderLength;
Array.Copy(blockData, sourceIndex, sliceData, naluHeaderLength, dataLength);
// replace the NALU start code with data length as big endian
byte[] sliceLengthInBytes = BitConverter.GetBytes(sliceLengths[i] - naluHeaderLength);
Array.Reverse(sliceLengthInBytes);
Array.Copy(sliceLengthInBytes, 0, sliceData, 0, naluHeaderLength);
// add the slice data to final buffer
Array.Copy(sliceData, 0, finalBuffer, destinationIndex, sliceData.Length);
destinationIndex += sliceLengths[i];
}
// ======================================================================================================================================================
// from here we are back on track with Olivia's code:
// now create block buffer from final byte[] buffer
CMBlockBufferFlags flags = CMBlockBufferFlags.AssureMemoryNow | CMBlockBufferFlags.AlwaysCopyData;
var finalBlockBuffer = CMBlockBuffer.FromMemoryBlock(finalBuffer, 0, flags, out CMBlockBufferError blockBufferError);
SendDebugMessage($"Creation of Final Block Buffer: {(blockBufferError == CMBlockBufferError.None ? "Successful!" : $"Failed ({blockBufferError})")}");
if (blockBufferError != CMBlockBufferError.None) return;
// now create the sample buffer
nuint[] sampleSizeArray = new nuint[] { totalLength };
CMSampleBuffer sampleBuffer = CMSampleBuffer.CreateReady(finalBlockBuffer, this.FormatDescription, 1, null, sampleSizeArray, out CMSampleBufferError sampleBufferError);
SendDebugMessage($"Creation of Final Sample Buffer: {(sampleBufferError == CMSampleBufferError.None ? "Successful!" : $"Failed ({sampleBufferError})")}");
if (sampleBufferError != CMSampleBufferError.None) return;
// if sample buffer was successfully created -> pass sample to decoder
// set sample attachments
CMSampleBufferAttachmentSettings[] attachments = sampleBuffer.GetSampleAttachments(true);
var attachmentSetting = attachments[0];
attachmentSetting.DisplayImmediately = true;
// enable async decoding
VTDecodeFrameFlags decodeFrameFlags = VTDecodeFrameFlags.EnableAsynchronousDecompression;
// add time stamp
var currentTime = DateTime.Now;
var currentTimePtr = new IntPtr(currentTime.Ticks);
// send the sample buffer to a VTDecompressionSession
var result = DecompressionSession.DecodeFrame(sampleBuffer, decodeFrameFlags, currentTimePtr, out VTDecodeInfoFlags decodeInfoFlags);
if (result == VTStatus.Ok)
{
SendDebugMessage($"Executing DecodeFrame(..): Successful! (Info: {decodeInfoFlags})");
}
else
{
NSError error = new NSError(CFErrorDomain.OSStatus, (int)result);
SendDebugMessage($"Executing DecodeFrame(..): Failed ({(VtStatusEx)result} [0x{(int)result:X8}] - {error}) - Info: {decodeInfoFlags}");
}
}
}
My function for creating the decompression session looks like this:
private VTDecompressionSession CreateDecompressionSession(CMVideoFormatDescription formatDescription)
{
VTDecompressionSession.VTDecompressionOutputCallback callBackRecord = this.DecompressionSessionDecodeFrameCallback;
VTVideoDecoderSpecification decoderSpecification = new VTVideoDecoderSpecification
{
EnableHardwareAcceleratedVideoDecoder = true
};
CVPixelBufferAttributes destinationImageBufferAttributes = new CVPixelBufferAttributes();
try
{
var decompressionSession = VTDecompressionSession.Create(callBackRecord, formatDescription, decoderSpecification, destinationImageBufferAttributes);
SendDebugMessage("Video Decompression Session Creation: Successful!");
return decompressionSession;
}
catch (Exception e)
{
SendDebugMessage($"Video Decompression Session Creation: Failed ({e.Message})");
return null;
}
}
The decompression session callback routine:
private void DecompressionSessionDecodeFrameCallback(
IntPtr sourceFrame,
VTStatus status,
VTDecodeInfoFlags infoFlags,
CVImageBuffer imageBuffer,
CMTime presentationTimeStamp,
CMTime presentationDuration)
{
if (status != VTStatus.Ok)
{
NSError error = new NSError(CFErrorDomain.OSStatus, (int)status);
SendDebugMessage($"Decompression: Failed ({(VtStatusEx)status} [0x{(int)status:X8}] - {error})");
}
else
{
SendDebugMessage("Decompression: Successful!");
try
{
var image = GetImageFromImageBuffer(imageBuffer);
// In my application I do not use a display layer but send the decoded image directly by an event:
ImageSource imgSource = ImageSource.FromStream(() => image.AsPNG().AsStream());
OnImageFrameReady?.Invoke(imgSource);
}
catch (Exception e)
{
SendDebugMessage(e.ToString());
}
}
}
I use this function to convert the CVImageBuffer to a UIImage. It is also based on one of Olivia's posts mentioned above (how to convert a CVImageBufferRef to UIImage):
private UIImage GetImageFromImageBuffer(CVImageBuffer imageBuffer)
{
if (!(imageBuffer is CVPixelBuffer pixelBuffer)) return null;
var ciImage = CIImage.FromImageBuffer(pixelBuffer);
var temporaryContext = new CIContext();
var rect = CGRect.FromLTRB(0, 0, pixelBuffer.Width, pixelBuffer.Height);
CGImage cgImage = temporaryContext.CreateCGImage(ciImage, rect);
if (cgImage == null) return null;
var uiImage = UIImage.FromImage(cgImage);
cgImage.Dispose();
return uiImage;
}
Last but not least, my little function for debug output; feel free to pimp it for your purposes ;-)
private void SendDebugMessage(string msg)
{
Debug.WriteLine($"VideoDecoder (iOS) - {msg}");
}
Finally, let's have a look at the namespaces used by the code above:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Net;
using AvcLibrary;
using CoreFoundation;
using CoreGraphics;
using CoreImage;
using CoreMedia;
using CoreVideo;
using Foundation;
using UIKit;
using VideoToolbox;
using Xamarin.Forms;
Answer 5 (Score: 1)
@Livy: to remove memory leaks before CMVideoFormatDescriptionCreateFromH264ParameterSets, you should add the following:
if (_formatDesc)
{
    CFRelease(_formatDesc);
    _formatDesc = NULL;
}