Question

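I'm working on an iPad application whose sync process uses web services and Core Data in a tight loop. To reduce the memory footprint in line with Apple's recommendation, I periodically allocate and drain an NSAutoreleasePool. This currently works well, and the application has no memory problems. However, I plan to move to ARC, where NSAutoreleasePool is no longer available, and I'd like to keep the same performance. I created a few examples and timed them, and I'm wondering what the best approach is under ARC to get the same performance while keeping the code readable.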

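For testing purposes I came up with 3 scenarios, each creating a string from every number between 1 and 10,000,000. I ran each example 3 times to see how long it took as a 64-bit Mac app built with the Apple LLVM 3.0 compiler (w/o gdb, -O0) and Xcode 4.2. I also ran each example through Instruments to get a rough idea of the memory peak.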

Each of the examples below is placed inside the following block of code:

int main (int argc, const char * argv[])
{
    @autoreleasepool {
        NSDate *now = [NSDate date];

        //Code Example ...

        // timeIntervalSinceNow is negative for a date in the past, so negate it
        NSTimeInterval interval = -[now timeIntervalSinceNow];
        printf("Duration: %f\n", interval);
    }
}

NSAutoreleasePool batch [original, pre-ARC] (peak memory: ~116 KB)

    static const NSUInteger BATCH_SIZE = 1500;
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    for(uint32_t count = 0; count < MAX_ALLOCATIONS; count++)
    {
        NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
        [text class];

        if((count + 1) % BATCH_SIZE == 0)
        {
            [pool drain];
            pool = [[NSAutoreleasePool alloc] init];
        }
    }
    [pool drain];

Run times:
    10.928158
    10.912849
    11.084716

Outer @autoreleasepool (peak memory: ~382 MB)

    @autoreleasepool {
        for(uint32_t count = 0; count < MAX_ALLOCATIONS; count++)
        {
            NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
            [text class];
        }
    }

Run times:
    11.489350
    11.310462
    11.344662

Inner @autoreleasepool (peak memory: ~61.2 KB)

    for(uint32_t count = 0; count < MAX_ALLOCATIONS; count++)
    {
        @autoreleasepool {
            NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
            [text class];
        }
    }

Run times:
    14.031112
    14.284014
    14.099625

@autoreleasepool w/ goto (peak memory: ~115 KB)

    static const NSUInteger BATCH_SIZE = 1500;
    uint32_t count = 0;

    next_batch:
    @autoreleasepool {
        for(;count < MAX_ALLOCATIONS; count++)
        {
            NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
            [text class];
            if((count + 1) % BATCH_SIZE == 0)
            {
                count++; // Increment manually: the goto skips the for loop's own increment
                goto next_batch;
            }
        }
    }

Run times:
    10.908756
    10.960189
    11.018382

The goto version comes closest to the original performance, but it uses a goto. Any thoughts?

Update

Note: as stated in the documentation, the goto statement is a normal exit from @autoreleasepool and does not leak memory.

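On entry, an autorelease pool is pushed. On normal exit (break, return, goto, fall-through, and so on) the autorelease pool is popped. For compatibility with existing code, if exit is due to an exception, the autorelease pool is not popped.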
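To make the quoted rule concrete, here is a minimal sketch of a `break` leaving an `@autoreleasepool` block from inside a loop. The function name `FirstIndexWithLength` and its parameters are hypothetical and not part of the question; the point is only that the `break` counts as a normal exit, so the pool for that iteration is popped before the loop ends.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper for illustration: returns the first i in [0, limit)
// whose decimal representation has `length` characters, or NSNotFound.
static NSUInteger FirstIndexWithLength(NSUInteger limit, NSUInteger length)
{
    NSUInteger found = NSNotFound;
    for (NSUInteger i = 0; i < limit; i++)
    {
        @autoreleasepool
        {
            NSString *text = [NSString stringWithFormat:@"%lu", (unsigned long)i];
            if ([text length] == length)
            {
                found = i;
                // Normal exit: the pool is popped, then the enclosing for loop ends.
                break;
            }
        }
    }
    return found;
}
```

The result is carried out of the pool in a plain integer, so nothing created inside the pool is used after it has been popped.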

Answer 1

The following should achieve the same thing as the goto version, without the goto:

for (NSUInteger count = 0; count < MAX_ALLOCATIONS;)
{
    @autoreleasepool
    {
        for (NSUInteger j = 0; j < BATCH_SIZE && count < MAX_ALLOCATIONS; j++, count++)
        {
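            // NSUInteger is 64 bits wide on 64-bit targets, so cast and use %lu instead of %u
            NSString *text = [NSString stringWithFormat:@"%lu", (unsigned long)(count + 1U)];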
            [text class];
        }
    }
}
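If the nested loop shows up in several places, one option is to wrap it in a small block-based helper. This is only a sketch: `RunInBatches` is a hypothetical name, not something from this thread, and calling a block per iteration adds overhead that none of the timings here measure.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: calls `body` for every index in [0, total) and
// drains one autorelease pool per `batchSize` iterations.
static void RunInBatches(NSUInteger total, NSUInteger batchSize,
                         void (^body)(NSUInteger index))
{
    NSCParameterAssert(batchSize > 0);
    NSUInteger start = 0;
    while (start < total)
    {
        // Clamp the last batch so it never runs past `total`.
        NSUInteger end = start + MIN(batchSize, total - start);
        @autoreleasepool
        {
            for (NSUInteger index = start; index < end; index++)
            {
                body(index);
            }
        }
        start = end;
    }
}
```

Usage would look something like this, reusing MAX_ALLOCATIONS and BATCH_SIZE from the question:

```objc
RunInBatches(MAX_ALLOCATIONS, BATCH_SIZE, ^(NSUInteger index) {
    NSString *text = [NSString stringWithFormat:@"%lu", (unsigned long)(index + 1)];
    [text class];
});
```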

Answer 2

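Note that ARC relies on significant optimizations that are not enabled at -O0. If you're going to measure performance under ARC, you must test with optimizations enabled. Otherwise, you'll be comparing your hand-tuned retain/release placement against ARC's "naive mode".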

Run your tests again with optimizations enabled and see what happens.
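As a rough sketch of how such a rerun could guard against accidentally timing an unoptimized build: the helper below (`MeasureSeconds` is a hypothetical name) times a block of work and logs a warning when the compiler's predefined `__OPTIMIZE__` macro is absent, which is the case at -O0. A build along the lines of `clang -fobjc-arc -Os -framework Foundation main.m` is assumed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical timing helper: returns the elapsed seconds spent in `work`.
static NSTimeInterval MeasureSeconds(void (^work)(void))
{
#ifndef __OPTIMIZE__
    // The compiler only defines __OPTIMIZE__ when optimizations are on.
    NSLog(@"Warning: built without optimizations; ARC timings will be misleading.");
#endif
    // systemUptime is not affected by changes to the wall clock.
    NSTimeInterval start = [[NSProcessInfo processInfo] systemUptime];
    work();
    return [[NSProcessInfo processInfo] systemUptime] - start;
}
```

Each scenario from the question can then be passed in as a block and the returned duration printed.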

Update: I was curious, so I ran it myself. These are the run time results in Release mode (-Os), with 7,000,000 allocations.

arc-perf[43645:f803] outer: 8.1259
arc-perf[43645:f803] outer: 8.2089
arc-perf[43645:f803] outer: 9.1104

arc-perf[43645:f803] inner: 8.4817
arc-perf[43645:f803] inner: 8.3687
arc-perf[43645:f803] inner: 8.5470

arc-perf[43645:f803] withGoto: 7.6133
arc-perf[43645:f803] withGoto: 7.7465
arc-perf[43645:f803] withGoto: 7.7007

arc-perf[43645:f803] non-ARC: 7.3443
arc-perf[43645:f803] non-ARC: 7.3188
arc-perf[43645:f803] non-ARC: 7.3098

And the memory peaks (run with only 100,000 allocations, because Instruments was taking forever):

Outer: 2.55 MB
Inner: 723 KB
withGoto: ~747 KB
Non-ARC: ~748 KB

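These results surprise me a little. Well, the memory peak results don't; they are exactly what you'd expect. But the run time difference between inner and withGoto, even with optimizations enabled, is higher than I would have expected.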

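Of course, this is a somewhat pathological micro-benchmark, which is unlikely to model the real-world performance of any application. The takeaway is that ARC may indeed carry some overhead, but you should measure your actual application before making assumptions.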

(Also, I tested @ipmcc's answer, which uses nested for loops; it behaved almost exactly like the goto version.)

Using @autoreleasepool to reduce peak memory usage
