Trying to split a very large string

Asked: 2011-09-15 12:25:46

Tags: objective-c ios nsstring

  

Possible duplicate:
  Most memory efficient way to split an NSString in to substrings

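I'm trying to split a 20 MB string. I tried componentsSeparatedByString:, but it uses too much memory. I think this is because it splits the string but also keeps the original intact, so the string is effectively held in memory twice (even if I release the original immediately after the split, that is still a problem).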

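I'm very new to Objective-C. I tried to write some code that removes each substring from the original string as it adds it to the array of found strings. The idea is that as the mutable array of found strings grows, the original string shrinks. The only problem is that it leaks memory and crashes. If anyone can tell me what I'm doing wrong, that would be awesome!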

    NSRange range = [mainHtml rangeOfString:@"<p class=NumberedParagraph>"];
    int counter = 1;

    // location will == NSNotFound (NSIntegerMax) if it can't find any more occurrences
    while (range.location < [mainHtml length]) {
        NSString *curStr;
        NSRange curStrRange;

        NSRange rangeToSearchIn = NSMakeRange(range.location+1, [mainHtml length] - range.location - 1);
        NSRange nextRange = [mainHtml rangeOfString:@"<p class=NumberedParagraph>" options:NSCaseInsensitiveSearch range:rangeToSearchIn];

        if (nextRange.location > [mainHtml length])
        {
            // This is the last string - get everything up to the end of the file
            curStrRange = NSMakeRange(0, [mainHtml length]);
            curStr = [mainHtml substringFromIndex:range.location];
        } else {
            curStrRange = NSMakeRange(range.location, nextRange.location - range.location);
            curStr = [mainHtml substringWithRange:curStrRange];
        }

        // Remove the substring just processed from the original string
        // * it crashes here, normally on the 3rd iteration
        mainHtml = [mainHtml substringFromIndex:curStrRange.location + curStrRange.length];
        range = [mainHtml rangeOfString:@"<p class=NumberedParagraph>"];

        [self.parts addObject:curStr];
    }

2 Answers:

Answer 0 (score: 2)

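I think @babbidi has the right idea. mainHtml is large, and you are creating many autoreleased copies of it (one per iteration) that are not released while the loop runs. Try adding the following @autoreleasepool block to your code so that all autoreleased objects are released at the end of each pass through the loop. If you are not using Mac OS X 10.7, you will have to manage an NSAutoreleasePool manually instead: create one at the start of each iteration and drain it at the end (draining a pool also disposes of it, so a single pool created outside the loop cannot be reused).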

NSRange range = [mainHtml rangeOfString:@"<p class=NumberedParagraph>"];
int counter = 1;

// range.location is NSNotFound once there are no more occurrences
while (range.location != NSNotFound) {
    @autoreleasepool {
        NSString *curStr;
        NSRange curStrRange;

        NSRange rangeToSearchIn = NSMakeRange(range.location+1, [mainHtml length] - range.location - 1);
        NSRange nextRange = [mainHtml rangeOfString:@"<p class=NumberedParagraph>" options:NSCaseInsensitiveSearch range:rangeToSearchIn];

        if (nextRange.location == NSNotFound)
        {
            // This is the last string - get everything up to the end of the file
            curStrRange = NSMakeRange(0, [mainHtml length]);
            curStr = [mainHtml substringFromIndex:range.location];
        } else {
            curStrRange = NSMakeRange(range.location, nextRange.location - range.location);
            curStr = [mainHtml substringWithRange:curStrRange];
        }

        // self.parts retains curStr, so it survives the end of the pool
        [self.parts addObject:curStr];

        // Remove the substring just processed from the original string.
        // Without ARC, retain the remainder so the pool does not free it, and
        // release the old string (this assumes mainHtml was retained before the loop).
        NSString *rest = [[mainHtml substringFromIndex:curStrRange.location + curStrRange.length] retain];
        [mainHtml release];
        mainHtml = rest;
        range = [mainHtml rangeOfString:@"<p class=NumberedParagraph>"];
    }
}
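
For toolchains without @autoreleasepool, the manually managed pool that this answer mentions might look like the sketch below. It assumes manual reference counting (no ARC). The function name SplitOnMarker and its parameters are hypothetical, and it collects the pieces into a local array rather than self.parts so that it stands alone.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper; manual reference counting only (compile with -fno-objc-arc).
// Splits `html` at every occurrence of `marker`; each piece starts with the marker.
static NSMutableArray *SplitOnMarker(NSString *html, NSString *marker) {
    NSMutableArray *parts = [NSMutableArray array];
    NSString *rest = [html retain];              // we own `rest` for the whole loop
    NSRange range = [rest rangeOfString:marker];

    while (range.location != NSNotFound) {
        // A fresh pool per iteration: -drain also disposes of the pool itself
        NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

        // Look for the next marker after the current one
        NSUInteger searchStart = NSMaxRange(range);
        NSRange next = [rest rangeOfString:marker
                                   options:0
                                     range:NSMakeRange(searchStart, [rest length] - searchStart)];
        NSUInteger end = (next.location == NSNotFound) ? [rest length] : next.location;

        // The array retains the piece, so it outlives the pool
        NSRange pieceRange = NSMakeRange(range.location, end - range.location);
        [parts addObject:[rest substringWithRange:pieceRange]];

        // Retain the remainder before the pool drains, then give up the old string
        NSString *remainder = [[rest substringFromIndex:end] retain];
        [rest release];
        rest = remainder;

        [pool drain];                            // frees this iteration's temporaries
        range = [rest rangeOfString:marker];
    }

    [rest release];
    return parts;                                // autoreleased, per Cocoa convention
}
```

It would be called as SplitOnMarker(mainHtml, @"<p class=NumberedParagraph>"). Under ARC the retain, release and NSAutoreleasePool calls do not compile, and @autoreleasepool is the form to use.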

Answer 1 (score: 1)

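I don't believe you have any leaks. substringFromIndex: returns an autoreleased string, so it can stay in memory across many iterations. You could write your own version of substringFromIndex: (for example, createSubstringFromIndex:string:) that returns a retained string, which you can then release manually.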

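+(NSString *)createSubstringFromIndex:(NSUInteger)index string:(NSString *)string{
    NSUInteger length = [string length];
    if(index >= length)
        return [[NSString alloc] init];   // empty string owned by the caller, like every other return path
    NSUInteger newLen = length - index;
    // Copy UTF-16 code units instead of chars so non-ASCII text is preserved
    unichar *buffer = malloc(newLen * sizeof(unichar));
    if(buffer == NULL)
        return nil;
    [string getCharacters:buffer range:NSMakeRange(index, newLen)];
    // alloc/init returns a retained (not autoreleased) string; the caller must release it
    NSString *retStr = [[NSString alloc] initWithCharacters:buffer length:newLen];
    free(buffer);
    return retStr;
}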

In your code, replace this:

mainHtml = [mainHtml substringFromIndex:curStrRange.location + curStrRange.length];

with this:

NSString *newHtmlString = [[self class] createSubstringFromIndex:curStrRange.location + curStrRange.length string:mainHtml];
[mainHtml release];                // mainHtml must be retained before the while loop starts
mainHtml = newHtmlString;