Question

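To search a string (the haystack) for another substring (the needle) and get the ranges of all its occurrences, I load the haystack into an NSData object, get the NSData for the needle string, and use rangeOfData:options:range: to search the haystack for the needle.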

```objc
// Get the data for the contents of the file; check the return value, not the error pointer
NSError *error = nil;
NSData *fileData = [NSData dataWithContentsOfFile:filePath options:0 error:&error];
if (fileData == nil) {
    // Handle the error (inspect `error`)...
}
NSData *needleData = [needle dataUsingEncoding:NSUTF8StringEncoding];

NSRange searchRange = NSMakeRange(0, fileData.length);
while (searchRange.location < fileData.length) {
    NSRange needleRange = [fileData rangeOfData:needleData options:0 range:searchRange];
    if (needleRange.location != NSNotFound) {
        // Found one, use the range (note: this is a byte range)...

        // Continue searching right after this match
        NSUInteger nextStart = NSMaxRange(needleRange);
        searchRange = NSMakeRange(nextStart, fileData.length - nextStart);
    } else {
        // Otherwise there are no more to be found, bail out
        break;
    }
}
```

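Usually the needle range found with rangeOfData: is the same as the range of the needle string in the haystack string. But that assumes every character is 1 byte, and some Unicode characters take 2 (or more) bytes, for example ✔ and ✘ (3 bytes each in UTF-8, yet a single UTF-16 unit each in an NSString). As a result, the needle's range in the data differs from its range in the string.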
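One way to map a byte range back to a string range is to count characters up to each byte offset. The sketch below is plain C (which an Objective-C file can also compile) and assumes the data is valid UTF-8 and that NSString ranges are measured in UTF-16 code units; `utf8_to_utf16_offset` and `utf8_range_to_utf16` are hypothetical helpers, not Foundation APIs. In UTF-8 every byte except a continuation byte (`10xxxxxx`) starts a new character, and only 4-byte sequences (lead byte `0xF0` or above) need two UTF-16 units.

```c
#include <stddef.h>

/* Convert a byte offset into UTF-8 data to the equivalent index in UTF-16
   code units. Assumes valid UTF-8 and that byte_offset lies on a character
   boundary, which the start and end of a match always do. */
size_t utf8_to_utf16_offset(const unsigned char *utf8, size_t byte_offset) {
    size_t units = 0;
    for (size_t i = 0; i < byte_offset; i++) {
        unsigned char b = utf8[i];
        if ((b & 0xC0) == 0x80) continue;  /* continuation byte: same character */
        units += (b >= 0xF0) ? 2 : 1;      /* 4-byte sequence -> surrogate pair */
    }
    return units;
}

/* Map the byte range [loc, loc + len) to a UTF-16 (location, length) pair. */
void utf8_range_to_utf16(const unsigned char *utf8, size_t loc, size_t len,
                         size_t *out_loc, size_t *out_len) {
    size_t start = utf8_to_utf16_offset(utf8, loc);
    size_t end = utf8_to_utf16_offset(utf8, loc + len);
    *out_loc = start;
    *out_len = end - start;
}
```

For "✔ ok ✘ ok", the second "ok" starts at byte 11 but at UTF-16 index 7. Rescanning from the start for every match is quadratic when there are many matches; carrying the running count forward from the previous match keeps the whole pass linear.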

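Is there any way to accurately get the string range from the data range, or should I use a different algorithm? I tested several methods for searching the string itself, and this one was the fastest (compared with rangeOfString:, NSRegularExpression, KMP, Boyer-Moore, and Boyer-Moore-Horspool).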

Answer 1

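(From my comment above:) Convert both the haystack and the needle string to NSData using NSUTF32BigEndianStringEncoding. Then every character (Unicode code point) occupies exactly 4 bytes in the data blob, so a byte offset divided by 4 gives the character index.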
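A minimal sketch of the UTF-32 approach in C, assuming both buffers already hold UTF-32BE code points with no byte-order mark; `find_utf32_matches` is a hypothetical helper. It only tries offsets that are multiples of 4, because a raw byte search can match across two code points: the bytes `00 00 41 00` (U+4100) occur at offset 1 inside `00 00 00 41 00 00 00 42` ("AB"). With rangeOfData: you would likewise need to skip matches whose location is not divisible by 4. The resulting code-point index equals an NSString (UTF-16) index only if no character outside the Basic Multilingual Plane, such as an emoji, precedes the match.

```c
#include <stddef.h>
#include <string.h>

/* Find non-overlapping occurrences of needle in hay (both UTF-32BE bytes).
   Writes up to max_out code-point indices to out_indices and returns the
   total number of matches. Only 4-byte-aligned offsets are compared. */
size_t find_utf32_matches(const unsigned char *hay, size_t hay_len,
                          const unsigned char *needle, size_t needle_len,
                          size_t *out_indices, size_t max_out) {
    size_t count = 0;
    if (needle_len == 0 || needle_len % 4 != 0 || hay_len % 4 != 0) return 0;
    for (size_t off = 0; off + needle_len <= hay_len; off += 4) {
        if (memcmp(hay + off, needle, needle_len) == 0) {
            if (count < max_out) out_indices[count] = off / 4; /* code-point index */
            count++;
            off += needle_len - 4; /* with the loop's += 4, resume after the match */
        }
    }
    return count;
}
```

Stepping by 4 rather than filtering unaligned hits afterwards avoids ever reporting a match that straddles two characters.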

Answer 2

Try strstr(3) with pointer arithmetic. With the help of strchr(3), you could parallelize this massively.
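A sketch of the strstr(3) suggestion in C; `find_all_strstr` is a hypothetical helper. Because strstr(3) works on NUL-terminated strings, it assumes the UTF-8 bytes contain no embedded NUL and have been copied into a NUL-terminated buffer (the bytes behind an NSData are not guaranteed to end in NUL). The offsets it returns are byte offsets, so they still need converting to character indices, as the question points out. The strchr(3)-based parallelization mentioned above is not shown.

```c
#include <stddef.h>
#include <string.h>

/* Collect the byte offsets of all non-overlapping occurrences of needle in
   haystack using strstr(3) and pointer arithmetic. Writes up to max_out
   offsets and returns the total number of matches. */
size_t find_all_strstr(const char *haystack, const char *needle,
                       size_t *out_offsets, size_t max_out) {
    size_t count = 0;
    size_t needle_len = strlen(needle);
    if (needle_len == 0) return 0;
    const char *p = haystack;
    while ((p = strstr(p, needle)) != NULL) {
        if (count < max_out) out_offsets[count] = (size_t)(p - haystack); /* byte offset */
        count++;
        p += needle_len; /* resume after this match */
    }
    return count;
}
```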

Searching a string via NSData
