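Split text into words, numbers, and punctuation marks

Time: 2012-01-15 13:35:35

Tags: objective-c cocoa

I need to split a phrase into words, numbers, punctuation marks, and spaces/tabs. I also want to preserve the order of the pieces.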

NSString *text = @"The 3 quick:\"brown fox, jump's\" over.";

This is the kind of list I need to produce:

['The', ' ', '3', ' ', 'quick', ':', '"', 'brown', ' ', 'fox', ',', ' ', 'jump's', '"', ' ', 'over', '.']

Thanks!

1 Answer:

Answer 0: (score: 2)

Try this category, written with NSScanner and NSCharacterSet:

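#import <Foundation/Foundation.h>

@interface NSString (Splitting)

// Splits the receiver into tokens. Every character found in charSet is
// returned as its own one-character token, in its original position.
- (NSArray *)arrayBySeparatingComponentsInCharacterSet:(NSCharacterSet *)charSet;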

@end

@implementation NSString(Splitting)

// Consumes exactly one character at the scanner's current location if that
// character belongs to charSet. Otherwise returns NO and leaves the scanner untouched.
static BOOL scanOneCharacterFromSetIntoString(NSScanner *scanner, NSCharacterSet *charSet, NSString **outStr)
{
    NSString *inStr = scanner.string;

    // nothing left to scan
    if (scanner.scanLocation >= inStr.length)
    {
        return NO;
    }

    unichar ch = [inStr characterAtIndex:scanner.scanLocation];

    if (![charSet characterIsMember:ch])
    {
        return NO;
    }

    scanner.scanLocation++;
    if (outStr)
    {
        *outStr = [NSString stringWithCharacters:&ch length:1];
    }

    return YES;
}

-(NSArray *) arrayBySeparatingComponentsInCharacterSet:(NSCharacterSet *)charSet
{
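    NSScanner *scanner = [NSScanner scannerWithString:self];
    // NSScanner skips whitespace and newlines by default, which would silently
    // drop a space that follows another delimiter (e.g. after a comma).
    [scanner setCharactersToBeSkipped:nil];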
    NSMutableArray *result = [NSMutableArray array];

    NSString *temp = nil;
    while ([scanner scanUpToCharactersFromSet:charSet intoString:&temp] || scanOneCharacterFromSetIntoString(scanner, charSet, &temp)) {
        [result addObject:temp];

        if ([scanner scanLocation] >= [self length])
        {
            break;
        }

        unichar temp2 = [self characterAtIndex:[scanner scanLocation]];

        if ([charSet characterIsMember:temp2])
        {
            // build the string from the unichar directly; "%c" would mangle non-ASCII characters
            [result addObject:[NSString stringWithCharacters:&temp2 length:1]];
            // consume the delimiter that was just added
            [scanner setScanLocation:[scanner scanLocation] + 1];
        }
    }

    return result;
}

@end

int main (int argc, const char * argv[])
{
    @autoreleasepool {

        NSString *str = @"The 3 quick:\"brown fox, jump's\" over.";
        NSArray *array = [str arrayBySeparatingComponentsInCharacterSet:[NSCharacterSet characterSetWithCharactersInString:@" :\",'."]];
        NSLog(@"%@", array);
    }
}

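That should be what you need; just change the character set to whatever you need. The set in the example includes the apostrophe, so jump's comes back as three tokens; drop ' from the set to keep it whole. Also note that this was compiled with ARC enabled, so under manual reference counting its memory management may or may not work correctly.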
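As an illustration of changing the character set, here is a minimal sketch that builds the delimiters from the standard whitespace and punctuation sets and leaves the apostrophe out. It is not the answer's category: both function names are hypothetical, and the splitting loop walks the string directly instead of using NSScanner, so the example stands on its own.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: whitespace + punctuation, minus the apostrophe,
// so that contractions such as "jump's" stay in one piece.
static NSCharacterSet *DelimiterSetKeepingApostrophes(void)
{
    NSMutableCharacterSet *set = [NSMutableCharacterSet whitespaceAndNewlineCharacterSet];
    [set formUnionWithCharacterSet:[NSCharacterSet punctuationCharacterSet]];
    [set removeCharactersInString:@"'"];
    return set;
}

// Hypothetical helper: splits text so that every delimiter character becomes
// its own token and every run of non-delimiter characters becomes one token.
static NSArray *SplitKeepingDelimiters(NSString *text, NSCharacterSet *delims)
{
    NSMutableArray *tokens = [NSMutableArray array];
    NSUInteger length = [text length];
    NSUInteger runStart = 0; // start of the current run of non-delimiter characters

    for (NSUInteger i = 0; i < length; i++)
    {
        unichar ch = [text characterAtIndex:i];
        if (![delims characterIsMember:ch])
        {
            continue; // still inside a word or number
        }

        // flush the run that ended just before this delimiter
        if (i > runStart)
        {
            [tokens addObject:[text substringWithRange:NSMakeRange(runStart, i - runStart)]];
        }

        // the delimiter itself is a one-character token
        [tokens addObject:[NSString stringWithCharacters:&ch length:1]];
        runStart = i + 1;
    }

    // flush a trailing run, if the text does not end with a delimiter
    if (length > runStart)
    {
        [tokens addObject:[text substringFromIndex:runStart]];
    }

    return tokens;
}
```

Calling SplitKeepingDelimiters(str, DelimiterSetKeepingApostrophes()) on the string from the question gives 17 tokens in their original order, with jump's kept as a single token. Note that punctuationCharacterSet covers Unicode punctuation only; symbols such as + or $ are not in it and would need to be added to the set explicitly.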