Question

Suppose I have strings like the following:

"hey @john..."
"@john, hello"
"@john(hello)"

I am tokenizing the strings by splitting on spaces, so that each word becomes a separate token:

[myString componentsSeparatedByString:@" "];

My token array now contains:

@john...
@john,
@john(hello)

I am checking for punctuation like this:

NSRange textRange = [words rangeOfString:@","];
if(textRange.location != NSNotFound){
    //do something
}

For cases like these, how can I make sure that only @john is extracted as the token, while still keeping the trailing characters:

...
,
(hello)

Note: I want to be able to handle any characters at the end of the string. The above are just 3 examples.

Answer 1

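See NSString's -rangeOfString:options:range: ... Give it a range of { [myString length] - [searchString length], [searchString length] } and check whether the location of the resulting range equals NSNotFound (if it does, myString does not end with searchString). For case sensitivity, see the NSStringCompareOptions options in the documentation.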
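A minimal sketch of that check follows. The helper name StringEndsWith is hypothetical and not part of the answer above; the length guard is an addition, since the range would be out of bounds if the search string were longer than the string being searched.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (name is an assumption): returns YES when `string`
// ends with `suffix`, by searching only the tail range of `string`.
static BOOL StringEndsWith(NSString *string, NSString *suffix)
{
    // Guard: the tail range below would be invalid if suffix is longer.
    if ([suffix length] > [string length]) {
        return NO;
    }

    // The last [suffix length] characters of the string.
    NSRange tail = NSMakeRange([string length] - [suffix length], [suffix length]);

    // Pass NSCaseInsensitiveSearch instead of 0 to ignore case.
    NSRange found = [string rangeOfString:suffix options:0 range:tail];
    return found.location != NSNotFound;
}
```

With the tokens from the question, StringEndsWith(@"@john,", @",") returns YES and StringEndsWith(@"@john(hello)", @",") returns NO. Note that this only answers whether one known suffix is present; it does not split the token from arbitrary trailing characters.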

Answer 2

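You can use NSScanner and NSCharacterSet to do this. NSScanner can scan a string up to the first occurrence of a character from a set. If you take +alphanumericCharacterSet and call -invertedSet on it, you get a set of all non-alphanumeric characters.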

This is probably not super efficient, but it will work:

NSArray* strings = [NSArray arrayWithObjects:
                    @"hey @john...",
                    @"@john, hello",
                    @"@john(hello)",
                    nil];

//get the characters we want to skip, which is everything except letters and numbers
NSCharacterSet* illegalChars = [[NSCharacterSet alphanumericCharacterSet] invertedSet];


for(NSString* currentString in strings)
{
    //this stores the tokens for the current string
    NSMutableArray* tokens = [NSMutableArray array];

    //split the string into unparsed tokens
    NSArray* split = [currentString componentsSeparatedByString:@" "];

    for(NSString* currentToken in split)
    {
        //we only want tokens that start with an @ symbol
        if([currentToken hasPrefix:@"@"])
        {
            NSString* token = nil;

            //start a scanner from the first character after the @ symbol
            NSScanner* scanner = [NSScanner scannerWithString:[currentToken substringFromIndex:1]];
            //keep scanning until we hit an illegal character
            [scanner scanUpToCharactersFromSet:illegalChars intoString:&token];

            //get the rest of the string; the + 1 accounts for the @ symbol that the scanner's string omits
            NSString* suffix = [currentToken substringFromIndex:[scanner scanLocation] + 1];

            if(token)
            {
                //store the token in a dictionary
                NSDictionary* tokenDict = [NSDictionary dictionaryWithObjectsAndKeys:
                                           [@"@" stringByAppendingString:token], @"token", //prepend the @ symbol that we skipped
                                           suffix, @"suffix",
                                           nil];
                [tokens addObject:tokenDict];
            }
        }
    }
    //output
    for(NSDictionary* dict in tokens)
    {
        NSLog(@"Found token: %@ additional characters: %@",[dict objectForKey:@"token"],[dict objectForKey:@"suffix"]);
    }
}

Answer 3

Are you sure that CFStringTokenizer or its new Snow-Leopard-only Cocoa equivalent wouldn't be a better fit?

As you have discovered, splitting on spaces is a very naive way to tokenize. CFStringTokenizer and enumerateSubstrings… are much smarter about the word rules of real human languages.
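As a rough illustration, and assuming the Cocoa method meant here is NSString's -enumerateSubstringsInRange:options:usingBlock: with NSStringEnumerationByWords, word enumeration might look like the sketch below. The helper name WordsInString is hypothetical.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (name is an assumption): collects the words that
// word-based enumeration reports for `text`.
static NSArray *WordsInString(NSString *text)
{
    NSMutableArray *words = [NSMutableArray array];

    [text enumerateSubstringsInRange:NSMakeRange(0, [text length])
                             options:NSStringEnumerationByWords
                          usingBlock:^(NSString *substring,
                                       NSRange substringRange,
                                       NSRange enclosingRange,
                                       BOOL *stop) {
        // substring is the word itself; enclosingRange also covers the
        // separators and punctuation that surround it.
        [words addObject:substring];
    }];

    return words;
}
```

Punctuation such as the leading @ is generally not reported as part of a word, so if you need it you would have to inspect enclosingRange or the character just before substringRange yourself.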

How can I detect whether certain characters are at the end of an NSString?
