Suppose I can have strings like the following:
"hey @john..."
"@john, hello"
"@john(hello)"
I am tokenizing the string by splitting on spaces to separate each word:
[myString componentsSeparatedByString:@" "];
My token array now contains:
@john...
@john,
@john(hello)
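For these cases I check for punctuation like this:
NSRange textRange = [words rangeOfString:@","];
if(textRange.location != NSNotFound){ /* do something */ }
How can I make sure that only @john is tokenized, while still keeping the trailing characters: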
...
,
(hello)
Note: I want to be able to handle any characters at the end of the string. The above are just three examples.
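Answer 0 (score: 1)
See NSString's -rangeOfString:options:range: ... give it a range of { [myString length] - [searchString length], [searchString length] } and check whether the location of the resulting range equals NSNotFound. For case sensitivity, see the NSStringCompareOptions options in the documentation.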
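The range check described above can be sketched as follows. This is only an illustration under stated assumptions: the helper name StringEndsWith is hypothetical (it does not appear in the question), and the length guard is an addition, needed because the lengths are unsigned and the subtraction would otherwise wrap around when the search string is longer than the string being searched.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the question): YES if `string` ends with `suffix`.
static BOOL StringEndsWith(NSString *string, NSString *suffix) {
    // Guard first: NSUInteger arithmetic would wrap around if suffix were longer
    if ([suffix length] == 0 || [string length] < [suffix length]) {
        return NO;
    }
    // Restrict the search to the last [suffix length] characters
    NSRange tail = NSMakeRange([string length] - [suffix length], [suffix length]);
    // NSLiteralSearch compares exactly; NSCaseInsensitiveSearch would ignore case
    NSRange found = [string rangeOfString:suffix options:NSLiteralSearch range:tail];
    return found.location != NSNotFound;
}

int main(void) {
    @autoreleasepool {
        // The first token ends with a comma, the second does not
        NSLog(@"%@", StringEndsWith(@"@john,", @",") ? @"YES" : @"NO");
        NSLog(@"%@", StringEndsWith(@"@john(hello)", @",") ? @"YES" : @"NO");
    }
    return 0;
}
```

For a plain case-sensitive check, NSString's -hasSuffix: does the same job in one call; the explicit range form is mainly useful when you need to pass compare options.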
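Answer 1 (score: 0)
You can do this with NSScanner and NSCharacterSet. NSScanner can scan a string up to the first occurrence of a character from a set. If you take +alphanumericCharacterSet and then call -invertedSet on it, you get the set of all non-alphanumeric characters.
This may not be super efficient, but it will work: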
NSArray* strings = [NSArray arrayWithObjects:
                    @"hey @john...",
                    @"@john, hello",
                    @"@john(hello)",
                    nil];
//get the characters we want to stop at, which is everything except letters and numbers
NSCharacterSet* illegalChars = [[NSCharacterSet alphanumericCharacterSet] invertedSet];
for(NSString* currentString in strings)
{
    //this stores the tokens for the current string
    NSMutableArray* tokens = [NSMutableArray array];
    //split the string into unparsed tokens
    NSArray* split = [currentString componentsSeparatedByString:@" "];
    for(NSString* currentToken in split)
    {
        //we only want tokens that start with an @ symbol
        if([currentToken hasPrefix:@"@"])
        {
            NSString* token = nil;
            //start a scanner from the first character after the @ symbol
            NSScanner* scanner = [NSScanner scannerWithString:[currentToken substringFromIndex:1]];
            //keep scanning until we hit an illegal character
            [scanner scanUpToCharactersFromSet:illegalChars intoString:&token];
            //get the rest of the string; the + 1 accounts for the @ the scanner never saw
            NSString* suffix = [currentToken substringFromIndex:[scanner scanLocation] + 1];
            if(token)
            {
                //store the token in a dictionary
                NSDictionary* tokenDict = [NSDictionary dictionaryWithObjectsAndKeys:
                                           [@"@" stringByAppendingString:token], @"token", //prepend the @ symbol that we skipped
                                           suffix, @"suffix",
                                           nil];
                [tokens addObject:tokenDict];
            }
        }
    }
    //output
    for(NSDictionary* dict in tokens)
    {
        NSLog(@"Found token: %@ additional characters: %@",[dict objectForKey:@"token"],[dict objectForKey:@"suffix"]);
    }
}
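Answer 2 (score: 0)
Are you sure that CFStringTokenizer, or its new Snow-Leopard-only Cocoa equivalent, wouldn't be a better fit?
As you have discovered, splitting on spaces is a very naive way to tokenize. CFStringTokenizer and enumerateSubstrings… are much smarter about the word-boundary rules of real human languages.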
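As a rough sketch of the Cocoa route, the snippet below assumes the method meant is NSString's -enumerateSubstringsInRange:options:usingBlock: with NSStringEnumerationByWords, and uses one of the question's sample strings. It only logs what the enumerator reports; whether the @ stays attached to the word, and how the trailing punctuation is reported, should be checked against real output rather than taken from this example.

```objective-c
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        NSString *text = @"hey @john...";  // sample string from the question

        // Walk the string word by word using the system's word-boundary rules
        [text enumerateSubstringsInRange:NSMakeRange(0, [text length])
                                 options:NSStringEnumerationByWords
                              usingBlock:^(NSString *substring,
                                           NSRange substringRange,
                                           NSRange enclosingRange,
                                           BOOL *stop) {
            // substringRange locates the word itself within `text`;
            // enclosingRange is the wider range the enumerator associates with it
            NSLog(@"word: %@ range: %@ enclosing: %@",
                  substring,
                  NSStringFromRange(substringRange),
                  NSStringFromRange(enclosingRange));
        }];
    }
    return 0;
}
```

Because each callback carries the word's range within the original string, the characters that follow a word can be recovered from the original string instead of being lost in a split.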