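Parsing text while keeping the punctuation

Time: 2013-11-29 09:19:14

Tags: ios parsing nsstring nscharacterset

I am using this small piece of code to go through some input text and extract the sentences, which are separated by terminator characters: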

NSCharacterSet *punctuation =
[NSCharacterSet characterSetWithCharactersInString:@".!?\n"];
NSArray *parts = [data componentsSeparatedByCharactersInSet:punctuation];

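The problem is that the punctuation is stripped from the strings in the resulting array. How can I store the pieces with their punctuation intact? If possible, I would also like to keep sentences that are terminated only by a newline (\n).

For example, if I input: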

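
This is a sentence. It is marked by a period. This sentence is not marked by one How do you do? I'm doing very good!

I would like to get something like this:

This is a sentence.
It is marked by a period.
This sentence is not marked by one
How do you do?
I'm doing very good!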

4 answers:

Answer 0: (score: 1)

You could use -[NSString stringByReplacingOccurrencesOfString:withString:] instead.

For example: data = [data stringByReplacingOccurrencesOfString:@"." withString:@".\n"];

Do the same for the other symbols.
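
The replacement idea above can be sketched as a small helper. This is only a minimal sketch under a few assumptions: the function name SplitKeepingTerminators is hypothetical (it does not appear in the thread), the terminators are limited to ".", "!" and "?", and the whitespace left after each terminator is trimmed away. Because the text is split on newlines at the end, sentences that are terminated only by a newline come out as separate entries too.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the original answer): appends a newline
// to every sentence terminator, then splits on newlines so that each piece
// keeps its own punctuation.
static NSArray *SplitKeepingTerminators(NSString *text)
{
    NSArray *terminators = [NSArray arrayWithObjects:@".", @"!", @"?", nil];
    NSString *marked = text;
    for (NSString *mark in terminators)
    {
        marked = [marked stringByReplacingOccurrencesOfString:mark
                                                   withString:[mark stringByAppendingString:@"\n"]];
    }

    NSCharacterSet *whitespace = [NSCharacterSet whitespaceCharacterSet];
    NSMutableArray *sentences = [NSMutableArray array];
    for (NSString *piece in [marked componentsSeparatedByString:@"\n"])
    {
        // Trim the space that followed each terminator and skip empty pieces.
        NSString *trimmed = [piece stringByTrimmingCharactersInSet:whitespace];
        if ([trimmed length] > 0)
        {
            [sentences addObject:trimmed];
        }
    }
    return sentences;
}
```

Calling SplitKeepingTerminators(data) on the question's input would return the sentences with their terminators attached. Note that this simple approach also splits inside runs of terminators such as "..." or "?!" and after abbreviations, so it is probably only suitable for plain prose.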

Answer 1: (score: 1)

Hope this helps:

NSString *string = @"This is a sentence. It is marked by a period. This sentence is not marked by one How do you do? I'm doing very good!";
NSError *error = nil;
NSString *pattern = @"(\\.|,|!|\\?|\\n)\\s*";
NSRegularExpression *expression = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                            options:0
                                                                              error:&error];
if (expression)
{
    NSArray *matches = [expression matchesInString:string
                                           options:0
                                             range:NSMakeRange(0, [string length])];
    NSLog(@"%@", matches);
    if ([matches count] > 0)
    {
        NSMutableArray *sentences = [[NSMutableArray alloc] initWithCapacity:[matches count]];
        NSUInteger sentenceStart = 0;
        for (NSTextCheckingResult *result in matches)
        {
            NSUInteger sentenceEnd = result.range.location + 1;
            [sentences addObject:[string substringWithRange:NSMakeRange(sentenceStart, sentenceEnd - sentenceStart)]];
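            // Skip the whitespace consumed by the match so the next sentence starts cleanly.
            sentenceStart = sentenceEnd + (result.range.length - 1);
        }
        // Keep any trailing text that is not followed by a terminator.
        if (sentenceStart < [string length])
        {
            [sentences addObject:[string substringFromIndex:sentenceStart]];
        }
        NSLog(@"%@", sentences);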
    }
}
else
{
    NSLog(@"ERROR: %@", error);
}

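Answer 2: (score: 1)

Yogi's answer works for inserting newlines. However, if you want the parts of the string in an array, you can use this workaround, which appends a sentinel sequence after each terminator and then splits on that sentinel: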

data = [data stringByReplacingOccurrencesOfString:@"." withString:@".&§"];
data = [data stringByReplacingOccurrencesOfString:@"!" withString:@"!&§"];
data = [data stringByReplacingOccurrencesOfString:@"?" withString:@"?&§"];
NSArray *parts = [data componentsSeparatedByString:@"&§"];

Answer 3: (score: 0)

    NSString *yourString = @"This is a sentence. It is marked by a period. This sentence is not marked by one How do you do? I'm doing very good!";
    NSMutableCharacterSet *punctuation = [NSMutableCharacterSet characterSetWithCharactersInString:@".!?\n"];
    [punctuation formUnionWithCharacterSet:[NSCharacterSet punctuationCharacterSet]];
    NSMutableArray *words = [[yourString componentsSeparatedByCharactersInSet:punctuation] mutableCopy];

Hope this helps you...