如何处理iOS检测NSString语言,英文无法准确检测?

时间:2013-01-09 08:27:38

标签: ios nsstring detect

方式:

- (NSString *)languageForString:(NSString *) text{
    return (__bridge NSString *)CFStringTokenizerCopyBestStringLanguage((CFStringRef)[text cStringUsingEncoding:NSUnicodeStringEncoding], CFRangeMake(0, MIN(text.length,100)));
}

使用:

NSLog(@"\"%@\" language is %@",@"Tokenizer",[self languageForString:@"tokenizer"]);
NSLog(@"\"%@\" language is %@",@"Tokenizer detect",[self languageForString:@"Tokenizer detect"]);
NSLog(@"\"%@\" language is %@",@"detect",[self languageForString:@"detect"]);
NSLog(@"\"%@\" language is %@",@"我们",[self languageForString:@"我们"]);
NSLog(@"\"%@\" language is %@",@"집안일",[self languageForString:@"집안일"]);
NSLog(@"\"%@\" language is %@",@"Démocratie",[self languageForString:@"Démocratie"]);
NSLog(@"\"%@\" language is %@",@"Tokenizer English",[self languageForString:@"Tokenizer English"]);
NSLog(@"\"%@\" language is %@",@"ここはデパートです",[self languageForString:@"ここはデパートです"]);

输出:

2013-01-09 16:12:28.582 TestCommandLine[6478:c07] "Tokenizer" language is tr<br/>
2013-01-09 16:12:28.586 TestCommandLine[6478:c07] "Tokenizer detect" language is tr<br/>
2013-01-09 16:12:28.586 TestCommandLine[6478:c07] "detect" language is cs<br/>
2013-01-09 16:12:28.587 TestCommandLine[6478:c07] "我们" language is zh-Hans<br/>
2013-01-09 16:12:28.560 TestCommandLine[6478:c07] "집안일" language is ko<br/>
2013-01-09 16:12:28.577 TestCommandLine[6478:c07] "Démocratie" language is fr<br/>
2013-01-09 16:12:28.590 TestCommandLine[6478:c07] "Tokenizer English" language is en<br/>
2013-01-09 16:12:28.591 TestCommandLine[6478:c07] "ここはデパートです" language is ja<br/>

如何成为:

2013-01-09 16:12:28.582 TestCommandLine[6478:c07] "Tokenizer" language is en<br/>
2013-01-09 16:12:28.586 TestCommandLine[6478:c07] "Tokenizer detect" language is en<br/>
2013-01-09 16:12:28.586 TestCommandLine[6478:c07] "detect" language is en<br/>
2013-01-09 16:12:28.587 TestCommandLine[6478:c07] "我们" language is zh-Hans<br/>
2013-01-09 16:12:28.560 TestCommandLine[6478:c07] "집안일" language is ko<br/>
2013-01-09 16:12:28.577 TestCommandLine[6478:c07] "Démocratie" language is fr<br/>
2013-01-09 16:12:28.590 TestCommandLine[6478:c07] "Tokenizer English" language is en<br/>
2013-01-09 16:12:28.591 TestCommandLine[6478:c07] "ここはデパートです" language is ja<br/>

2 个答案:

答案 0 :(得分:1)

这是我的解决方案

- (NSString *)detectLanguage {

    if ([self isEmpty]) {
        return nil;
    }

    NSString *string = nil;

    // You can set a larger detect number here
    if (self.length > 30) {
        string = self;
    } else {
        NSMutableString *tempString = [NSMutableString stringWithString:self];

        while (tempString.length < 30) {
            [tempString appendFormat:@" %@",self];
        }

        string = tempString;
    }

    NSArray *tagschemes = [NSArray arrayWithObjects:NSLinguisticTagSchemeLanguage, nil];
    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:tagschemes options:0];
    [tagger setString:string];
    NSString *language = [tagger tagAtIndex:0 scheme:NSLinguisticTagSchemeLanguage tokenRange:NULL sentenceRange:NULL];

    if (![language isEqualToString:@"und"]) {
        return language;
    }

    return (__bridge NSString *)CFStringTokenizerCopyBestStringLanguage((CFStringRef)string, CFRangeMake(0, MIN(string.length,400)));
}

答案 1 :(得分:0)

你不能这样认识......至少没有任何不错的准确性。 你必须提供更长的字符串。

CFStringTokenizerCopyBestStringLanguage文档说它需要至少200-400

- &GT;没有更好的方法,我们也尝试使用我们自己的解决方案,它需要更多的文本来提高准确性