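I am trying to use ICU for language detection. My code works for ISO-8859-1 text (Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Swedish), but I don't know how to detect Korean, Japanese, ... Can you help me? My code: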
```c
#include <stdio.h>
#include <string.h>
#include <unicode/ucsdet.h>
#include <unicode/utypes.h>

/* Print the charset, language and confidence that ICU reports for str. */
void languageDetection(const char *str)
{
    UErrorCode status = U_ZERO_ERROR;
    int32_t len = (int32_t)strlen(str);

    UCharsetDetector *csd = ucsdet_open(&status);
    if (U_FAILURE(status)) {   /* U_FAILURE ignores warnings, unlike != U_ZERO_ERROR */
        printf("ERROR: in ucsdet_open: %s\n", u_errorName(status));
        return;
    }

    ucsdet_setText(csd, str, len, &status);
    if (U_FAILURE(status)) {
        printf("ERROR: in ucsdet_setText: %s\n", u_errorName(status));
        ucsdet_close(csd);
        return;
    }

    /* Best match only; it can be NULL when no charset matches. */
    const UCharsetMatch *csm = ucsdet_detect(csd, &status);
    if (U_FAILURE(status) || csm == NULL) {
        printf("ERROR: in ucsdet_detect: %s\n", u_errorName(status));
        ucsdet_close(csd);
        return;
    }

    /* These calls do nothing once status holds an error, so one check suffices. */
    const char *name = ucsdet_getName(csm, &status);
    const char *lang = ucsdet_getLanguage(csm, &status);
    int32_t confidence = ucsdet_getConfidence(csm, &status);
    if (U_FAILURE(status))
        printf("ERROR: reading the match: %s\n", u_errorName(status));
    else
        printf("%s (%s) %d\n", name, lang, (int)confidence);

    /* name and lang belong to csd, so print them before closing it. */
    ucsdet_close(csd);
}
```
Thanks a lot.