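Converting an NSString with accented characters to a C string

Time: 2011-09-08 21:16:49

Tags: ios encoding nsstring

I have an NSString whose value is Jose (with an accent on the e). I'm trying to convert it to a C string like this: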

char str [[myAccentStr length] + 1];
[myAccentStr getCString:str maxLength:[myAccentStr length] + 1 encoding:NSUTF32StringEncoding];

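But str ends up as an empty string. What gives? I've also tried UTF8 and UTF16. The buffer gets passed to another function later, and when that function calls lstrlen on it, the size comes back as zero.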

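2 answers:

Answer 0: (score: 1)

The documentation for NSString's getCString:maxLength:encoding: says:

"You can use canBeConvertedToEncoding: to check whether a string can be losslessly converted to encoding. If it can't, you can use dataUsingEncoding:allowLossyConversion: to get a C-string representation using encoding, allowing some loss of information (note that the data returned by dataUsingEncoding:allowLossyConversion: is not a strict C-string since it does not have a NULL terminator)."

Using the NSString method dataUsingEncoding:allowLossyConversion: solves the problem. Here is a code example: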

NSString *myAccentStr = @"José";

// NSString * to C String (char*)
NSData *strData = [myAccentStr dataUsingEncoding:NSMacOSRomanStringEncoding 
                                allowLossyConversion:YES];
// Size the buffer from the encoded byte count, not the character count
char str[[strData length] + 1];
// Copy only the bytes NSData holds; it carries no NULL terminator
memcpy(str, [strData bytes], [strData length]);
str[[strData length]] = '\0';
NSLog(@"str (from NSString* to c string): %s", str);

// C String (char*) to NSString *   
NSString *newAccentStr = [NSString stringWithCString:str 
                                            encoding:NSMacOSRomanStringEncoding];
NSLog(@"newAccentStr (from c string to NSString*):  %@", newAccentStr);

The output of those NSLog calls is:

str (from NSString* to c string): José

newAccentStr (from c string to NSString*):  José

So far, I've only gotten this to work properly when using NSMacOSRomanStringEncoding.
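The terminator handling quoted from the documentation can be isolated in plain C. The following is a minimal sketch, not part of the original answer: `copy_bytes_as_c_string` is a hypothetical helper name, and it assumes the caller has a pointer and a byte count, which is what an NSData provides.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies `length` raw bytes into a freshly allocated
   buffer and appends the NUL terminator that a length-delimited byte
   buffer does not carry. Returns NULL if allocation fails; the caller
   must free() the result. */
char *copy_bytes_as_c_string(const void *bytes, size_t length) {
    char *out = (char *)malloc(length + 1);  /* one extra byte for '\0' */
    if (out == NULL) {
        return NULL;
    }
    if (length > 0) {
        memcpy(out, bytes, length);          /* never read past `length` */
    }
    out[length] = '\0';
    return out;
}
```

With an NSData this would presumably be called as `copy_bytes_as_c_string([strData bytes], [strData length])`. Copying exactly `length` bytes matters because reading one byte further would run past the end of the data's storage.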


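Edit

Changed this to community wiki. Feel free to edit.

hooleyhoop has some good points, so I figured I'd try to make the code as detailed as possible. If I've missed anything, please chime in.

Also - not sure why [NSString canBeConvertedToEncoding:] returns YES even though [NSString getCString:maxLength:encoding:] definitely doesn't work properly (as shown in the output).

Here is some code to help analyze what works and what doesn't: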

// Define a block variable to test out different encodings
void (^tryGetCStringUsingEncoding)(NSString*, NSStringEncoding) = ^(NSString* originalNSString, NSStringEncoding encoding) {
    NSLog(@"Trying to convert \"%@\" using encoding: 0x%lX", originalNSString, (unsigned long)encoding);
    BOOL canEncode = [originalNSString canBeConvertedToEncoding:encoding];
    if (!canEncode)
    {
        NSLog(@"    Can not encode \"%@\" using encoding %lX", originalNSString, (unsigned long)encoding);
    }
    else
    {
        // Try encoding using NSString getCString:maxLength:encoding:
        // Note: maxLength is documented as including the NULL terminator, and this
        // buffer holds only the encoded bytes, so it is one byte short for the call
        NSUInteger cStrLength = [originalNSString lengthOfBytesUsingEncoding:encoding];
        char cstr[cStrLength];
        [originalNSString getCString:cstr maxLength:cStrLength encoding:encoding];
        NSLog(@"    Converted(1): \"%s\"  (expected length: %lu)",
              cstr, (unsigned long)cStrLength);

        // Try encoding using NSString dataUsingEncoding:allowLossyConversion:
        NSData *strData = [originalNSString dataUsingEncoding:encoding allowLossyConversion:YES];
        char cstr2[[strData length] + 1];
        // Copy only the bytes NSData holds, then terminate the buffer ourselves
        memcpy(cstr2, [strData bytes], [strData length]);
        cstr2[[strData length]] = '\0';
        NSLog(@"    Converted(2): \"%s\"  (expected length: %lu)",
              cstr2, (unsigned long)[strData length]);
    }
};

NSString *myAccentStr = @"José";

// Try out whatever encoding you want
tryGetCStringUsingEncoding(myAccentStr, NSUTF8StringEncoding);
tryGetCStringUsingEncoding(myAccentStr, NSUTF16StringEncoding);
tryGetCStringUsingEncoding(myAccentStr, NSUTF32StringEncoding);
tryGetCStringUsingEncoding(myAccentStr, NSMacOSRomanStringEncoding);

Results:

> Trying to convert "José" using encoding: 0x4
>     Converted(1): ""  (expected length: 5)
>     Converted(2): "José"  (expected length: 5)
> Trying to convert "José" using encoding: 0xA
>     Converted(1): ""  (expected length: 8)
>     Converted(2): "ˇ˛J"  (expected length: 10)
> Trying to convert "José" using encoding: 0x8C000100
>     Converted(1): ""  (expected length: 16)
>     Converted(2): "ˇ˛"  (expected length: 20)
> Trying to convert "José" using encoding: 0x1E
>     Converted(1): "-"  (expected length: 4)
>     Converted(2): "José"  (expected length: 4)
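The truncated Converted(2) values for UTF-16 and UTF-32 are consistent with how C string functions treat zero bytes. The sketch below is an illustration rather than a capture of what NSData returned: the byte arrays are assumed to be little-endian with a leading byte-order mark, inferred from the "ˇ˛" prefix (0xFF 0xFE when rendered as Mac OS Roman) and the reported lengths of 10 and 20. `c_string_length` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* "José" in UTF-8: é takes two bytes, and no byte is zero. */
static const unsigned char kJoseUTF8[] = {
    0x4A, 0x6F, 0x73, 0xC3, 0xA9,
    0x00                                  /* terminator added by the caller */
};

/* Assumed layout: UTF-16, little-endian, with BOM (10 bytes + terminator). */
static const unsigned char kJoseUTF16[] = {
    0xFF, 0xFE,                           /* byte-order mark */
    0x4A, 0x00, 0x6F, 0x00, 0x73, 0x00,   /* J o s */
    0xE9, 0x00,                           /* é (U+00E9) */
    0x00
};

/* Assumed layout: UTF-32, little-endian, with BOM (20 bytes + terminator). */
static const unsigned char kJoseUTF32[] = {
    0xFF, 0xFE, 0x00, 0x00,               /* byte-order mark */
    0x4A, 0x00, 0x00, 0x00,
    0x6F, 0x00, 0x00, 0x00,
    0x73, 0x00, 0x00, 0x00,
    0xE9, 0x00, 0x00, 0x00,
    0x00
};

/* What strlen (or lstrlen) sees: it stops at the first zero byte. */
size_t c_string_length(const unsigned char *bytes) {
    return strlen((const char *)bytes);
}
```

Under these assumptions `c_string_length` gives 5 for the UTF-8 bytes, 3 for UTF-16 (the mark plus "J") and 2 for UTF-32 (only the first two bytes of the mark), which lines up with the printed "ˇ˛J" and "ˇ˛". Any function that measures the buffer with strlen or lstrlen will therefore need a single-byte or UTF-8 encoding.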

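Answer 1: (score: 1)

[aString length] returns the number of characters (UTF-16 code units). In your case that is 4.

You can use, for example, NSUTF8StringEncoding, NSUTF16StringEncoding, or NSUTF32StringEncoding to convert the string to a C string accurately. The lengths in bytes are 5, 8, and 16 respectively.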

NSString *myAccentStr = @"José";
NSUInteger l1 = [myAccentStr lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
NSUInteger l2 = [myAccentStr lengthOfBytesUsingEncoding:NSUTF16StringEncoding];
NSUInteger l3 = [myAccentStr lengthOfBytesUsingEncoding:NSUTF32StringEncoding];
NSLog(@"%ld %ld %ld", (long)l1, (long)l2, (long)l3);

> 5 8 16
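The three byte counts follow from the encoding rules themselves. As a rough sketch in plain C (the function names are hypothetical, and the input is assumed to be an array of valid Unicode code points rather than an NSString), the arithmetic looks like this:

```c
#include <stddef.h>
#include <stdint.h>

/* UTF-8: 1 to 4 bytes per code point, depending on its value. */
size_t utf8_byte_length(const uint32_t *code_points, size_t count) {
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        if (code_points[i] < 0x80) {
            total += 1;
        } else if (code_points[i] < 0x800) {
            total += 2;
        } else if (code_points[i] < 0x10000) {
            total += 3;
        } else {
            total += 4;
        }
    }
    return total;
}

/* UTF-16: 2 bytes per code point, or 4 for those above U+FFFF. */
size_t utf16_byte_length(const uint32_t *code_points, size_t count) {
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += (code_points[i] < 0x10000) ? 2 : 4;
    }
    return total;
}

/* UTF-32: always 4 bytes per code point. */
size_t utf32_byte_length(const uint32_t *code_points, size_t count) {
    (void)code_points;  /* the values do not affect the size */
    return count * 4;
}
```

For J, o, s, é (U+004A, U+006F, U+0073, U+00E9) this works out to 1+1+1+2 = 5, 4×2 = 8 and 4×4 = 16. None of these counts includes a byte-order mark or a terminator.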

For conversion purposes, you should use -maximumLengthOfBytesUsingEncoding: instead of -lengthOfBytesUsingEncoding:.
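To illustrate why sizing for the worst case (plus a terminator) matters, here is a hedged sketch of a bounded UTF-8 encoder in plain C. It is not how NSString is implemented. Both function names are hypothetical, the input is assumed to be UTF-16 code units with no surrogate pairs, and the factor of 3 is a property of UTF-8 rather than a documented return value of -maximumLengthOfBytesUsingEncoding:.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Worst case for UTF-8 is 3 bytes per UTF-16 code unit, plus the terminator. */
size_t utf8_worst_case_capacity(size_t utf16_units) {
    return utf16_units * 3 + 1;
}

/* Encodes `count` UTF-16 code units (assumed to contain no surrogates) as a
   NUL-terminated UTF-8 string. Returns false, without terminating the
   buffer, if `capacity` cannot hold the encoded bytes and the terminator. */
bool encode_utf8_c_string(const uint16_t *units, size_t count,
                          char *buffer, size_t capacity) {
    size_t used = 0;
    for (size_t i = 0; i < count; i++) {
        uint16_t u = units[i];
        size_t need = (u < 0x80) ? 1 : (u < 0x800) ? 2 : 3;
        if (used + need + 1 > capacity) {   /* keep room for '\0' */
            return false;
        }
        if (need == 1) {
            buffer[used++] = (char)u;
        } else if (need == 2) {
            buffer[used++] = (char)(0xC0 | (u >> 6));
            buffer[used++] = (char)(0x80 | (u & 0x3F));
        } else {
            buffer[used++] = (char)(0xE0 | (u >> 12));
            buffer[used++] = (char)(0x80 | ((u >> 6) & 0x3F));
            buffer[used++] = (char)(0x80 | (u & 0x3F));
        }
    }
    if (used + 1 > capacity) {              /* covers count == 0 */
        return false;
    }
    buffer[used] = '\0';
    return true;
}
```

For the four units of "José", a buffer of `length + 1` = 5 bytes is rejected because the UTF-8 form needs 5 bytes plus the terminator, while a buffer of `utf8_worst_case_capacity(4)` = 13 bytes succeeds. This mirrors the sizing used in the question, where the buffer was derived from the character count.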

Always use -canBeConvertedToEncoding: to check whether the conversion is valid.

There are good reasons to use NSString.