String from NSInputStream is not valid UTF-8. How can I convert it to UTF-8 in a more 'lossy' way?

Time: 2015-05-21 11:48:50

Tags: encoding utf-8 nsstring

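I have an app that reads data from a server. Occasionally, the data is not valid UTF-8. When I convert the byte array to a UTF-8 string, the result is nil, so the byte array must contain some bytes that are not valid UTF-8. Is there a way to convert the byte array to UTF-8 'lossily', filtering out only the invalid characters?

Any ideas?

My code looks like this: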

- (void)stream:(NSStream *)theStream handleEvent:(NSStreamEvent)streamEvent {

switch (streamEvent){
    case NSStreamEventHasBytesAvailable:
    {
        uint8_t buffer[1024];
        NSInteger len;
        NSMutableData * inputData = [NSMutableData data];
        // Collect every available chunk before decoding
        while ([directoryStream hasBytesAvailable]){
            len = [directoryStream read:buffer maxLength:sizeof(buffer)];
            if (len > 0) {
                [inputData appendBytes:(const void *)buffer length:len];
            } else {
                // 0 means end of stream, a negative value means a read error
                break;
            }
        }
        // Returns nil when inputData is not valid UTF-8
        NSString *directoryString = [[NSString alloc] initWithData:inputData encoding:NSUTF8StringEncoding];
        NSLog(@"directoryString: %@", directoryString);
    }

    ...

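Is there a way to do this conversion in a more 'lossy' manner? If so, how?

As you can see, I first append the data chunks to an NSData value and only convert to UTF-8 once everything has been read. This prevents (multi-byte) UTF-8 characters from being split across chunks, which would produce even more invalid (nil) UTF-8 strings.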
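To illustrate the splitting problem, here is a minimal sketch in plain C (which also compiles as Objective-C). It reports how many leading bytes of a chunk form complete sequences, so a caller could hold back a trailing partial character until the next chunk arrives. The function name utf8_complete_prefix_length is hypothetical, and the check only looks at lead bytes; it does not validate the rest of the data.

```c
#include <stddef.h>

/* Returns the number of leading bytes in buf that do not end in a
   truncated multi-byte sequence. Hypothetical helper for illustration;
   it inspects at most the last 4 bytes and does not validate the data. */
size_t utf8_complete_prefix_length(const unsigned char *buf, size_t len) {
    for (size_t back = 1; back <= 4 && back <= len; back++) {
        unsigned char byte = buf[len - back];

        /* Continuation bytes look like 10xxxxxx: keep walking backwards. */
        if ((byte & 0xC0) == 0x80) continue;

        /* Found the last lead byte: work out how long its sequence should be. */
        size_t need = 1;
        if      ((byte & 0xE0) == 0xC0) need = 2;
        else if ((byte & 0xF0) == 0xE0) need = 3;
        else if ((byte & 0xF8) == 0xF0) need = 4;

        /* If the sequence needs more bytes than are present, cut before it. */
        return (need > back) ? len - back : len;
    }
    return len;
}
```

For example, a chunk ending in the single byte 0xC3 (the first half of a two-byte character) would be reported as complete only up to the byte before it.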

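1 Answer:

Answer 0 (score: 2)

It works! By combining Larme's code snippet with the comments about UTF-8 character sizes, I managed to create a 'lossy' NSData-to-UTF-8 NSString conversion method.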

+ (NSString *) data2UTF8String:(NSData *) data {

    // First try to do the 'standard' UTF-8 conversion
    NSString * bufferStr = [[NSString alloc] initWithData:data
                                                 encoding:NSUTF8StringEncoding];

    // if it fails, do the 'lossy' UTF8 conversion
    if (!bufferStr) {
        const Byte * buffer = [data bytes];
        NSUInteger dataLength = [data length];

        NSMutableString * filteredString = [[NSMutableString alloc] init];

        NSUInteger i = 0;
        while (i < dataLength) {

            // Derive the sequence length from the lead byte. Any other byte
            // (a stray continuation byte or an invalid lead byte) is tried
            // as a single byte, fails to decode, and is skipped.
            NSUInteger expectedLength = 1;

            if      ((buffer[i] & 0b10000000) == 0b00000000) expectedLength = 1;
            else if ((buffer[i] & 0b11100000) == 0b11000000) expectedLength = 2;
            else if ((buffer[i] & 0b11110000) == 0b11100000) expectedLength = 3;
            else if ((buffer[i] & 0b11111000) == 0b11110000) expectedLength = 4;

            NSUInteger length = MIN(expectedLength, dataLength - i);
            NSData * character = [NSData dataWithBytes:&buffer[i] length:length];

            // Decode with an explicit length; stringWithUTF8String: expects a
            // NUL-terminated C string, which these bytes are not.
            NSString * possibleString = [[NSString alloc] initWithData:character
                                                              encoding:NSUTF8StringEncoding];
            if (possibleString) {
                [filteredString appendString:possibleString];
                i += length;
            } else {
                // Drop only the offending byte and resynchronise on the next one
                i += 1;
            }
        }
        bufferStr = filteredString;
    }

    return bufferStr;
}
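
The same filtering idea can also be tried outside Foundation. Below is a rough sketch in plain C (which also compiles as Objective-C) that copies only structurally well-formed sequences into an output buffer. The name utf8_filter_lossy is hypothetical, and the check is deliberately simple: it verifies the lead byte and the continuation bytes but does not reject overlong encodings or surrogate code points, so a strict decoder may still refuse some of the output.

```c
#include <stddef.h>

/* Copies structurally well-formed UTF-8 sequences from in to out and
   returns the number of bytes written. out must have room for len bytes.
   Hypothetical helper: overlong forms and surrogates are NOT rejected. */
size_t utf8_filter_lossy(const unsigned char *in, size_t len, unsigned char *out) {
    size_t i = 0, written = 0;

    while (i < len) {
        /* Sequence length announced by the lead byte; 0 means "not a lead byte". */
        size_t need = 0;
        if      ((in[i] & 0x80) == 0x00) need = 1;
        else if ((in[i] & 0xE0) == 0xC0) need = 2;
        else if ((in[i] & 0xF0) == 0xE0) need = 3;
        else if ((in[i] & 0xF8) == 0xF0) need = 4;

        /* The whole sequence must fit and every trailing byte must be 10xxxxxx. */
        int ok = (need != 0 && i + need <= len);
        for (size_t k = 1; ok && k < need; k++) {
            if ((in[i + k] & 0xC0) != 0x80) ok = 0;
        }

        if (!ok) {
            i += 1;               /* drop one byte and resynchronise */
            continue;
        }
        for (size_t k = 0; k < need; k++) out[written++] = in[i + k];
        i += need;
    }
    return written;
}
```

The filtered bytes could then be handed to initWithData:encoding: in a single call instead of decoding character by character.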

If you have any comments, please let me know. Thanks, Larme!