How to read NSInputStream with UTF-8?

Time: 2013-02-10 14:22:52

Tags: ios utf-8 nsinputstream

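I am trying to read a large file in iOS using NSInputStream and split it into lines at newlines (I don't want to use componentsSeparatedByCharactersInSet because it uses too much memory).

But since not all lines appear to be UTF-8 encoded (some look like plain ASCII, which uses the same bytes), I often get this warning: Incorrect NSStringEncoding value 0x0000 detected. Assuming NSASCIIStringEncoding. Will stop this compatiblity mapping behavior in the near future.

My question is: is there a way to suppress this warning, e.g. by setting a compiler flag?

Furthermore: is it safe to append/concatenate two buffer reads? Reading from the byte stream, converting each buffer to a string, and then appending the strings could leave the resulting string corrupted.

Below is an example method demonstrating that the byte-to-string conversion discards both the first half and the second half of a UTF-8 character, because each half on its own is invalid.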

- (void)NSInputStreamTest {
  uint8_t testString[] = {0xd0, 0x91}; // @"Б"

  // Test 1: Read max 1 byte at a time of UTF-8 string
  uint8_t buf1[1], buf2[1];
  NSString *s1, *s2, *s3;
  NSInteger c1, c2;
  NSInputStream *inStream = [[NSInputStream alloc] initWithData:[[NSData alloc] initWithBytes:testString length:2]];

  [inStream open];
  c1 = [inStream read:buf1 maxLength:1];
  s1 = [[NSString alloc] initWithBytes:buf1 length:1 encoding:NSUTF8StringEncoding];
  NSLog(@"Test 1: Read %ld byte(s): %@", (long)c1, s1); // NSInteger is cast to long to match %ld
  c2 = [inStream read:buf2 maxLength:1];
  s2 = [[NSString alloc] initWithBytes:buf2 length:1 encoding:NSUTF8StringEncoding];
  NSLog(@"Test 1: Read %ld byte(s): %@", (long)c2, s2);
  s3 = [s1 stringByAppendingString:s2];
  NSLog(@"Test 1: Concatenated: %@", s3);
  [inStream close];

  // Test 2: Read max 2 bytes at a time of UTF-8 string
  uint8_t buf4[2];
  NSString *s4;
  NSInteger c4;
  NSInputStream *inStream2 = [[NSInputStream alloc] initWithData:[[NSData alloc] initWithBytes:testString length:2]];

  [inStream2 open];
  c4 = [inStream2 read:buf4 maxLength:2];
  s4 = [[NSString alloc] initWithBytes:buf4 length:2 encoding:NSUTF8StringEncoding];
  NSLog(@"Test 2: Read %ld byte(s): %@", (long)c4, s4);
  [inStream2 close];
}

Output:

2013-02-10 21:16:23.412 Test[11144:c07] Test 1: Read 1 byte(s): (null)
2013-02-10 21:16:23.413 Test[11144:c07] Test 1: Read 1 byte(s): (null)
2013-02-10 21:16:23.413 Test[11144:c07] Test 1: Concatenated: (null)
2013-02-10 21:16:23.413 Test[11144:c07] Test 2: Read 2 byte(s): Б

2 Answers:

Answer 0 (score: 1)

First of all, in the line:

s3 = [s1 stringByAppendingString:s2];

you are trying to concatenate nil values, so the result is nil as well. You may therefore want to concatenate the bytes instead of the NSStrings:

uint8_t buf3[2];
buf3[0] = buf1[0];
buf3[1] = buf2[0];
s3 = [[NSString alloc] initWithBytes:buf3 length:2 encoding:NSUTF8StringEncoding];

Output:

2015-11-06 12:57:40.304 Test[10803:883182] Test 1: Read 1 byte(s): (null)
2015-11-06 12:57:40.305 Test[10803:883182] Test 1: Read 1 byte(s): (null)
2015-11-06 12:57:40.305 Test[10803:883182] Test 1: Concatenated: Б

Second, a UTF-8 character can be 1 to 6 bytes long in the original encoding scheme (note that RFC 3629 restricts valid UTF-8 to at most 4 bytes, so the 5- and 6-byte forms should not occur in conforming data):

(1 byte)   0aaa aaaa         // if the character lies in 0x00 .. 0x7F (ASCII)
(2 bytes)  110x xxxx 10xx xxxx
(3 bytes)  1110 xxxx 10xx xxxx 10xx xxxx
(4 bytes)  1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx
(5 bytes)  1111 10xx 10xx xxxx 10xx xxxx 10xx xxxx 10xx xxxx
(6 bytes)  1111 110x 10xx xxxx 10xx xxxx 10xx xxxx 10xx xxxx 10xx xxxx

So if you intend to read raw bytes from an NSInputStream and then convert them to a UTF-8 NSString, you probably want to read from the NSInputStream byte by byte until you get a valid string.
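
As an illustration of how the lead-byte patterns listed in this answer could be used, here is a minimal sketch in plain C (which also compiles inside Objective-C sources). It is not code from the answer: the function names utf8_sequence_length and utf8_complete_prefix_length are hypothetical, and only the 1- to 4-byte forms are accepted, as an assumption based on RFC 3629. Rather than retrying the conversion after every byte, it reports how many bytes of a buffer form whole characters, so that an incomplete trailing sequence can be held back and prepended to the next read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length in bytes of the UTF-8 sequence introduced by `lead`, or 0 if
 * `lead` is a continuation byte (10xx xxxx) or not an accepted lead byte.
 * Assumption: only the 1- to 4-byte forms are treated as valid. */
static size_t utf8_sequence_length(uint8_t lead) {
    if ((lead & 0x80) == 0x00) return 1; /* 0xxx xxxx */
    if ((lead & 0xE0) == 0xC0) return 2; /* 110x xxxx */
    if ((lead & 0xF0) == 0xE0) return 3; /* 1110 xxxx */
    if ((lead & 0xF8) == 0xF0) return 4; /* 1111 0xxx */
    return 0;
}

/* Number of leading bytes of buf[0..len) that do not end in the middle
 * of a character. The remaining len - result bytes (at most 3) are an
 * incomplete trailing sequence to carry over to the next read.
 * This does not validate the data; malformed input is passed through
 * unchanged so that the string conversion itself can reject it. */
static size_t utf8_complete_prefix_length(const uint8_t *buf, size_t len) {
    assert(buf != NULL || len == 0);
    size_t i = len;
    size_t steps = 0;
    /* Walk back over at most 4 bytes to find the last non-continuation byte. */
    while (i > 0 && steps < 4) {
        i--;
        steps++;
        if ((buf[i] & 0xC0) != 0x80) {
            size_t need = utf8_sequence_length(buf[i]);
            size_t have = len - i;
            /* Incomplete only if the lead byte promises more bytes than we hold. */
            return (need > have) ? i : len;
        }
    }
    return len; /* empty buffer, or no lead byte found near the end */
}
```

For the question's test string {0xd0, 0x91}, a buffer holding only the first byte yields 0 complete bytes, so nothing would be converted until the second byte has arrived; with both bytes present the result is 2.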

Answer 1 (score: 0)

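ASCII (and therefore the newline character) is a subset of UTF-8, so there should not be any conflict.

It should be possible to split the stream at newline characters, just as you would with a plain ASCII stream. You can then convert each chunk ("line") to an NSString using UTF-8.

Are you sure the encoding errors are not real, i.e. that your stream does not actually contain bytes that are invalid as UTF-8?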

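Edited to add, from the comments:

This assumes that the lines are short enough for a whole line to be kept in memory before it is converted from UTF-8.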
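
To illustrate why splitting at newline bytes is safe: in UTF-8 every byte of a multi-byte sequence has its high bit set, so the byte 0x0A can only ever be a real newline. The following is a minimal sketch in plain C (usable from Objective-C sources); first_line_length is a hypothetical helper name, and it assumes lines are terminated by '\n'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the length of the first line in buf[0..len), including its
 * terminating '\n', or 0 if the buffer holds no complete line yet.
 * Searching the raw bytes is safe for UTF-8 because 0x0A never occurs
 * inside a multi-byte sequence. */
static size_t first_line_length(const uint8_t *buf, size_t len) {
    assert(buf != NULL || len == 0);
    if (len == 0) return 0;
    const uint8_t *nl = memchr(buf, '\n', len);
    return nl ? (size_t)(nl - buf) + 1 : 0;
}
```

A caller could accumulate stream reads in a buffer, convert the first first_line_length bytes to an NSString with NSUTF8StringEncoding once the result is non-zero, and keep the remaining bytes for the next line. A result of 0 means more data has to be read first.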