I need to do some character processing of huge strings in Cocoa (from Objective-C or Swift), where:
- the input is an `NSString` that has n characters;
- the result is also an `NSString`;
- characters are processed as `unichar` (so as to make indexing and length computation O(1)).

For the sake of the example, let's say the processing is a rot13 obfuscation.
I want to do it space and time efficiently:
- I want to produce the resulting `NSString` without doing another copy.
- I want space complexity ≤ 2*n + O(1).
- I want time complexity O(n), with as small a constant as possible.
The `NSString` API allows for that easily, but is too inefficient, with plenty of back-and-forth conversion from character to string. I am shooting for C-level efficient processing of characters here.

The `NSString` API also allows getting a buffer of characters with methods such as `dataUsingEncoding:` or `UTF8String`. But I can't find a way to use the API where I copy the characters for processing no more than once.
Answer 0 (score: 1)
Allocate a buffer of `unichar`. Copy into the buffer with `getCharacters(_:range:)`. Manipulate. Convert back with `init(charactersNoCopy:length:freeWhenDone:)`.
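The four steps above can be sketched roughly as follows. This is a minimal sketch, not a tuned implementation: the function name `rot13` is made up for illustration, only ASCII letters are rotated, and the buffer comes from `malloc` on the assumption that `freeWhenDone: true` releases it with `free()`.

```swift
import Foundation

/// Hypothetical helper: ROT13 over the ASCII letters of `input`.
/// Peak memory is the source string plus one buffer of n UTF-16 code units.
func rot13(_ input: NSString) -> NSString {
    let length = input.length            // number of UTF-16 code units
    guard length > 0 else { return NSString() }

    // Step 1: allocate one unichar buffer (malloc, so NSString can free() it later).
    guard let raw = malloc(length * MemoryLayout<unichar>.size) else {
        fatalError("allocation failed")
    }
    let buffer = raw.bindMemory(to: unichar.self, capacity: length)

    // Step 2: the single copy out of the source string.
    input.getCharacters(buffer, range: NSRange(location: 0, length: length))

    // Step 3: manipulate in place, one O(n) pass.
    for i in 0..<length {
        let c = buffer[i]
        switch c {
        case 0x41...0x5A: buffer[i] = (c - 0x41 + 13) % 26 + 0x41   // A-Z
        case 0x61...0x7A: buffer[i] = (c - 0x61 + 13) % 26 + 0x61   // a-z
        default: break                                             // leave everything else untouched
        }
    }

    // Step 4: hand the buffer to NSString without copying; NSString takes ownership.
    return NSString(charactersNoCopy: buffer, length: length, freeWhenDone: true)
}

print(rot13(NSString(string: "Hello, World!")))   // prints "Uryyb, Jbeyq!"
```

Because the buffer's ownership passes to the returned string, it must not be freed or reused by the caller after step 4.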
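`unichar` is UTF-16. If you are willing to assume that nothing needs surrogate characters (for example, if you believe it is ASCII), then you can allocate the buffer based on `length` (it will be 2 * length bytes). If you want to be more flexible, but still need O(1) at the cost of 2-3x the memory requirement, then use `maximumLengthOfBytes(using:)`. If you want to be more flexible still, but are willing to accept an O(n) step (I assume you are not), then use `lengthOfBytes(using:)`.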
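To make the three sizing options concrete, here is a small sketch comparing them on a sample string (the string itself is an assumption chosen for illustration). One detail worth keeping in mind: `length` counts UTF-16 code units, so a surrogate pair already counts as two, and a buffer of `length` `unichar`s should be enough for `getCharacters(_:range:)`. The byte-oriented methods matter mainly when targeting another encoding such as UTF-8.

```swift
import Foundation

// Sample string (illustrative): one accented letter and one emoji outside the BMP.
let sample = NSString(string: "h\u{E9}llo \u{1F600}")
let utf8 = String.Encoding.utf8.rawValue

// O(1): bytes needed for a UTF-16 buffer filled by getCharacters(_:range:).
let utf16Bytes = sample.length * MemoryLayout<unichar>.size

// O(1): an upper-bound estimate of the UTF-8 size; may overshoot considerably.
let utf8UpperBound = sample.maximumLengthOfBytes(using: utf8)

// O(n): the exact UTF-8 size; requires scanning the string.
let utf8Exact = sample.lengthOfBytes(using: utf8)

print(utf16Bytes, utf8UpperBound, utf8Exact)
```

Here `sample.length` is 8 (the emoji takes two code units), so the UTF-16 buffer needs 16 bytes, while the exact UTF-8 size is 11 bytes.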
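It is fairly common for `NSString` to be stored internally as UTF-16, so this tends to be a very fast conversion. That said, if you know enough about your strings and are willing to write extra code to manipulate the encoding directly, then look at `fastestEncoding`.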