I have this code:
- (void)parser:(NSXMLParser *)parser foundCDATA:(NSData *)CDATABlock
{
    NSString *someString = [[NSString alloc] initWithData:CDATABlock encoding:NSUTF8StringEncoding];
    someString = [ someString stringByReplacingOccurrencesOfString:@"%" withString: @"&" ];
    someString = [ someString stringByReplacingOccurrencesOfString:@"|" withString: @"|" ];
    someString = [ someString stringByReplacingOccurrencesOfString:@" " withString: @" " ];
    someString = [ someString stringByReplacingOccurrencesOfString:@"–" withString:@"-"];
    someString = [ someString stringByReplacingOccurrencesOfString:@"—" withString:@"——"];
    someString = [ someString stringByReplacingOccurrencesOfString:@"‘" withString:@"'" ];
    someString = [ someString stringByReplacingOccurrencesOfString:@"’" withString:@"'" ];
    someString = [ someString stringByReplacingOccurrencesOfString:@"‚" withString:@"," ];
    someString = [ someString stringByReplacingOccurrencesOfString:@"“" withString:@"\"" ];
    someString = [ someString stringByReplacingOccurrencesOfString:@"”" withString:@"\"" ];
    someString = [ someString stringByReplacingOccurrencesOfString:@"…" withString:@"..."];
    someString = [ someString stringByReplacingOccurrencesOfString:@"&" withString:@"<"];
    someString = [ someString stringByReplacingOccurrencesOfString:@"'" withString:@">"];
    someString = [ someString stringByReplacingOccurrencesOfString:@"€" withString:@"€"];
    someString = [ someString stringByReplacingOccurrencesOfString:@"→" withString:@"→"];
    if(nil != self.currentItemValue){
        [self.currentItemValue appendString:someString];
    }
}
Is there a function that performs this character conversion automatically?
Answer 0: (score: 2)
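Use an NSMutableString and perform the replacements in place.
In any case, you can do this in a single pass, but you will have to write the code yourself. You can use NSScanner, or a method like -rangeOfString:options:range:, to find each successive entity and then work out its replacement yourself. If you are operating on an NSMutableString, you can replace each entity with its replacement character and keep searching, after adjusting your position (in the NSScanner case) or the search range to account for the difference in length between the entity and its replacement.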
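As a rough illustration of the -rangeOfString:options:range: variant described above, here is a minimal sketch. The function name DecodeDecimalEntitiesInPlace is made up for this example, and it assumes the entities are decimal references of the form &#NNN; whose code points fit in a single unichar (that is, within the Basic Multilingual Plane). Anything that does not match that shape is left untouched.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): replaces decimal
// entities such as "&#8211;" inside `text`, scanning left to right once.
// Assumes every code point fits in one unichar (BMP only).
static void DecodeDecimalEntitiesInPlace(NSMutableString *text) {
    NSCharacterSet *nonDigits =
        [[NSCharacterSet characterSetWithCharactersInString:@"0123456789"] invertedSet];
    NSUInteger searchStart = 0;

    while (searchStart < text.length) {
        // Find the next "&#" at or after the current position.
        NSRange remaining = NSMakeRange(searchStart, text.length - searchStart);
        NSRange prefix = [text rangeOfString:@"&#" options:NSLiteralSearch range:remaining];
        if (prefix.location == NSNotFound) break;

        // Find the first non-digit after the prefix; it must be ';'
        // and there must be at least one digit before it.
        NSUInteger digitsStart = NSMaxRange(prefix);
        NSRange tail = NSMakeRange(digitsStart, text.length - digitsStart);
        NSRange stop = [text rangeOfCharacterFromSet:nonDigits options:0 range:tail];
        if (stop.location == NSNotFound ||
            stop.location == digitsStart ||
            [text characterAtIndex:stop.location] != ';') {
            searchStart = digitsStart;   // not a well-formed entity, skip past "&#"
            continue;
        }

        NSRange digitsRange = NSMakeRange(digitsStart, stop.location - digitsStart);
        unichar character = (unichar)[[text substringWithRange:digitsRange] intValue];
        NSString *replacement = [NSString stringWithCharacters:&character length:1];

        // Swap the whole entity for the character, then resume right after it.
        // The string got shorter, so the next search range is recomputed above.
        NSRange entity = NSMakeRange(prefix.location, stop.location + 1 - prefix.location);
        [text replaceCharactersInRange:entity withString:replacement];
        searchStart = prefix.location + replacement.length;
    }
}
```

In the parser callback from the question, this could be called on a mutable copy of someString before appending it to currentItemValue. Resuming the search after the inserted character also means a decoded "&" is never re-interpreted as the start of another entity.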
Answer 1: (score: 2)
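Rather than hard-coding the replacements like that, there is a better way.
These entities have the form: &# + a decimal number + ;. The decimal number is the base-10 representation of the character's Unicode code point. So you can search for substrings of this form, extract the number, and convert it directly into a character.
Here is one way to do it, using RegexKitLite to find the entities: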
NSString * source = @"&#38; &#39; &#124; &#160; &#8211; &#8212; &#8216; &#8217; &#8218; &#8220; &#8221; &#8230; &#8364; &#8594;";
// Capture group 1 holds the decimal code point of each entity.
NSString * regex = @"&#(\\d+);";
NSArray * matches = [source arrayOfCaptureComponentsMatchedByRegex:regex];
NSMutableString * decodedSource = [source mutableCopy];
for (NSArray * match in matches) {
    NSString * fullMatch = [match objectAtIndex:0];    // e.g. "&#8211;"
    NSString * decimalCode = [match objectAtIndex:1];  // e.g. "8211"
    // A unichar is 16 bits, so this only handles code points up to 65535.
    unichar character = (unichar)[decimalCode intValue];
    NSString * replacement = [NSString stringWithFormat:@"%C", character];
    // Rescans the whole string for every match, hence the O(nm) worst case.
    [decodedSource replaceOccurrencesOfString:fullMatch withString:replacement options:NSLiteralSearch range:NSMakeRange(0, [decodedSource length])];
}
NSLog(@"decoded: %@", decodedSource);
[decodedSource release];
On my machine, this logs:
decoded: & ' | – — ‘ ’ ‚ “ ” … € →
This isn't the most efficient way to do it (in the worst case it is an O(nm) algorithm), but it's a start. :)
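If the repeated rescanning is a concern, one possible alternative is to walk the matches once and build the result as you go. The sketch below swaps RegexKitLite for Foundation's NSRegularExpression (available from iOS 4 and OS X 10.7), which is my substitution and not part of the code above. The function name DecodeEntitiesWithRegex is hypothetical, and it keeps the same assumption that every code point fits in one unichar.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decodes "&#NNN;" entities by visiting each regex
// match once and copying the text between matches into a new string.
// Assumes code points up to 65535 (one unichar), like the loop above.
static NSString *DecodeEntitiesWithRegex(NSString *source) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"&#([0-9]+);"
                                                  options:0
                                                    error:NULL];
    NSMutableString *decoded = [NSMutableString stringWithCapacity:source.length];
    NSUInteger copiedUpTo = 0;   // index in `source` already copied to `decoded`

    NSArray *matches = [regex matchesInString:source
                                      options:0
                                        range:NSMakeRange(0, source.length)];
    for (NSTextCheckingResult *match in matches) {
        NSRange whole = match.range;

        // Copy the untouched text that sits before this entity.
        NSRange gap = NSMakeRange(copiedUpTo, whole.location - copiedUpTo);
        [decoded appendString:[source substringWithRange:gap]];

        // Convert the captured decimal digits into a character.
        NSString *digits = [source substringWithRange:[match rangeAtIndex:1]];
        unichar character = (unichar)[digits intValue];
        [decoded appendFormat:@"%C", character];

        copiedUpTo = NSMaxRange(whole);
    }

    // Copy whatever follows the last entity.
    [decoded appendString:[source substringFromIndex:copiedUpTo]];
    return decoded;
}
```

Because each match is handled exactly once and nothing is searched for again, the work is roughly proportional to the length of the input plus the number of entities.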