Function to do character conversion automatically

Time: 2011-01-13 20:01:30

Tags: iphone objective-c string

I have this code:

- (void)parser:(NSXMLParser *)parser foundCDATA:(NSData *)CDATABlock
{
    NSString *someString = [[NSString alloc] initWithData:CDATABlock encoding:NSUTF8StringEncoding];


   someString = [ someString stringByReplacingOccurrencesOfString:@"%" withString: @"&"  ];
   someString = [ someString stringByReplacingOccurrencesOfString:@"&#124;" withString: @"|"  ];
   someString = [ someString stringByReplacingOccurrencesOfString:@"&#160;" withString: @" "  ];
   someString = [ someString stringByReplacingOccurrencesOfString:@"&#8211;" withString:@"-"];
   someString = [ someString stringByReplacingOccurrencesOfString:@"&#8212;" withString:@"——"];
   someString = [ someString stringByReplacingOccurrencesOfString:@"&#8216;" withString:@"'"  ];
   someString = [ someString stringByReplacingOccurrencesOfString:@"&#8217;" withString:@"'"  ];
   someString = [ someString stringByReplacingOccurrencesOfString:@"&#8218;" withString:@","  ];
   someString = [ someString stringByReplacingOccurrencesOfString:@"&#8220;" withString:@"\""  ];
   someString = [ someString stringByReplacingOccurrencesOfString:@"&#8221;" withString:@"\""  ];
   someString = [ someString stringByReplacingOccurrencesOfString:@"&#8230;" withString:@"..."];
   someString = [ someString stringByReplacingOccurrencesOfString:@"&#38;" withString:@"<"];
   someString = [ someString stringByReplacingOccurrencesOfString:@"&#39;" withString:@">"];
   someString = [ someString stringByReplacingOccurrencesOfString:@"&#8364;" withString:@"€"];
   someString = [ someString stringByReplacingOccurrencesOfString:@"&#8594;" withString:@"→"];

   if(nil != self.currentItemValue){
      [self.currentItemValue appendString:someString];
   }
}

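Is there a function that performs this character conversion automatically?

2 answers:

Answer 0 (score: 2)

Wow, that is nasty and inefficient. At the very least, switch to an NSMutableString and do the replacements in place.

In any case, you can do this in a single pass, but you will have to write the code yourself. You can use NSScanner, or a method like -rangeOfString:options:range:, to find each successive entity and then work out its replacement yourself. If you operate on an NSMutableString, you can replace the entity with its replacement character and keep searching, after adjusting your scan location (in the NSScanner case) or the search range to account for the difference in length between the entity and its replacement character.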
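The single-pass idea can be sketched with NSScanner roughly as follows. This is only an illustration, not the answerer's code: the function name DecodeDecimalEntities is hypothetical, it handles only decimal entities of the form &#NNN;, and it leaves anything outside the 16-bit range untouched. It also builds the result by appending to a fresh NSMutableString rather than editing in place, which sidesteps the range bookkeeping described above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the answer): decodes decimal numeric
// entities such as "&#8364;" in one left-to-right pass over the input.
static NSString *DecodeDecimalEntities(NSString *input) {
    NSMutableString *result = [NSMutableString stringWithCapacity:[input length]];
    NSScanner *scanner = [NSScanner scannerWithString:input];
    [scanner setCharactersToBeSkipped:nil];  // do not silently drop whitespace
    NSCharacterSet *digits =
        [NSCharacterSet characterSetWithCharactersInString:@"0123456789"];

    while (![scanner isAtEnd]) {
        // Copy everything before the next "&#" verbatim.
        NSString *chunk = nil;
        if ([scanner scanUpToString:@"&#" intoString:&chunk]) {
            [result appendString:chunk];
        }
        if ([scanner isAtEnd]) break;

        NSUInteger start = [scanner scanLocation];
        NSString *number = nil;
        if ([scanner scanString:@"&#" intoString:NULL] &&
            [scanner scanCharactersFromSet:digits intoString:&number] &&
            [scanner scanString:@";" intoString:NULL]) {
            NSInteger code = [number integerValue];
            if (code > 0 && code <= 0xFFFF) {
                // Fits in one UTF-16 unit: emit the character itself.
                [result appendFormat:@"%C", (unichar)code];
            } else {
                // Assumption: out-of-range values are kept as literal text.
                [result appendFormat:@"&#%@;", number];
            }
        } else {
            // Not a well-formed entity: emit "&#" literally and resume after it.
            [result appendString:@"&#"];
            [scanner setScanLocation:start + 2];
        }
    }
    return result;
}
```

With this sketch, DecodeDecimalEntities(@"5 &#8364; &#8594; 6") should give "5 € → 6", and malformed input such as "&#12" passes through unchanged.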

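Answer 1 (score: 2)

Rather than hard-coding the replacements like that, there is a better way.

These entities have the form: &# + decimal number + ;. The decimal number is the base-10 representation of the character's Unicode code point. So you can search for substrings of this form, extract the number, and convert it directly to a character.

Here is one way to do it, using RegexKitLite to find the strings: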

NSString * source = @"&#38; &#39; &#124; &#160; &#8211; &#8212; &#8216; &#8217; &#8218; &#8220; &#8221; &#8230; &#8364; &#8594;";

NSString * regex = @"&#(\\d+);";
NSArray * matches = [source arrayOfCaptureComponentsMatchedByRegex:regex];

NSMutableString * decodedSource = [source mutableCopy];
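for (NSArray * match in matches) {
    // Index 0 is the whole entity (e.g. "&#8364;"); index 1 is the captured digits.
    NSString * fullMatch = [match objectAtIndex:0];
    NSString * decimalCode = [match objectAtIndex:1];

    // unichar is 16 bits, so this only handles code points up to 65535.
    unichar character = (unichar)[decimalCode intValue];
    NSString * replacement = [NSString stringWithFormat:@"%C", character];

    // Replaces every occurrence of this entity, scanning the whole string each time.
    [decodedSource replaceOccurrencesOfString:fullMatch withString:replacement options:NSLiteralSearch range:NSMakeRange(0, [decodedSource length])];
}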

NSLog(@"decoded: %@", decodedSource);
[decodedSource release];

On my machine, this logs:

decoded: & ' |   – — ‘ ’ ‚ “ ” … € →

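This isn't the most efficient approach (in the worst case it is an O(nm) algorithm, since each of the m matches triggers a replace pass over the whole n-character string), but it's a start. :)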
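One possible way to avoid the repeated full-string scans is to replace each match at its own range, walking the matches from last to first so that earlier ranges stay valid. The sketch below swaps RegexKitLite for Foundation's NSRegularExpression (available from iOS 4.0 and OS X 10.7), which is a substitution on my part rather than what the answer used. The function name DecodeEntitiesWithRegex is hypothetical, and the surrogate-pair branch for code points above 0xFFFF is an addition the original code does not have.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the answer): decodes "&#NNN;" entities by
// replacing each match at its own range instead of re-searching the string.
static NSString *DecodeEntitiesWithRegex(NSString *source) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"&#([0-9]+);"
                                                  options:0
                                                    error:&error];
    if (regex == nil) return source;  // pattern failed to compile

    NSMutableString *decoded = [NSMutableString stringWithString:source];
    NSArray *matches = [regex matchesInString:source
                                      options:0
                                        range:NSMakeRange(0, [source length])];

    // Last to first: a replacement never shifts the ranges still to be handled.
    for (NSTextCheckingResult *match in [matches reverseObjectEnumerator]) {
        NSString *digits = [source substringWithRange:[match rangeAtIndex:1]];
        long long cp = [digits longLongValue];

        // Skip values that are not valid Unicode scalar values.
        if (cp <= 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) continue;

        unichar units[2];
        NSUInteger count = 1;
        if (cp <= 0xFFFF) {
            units[0] = (unichar)cp;
        } else {
            // Encode as a UTF-16 surrogate pair.
            long long v = cp - 0x10000;
            units[0] = (unichar)(0xD800 + (v >> 10));
            units[1] = (unichar)(0xDC00 + (v & 0x3FF));
            count = 2;
        }
        NSString *replacement = [NSString stringWithCharacters:units length:count];
        [decoded replaceCharactersInRange:[match range] withString:replacement];
    }
    return decoded;
}
```

Each replacement can still shift the tail of the mutable string, so this is not guaranteed to be linear, but every entity is located only once.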