I am loading a website's HTML with this call:
NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:url];
[request setValue:@"utf-8" forHTTPHeaderField:@"Accept-Encoding"];
[request setValue:@"text/html" forHTTPHeaderField:@"Accept"];
[NSURLConnection sendAsynchronousRequest:request
                                   queue:[NSOperationQueue currentQueue]
                       completionHandler:^(NSURLResponse *response, NSData *data, NSError *error) { ... }];
Then, to convert the NSData to an NSString, I need to know the encoding, so I call:
NSString *textEncoding = [response textEncodingName];
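from inside the completion block, but it returns nil for websites that do not specify a "Content-Encoding" header field.
If I don't know the encoding, [[NSString alloc] initWithData:data encoding:responseEncoding]
will not give me readable HTML.
How can I detect the correct encoding for websites that do not send a "Content-Encoding" header field?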
Answer 0 (score: 2)
You can try different encodings in turn and see which one produces readable text:
// NSStringEncoding is an NSUInteger; several constants below do not fit in an int.
static const NSStringEncoding encodingPriority[] = {
    NSUTF8StringEncoding,
    NSASCIIStringEncoding,
    NSISOLatin1StringEncoding,
    NSISOLatin2StringEncoding,
    NSUnicodeStringEncoding,
    NSWindowsCP1251StringEncoding,
    NSWindowsCP1252StringEncoding,
    NSWindowsCP1253StringEncoding,
    NSWindowsCP1254StringEncoding,
    NSWindowsCP1250StringEncoding,
    NSNEXTSTEPStringEncoding,
    NSJapaneseEUCStringEncoding,
    NSNonLossyASCIIStringEncoding,
    NSShiftJISStringEncoding, /* kCFStringEncodingDOSJapanese */
    NSISO2022JPStringEncoding, /* ISO 2022 Japanese encoding for e-mail */
    NSMacOSRomanStringEncoding,
    NSUTF16BigEndianStringEncoding,
    NSUTF16LittleEndianStringEncoding,
    NSUTF32StringEncoding,
    NSUTF32BigEndianStringEncoding,
    NSUTF32LittleEndianStringEncoding
};
#define REQUIRED_HTML_STRING @"<html"
- (NSString *)htmlStringForUnknownEncodingData:(NSData *)data detectedEncoding:(NSStringEncoding *)detectedEncoding
{
    // sizeof(array) is a byte count; divide by the element size to get the number of entries
    const size_t count = sizeof(encodingPriority) / sizeof(encodingPriority[0]);
    for (size_t i = 0; i < count; i++) {
        NSStringEncoding encoding = encodingPriority[i];
        // try this encoding
        NSString *html = [[NSString alloc] initWithData:data encoding:encoding];
        // a wrong encoding yields nil or garbage, so require a known marker in the decoded text
        if (html && [html rangeOfString:REQUIRED_HTML_STRING options:NSCaseInsensitiveSearch].location != NSNotFound) {
            if (detectedEncoding) {
                *detectedEncoding = encoding;
            }
            return html;
        }
    }
    return nil;
}
Then, to detect the encoding used by the HTML in an NSData object, call:
NSStringEncoding encoding;
NSString *html = [self htmlStringForUnknownEncodingData:data detectedEncoding:&encoding];
if (html)
    NSLog(@"Encoding detected!");
else
    NSLog(@"No encoding detected");
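As an alternative to the hand-written loop, Foundation can guess the encoding itself on iOS 8 / OS X 10.10 and later via +[NSString stringEncodingForData:encodingOptions:convertedString:usedLossyConversion:]. This is a different technique from the one above, not a drop-in equivalent. It may be worth considering because the loop accepts the first encoding that decodes and contains "<html", and single-byte encodings such as NSISOLatin1StringEncoding can decode almost any input. The sketch below is only a minimal illustration; the function name HTMLStringByGuessingEncoding is hypothetical, and the guess is heuristic, so it can still be wrong.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): asks Foundation to guess
// the encoding of `data`. Returns the decoded string, or nil if no encoding
// could be determined. `detectedEncoding` may be NULL.
static NSString *HTMLStringByGuessingEncoding(NSData *data, NSStringEncoding *detectedEncoding) {
    NSString *converted = nil;
    BOOL usedLossyConversion = NO;
    // Disallow lossy conversion so undecodable bytes are not silently replaced.
    NSDictionary *options = @{ NSStringEncodingDetectionAllowLossyKey : @NO };
    NSStringEncoding encoding = [NSString stringEncodingForData:data
                                                encodingOptions:options
                                                convertedString:&converted
                                            usedLossyConversion:&usedLossyConversion];
    if (encoding == 0) {
        return nil; // Foundation could not determine an encoding
    }
    if (detectedEncoding != NULL) {
        *detectedEncoding = encoding;
    }
    return converted;
}
```

It is called the same way as the method above: pass the NSData from the completion handler and the address of an NSStringEncoding variable, then check the returned string for nil.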
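Answer 1 (score: 0)
I tried @Kof's code. I noticed that the encoding name I get from the response is utf-8. If you pass that string directly as the encoding, as in [[NSString alloc] initWithData:data encoding:@"utf-8"], it will certainly return null. That is because the encoding parameter takes an NSStringEncoding, an NSUInteger-based type whose values are declared with NS_ENUM, not a string. If you try [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding], it returns a result.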
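When the response does report a charset name, the string can be converted to an NSStringEncoding with the Core Foundation functions CFStringConvertIANACharSetNameToEncoding and CFStringConvertEncodingToNSStringEncoding rather than hard-coding a constant. A minimal sketch, assuming the name comes from [response textEncodingName]; the helper name EncodingFromIANAName is hypothetical:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answers): maps an IANA charset
// name such as @"utf-8" to an NSStringEncoding. Returns 0 when the name is
// nil, empty, or not recognized.
static NSStringEncoding EncodingFromIANAName(NSString *name) {
    if (name.length == 0) {
        return 0;
    }
    CFStringEncoding cfEncoding =
        CFStringConvertIANACharSetNameToEncoding((__bridge CFStringRef)name);
    if (cfEncoding == kCFStringEncodingInvalidId) {
        return 0;
    }
    return CFStringConvertEncodingToNSStringEncoding(cfEncoding);
}
```

If the helper returns 0, either because textEncodingName was nil or the name was unrecognized, fall back to detecting the encoding from the data itself.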