Detecting HTML encoding when NSURLResponse returns nil for textEncodingName

Time: 2013-07-17 14:38:46

Tags: html ios http encoding

I am loading a website's HTML with this call:

    NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:url];
    [request setValue:@"utf-8" forHTTPHeaderField:@"Accept-Encoding"];
    [request setValue:@"text/html" forHTTPHeaderField:@"Accept"];
    [NSURLConnection sendAsynchronousRequest:request
                                       queue:[NSOperationQueue currentQueue]
                           completionHandler:^(NSURLResponse *response, NSData *data, NSError *error) { ... }];

Then, to convert the NSData to an NSString, I need to know the encoding, so I call:

NSString *textEncoding = [response textEncodingName];

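I call this from inside the completion block, but it returns nil for websites that do not specify a "Content-Encoding" header field.

If I don't know the encoding, [[NSString alloc] initWithData:data encoding:responseEncoding] won't give me readable HTML.

How can I detect the correct encoding for websites that do not send a "Content-Encoding" header field?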

2 Answers:

Answer 0: (score: 2)

You can try different encodings and see which one produces readable text:

static const NSStringEncoding encodingPriority[] = {
    NSUTF8StringEncoding,
    NSASCIIStringEncoding,
    NSISOLatin1StringEncoding,
    NSISOLatin2StringEncoding,
    NSUnicodeStringEncoding,
    NSWindowsCP1251StringEncoding,
    NSWindowsCP1252StringEncoding,
    NSWindowsCP1253StringEncoding,
    NSWindowsCP1254StringEncoding,
    NSWindowsCP1250StringEncoding,
    NSNEXTSTEPStringEncoding,
    NSJapaneseEUCStringEncoding,
    NSNonLossyASCIIStringEncoding,
    NSShiftJISStringEncoding,          /* kCFStringEncodingDOSJapanese */
    NSISO2022JPStringEncoding,        /* ISO 2022 Japanese encoding for e-mail */
    NSMacOSRomanStringEncoding,
    NSUTF16BigEndianStringEncoding,
    NSUTF16LittleEndianStringEncoding,
    NSUTF32StringEncoding,
    NSUTF32BigEndianStringEncoding,
    NSUTF32LittleEndianStringEncoding
};

#define REQUIRED_HTML_STRING    @"<html"

- (NSString *)htmlStringForUnknownEncodingData:(NSData *)data detectedEncoding:(NSStringEncoding *)detectedEncoding
{
    // Element count, not byte size: sizeof alone would read past the end of the array
    const size_t count = sizeof(encodingPriority) / sizeof(encodingPriority[0]);

    for (size_t i = 0; i < count; i++) {
        NSStringEncoding encoding = encodingPriority[i];

        // Try this encoding; initWithData:encoding: returns nil if the bytes are invalid for it
        NSString *html = [[NSString alloc] initWithData:data encoding:encoding];

        // A wrong encoding can still decode to unreadable text, so require a known marker
        if (html && [html rangeOfString:REQUIRED_HTML_STRING options:NSCaseInsensitiveSearch].location != NSNotFound) {
            if (detectedEncoding) {
                *detectedEncoding = encoding;
            }
            return html;
        }
    }
    return nil;
}

Then, to detect the encoding used by the HTML in the NSData, call:

NSStringEncoding encoding;
NSString *html = [self htmlStringForUnknownEncodingData:data detectedEncoding:&encoding];

if (html)
    NSLog(@"Encoding detected!");
else
    NSLog(@"No encoding detected");
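Many pages also declare their charset inside the markup itself (for example in a meta tag), so it may be worth looking there before falling back to trial and error. The following is only a rough sketch under stated assumptions, not part of the original answer: the helper name DeclaredCharsetInHTMLData, the 1024-byte window, and the regular expression are all illustrative choices. It will not find anything in UTF-16 or UTF-32 documents, because the bytes are scanned as single-byte text.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (illustrative only): returns the charset name declared
// near the top of the HTML, e.g. @"windows-1251", or nil if none is found.
static NSString *DeclaredCharsetInHTMLData(NSData *data)
{
    // Inspect only the head of the document; 1024 bytes is an arbitrary choice.
    NSUInteger length = MIN(data.length, (NSUInteger)1024);
    NSData *head = [data subdataWithRange:NSMakeRange(0, length)];

    // ISO Latin-1 maps every byte to a character, so this decode accepts
    // arbitrary input while keeping ASCII markup readable.
    NSString *text = [[NSString alloc] initWithData:head
                                           encoding:NSISOLatin1StringEncoding];
    if (text == nil) {
        return nil;
    }

    // Matches both <meta charset="..."> and content="text/html; charset=...".
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"charset\\s*=\\s*[\"']?([A-Za-z0-9_.:-]+)"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:text options:0 range:NSMakeRange(0, text.length)];
    if (match == nil) {
        return nil;
    }
    return [text substringWithRange:[match rangeAtIndex:1]];
}
```

The result is still a charset name rather than an NSStringEncoding, so it has to be converted (Core Foundation offers CFStringConvertIANACharSetNameToEncoding for this) before it can be passed to initWithData:encoding:.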

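Answer 1: (score: 0)

I tried @Kof's code. I noticed that the encoding I got from the response was utf-8. If you pass that string directly, as in [[NSString alloc] initWithData:data encoding:@"utf-8"], it will certainly return null. That is because the encoding parameter takes an NSStringEncoding, an integer type whose values are defined with NS_ENUM, not an NSString. If you try [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding], it returns a result.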
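When textEncodingName does return a name, that string can be mapped to an NSStringEncoding through Core Foundation instead of hard-coding the constant. A minimal sketch, assuming the name is an IANA charset name such as "utf-8"; the helper name EncodingFromIANAName and the use of 0 as a "not found" value are illustrative choices, not something from the original answers.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (illustrative only): maps an IANA charset name such as
// @"utf-8" to an NSStringEncoding. Returns 0 when the name is nil, empty, or
// not recognised by Core Foundation.
static NSStringEncoding EncodingFromIANAName(NSString *name)
{
    if (name.length == 0) {
        return 0;
    }
    CFStringEncoding cfEncoding =
        CFStringConvertIANACharSetNameToEncoding((__bridge CFStringRef)name);
    if (cfEncoding == kCFStringEncodingInvalidId) {
        return 0;
    }
    return CFStringConvertEncodingToNSStringEncoding(cfEncoding);
}

// Possible use inside the completion handler:
//
//     NSStringEncoding encoding = EncodingFromIANAName([response textEncodingName]);
//     if (encoding != 0) {
//         NSString *html = [[NSString alloc] initWithData:data encoding:encoding];
//     }
```

If the helper returns 0, the response gave no usable charset name and some other detection strategy is needed.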