I'm trying to fix a Java application that fetches HTML pages by URL and reads their content as bytes.
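// This is a simplified part of the code
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.GZIPInputStream;

private static final Pattern PAT_CHARSET = Pattern.compile("charset=([^; ]+)$");

HttpURLConnection conn = (HttpURLConnection) url.openConnection();

// Character set of the body, e.g. from "text/html; charset=ISO-8859-1"
String contentType = conn.getContentType();
Charset cs = StandardCharsets.UTF_8; // default when no charset is declared
if (contentType != null) {
    Matcher m = PAT_CHARSET.matcher(contentType);
    if (m.find()) {
        cs = Charset.forName(m.group(1));
    }
}

// Content-Encoding is the compression applied to the body (gzip, deflate, br, ...),
// which is separate from the character set above
InputStream in = conn.getInputStream();
String contentEncoding = conn.getContentEncoding();
if ("gzip".equalsIgnoreCase(contentEncoding)) { // false when the header is absent (null)
    in = new GZIPInputStream(in);
}
...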
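The charset lookup in the snippet only matches when `charset=` is the last parameter of the header, is case-sensitive, and passes quoted values such as `charset="UTF-8"` straight to `Charset.forName`, which rejects them. A minimal sketch of a more tolerant lookup follows; the class name `CharsetFromContentType` and the `fallback` parameter are hypothetical, and falling back on an unknown charset name (rather than failing) is an assumption about the desired behavior.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetFromContentType {
    // Matches charset=VALUE or charset="VALUE" anywhere in the header, ignoring case
    private static final Pattern PAT_CHARSET =
            Pattern.compile("charset=\"?([^;\"\\s]+)\"?", Pattern.CASE_INSENSITIVE);

    /** Returns the declared charset, or the fallback when none is declared or it is unknown. */
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        Matcher m = PAT_CHARSET.matcher(contentType);
        if (!m.find()) {
            return fallback;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // IllegalCharsetNameException and UnsupportedCharsetException both extend it
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(charsetFromContentType("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(charsetFromContentType("text/html; Charset=\"utf-8\"; foo=bar", StandardCharsets.ISO_8859_1));
        System.out.println(charsetFromContentType("text/html", StandardCharsets.UTF_8));
    }
}
```

The returned `Charset` would then be used when turning the (decompressed) bytes into a `String`.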
But for some URLs I get this error:
unsupported Content-Encoding: br
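`br` is Brotli compression, and the JDK ships decoders only for gzip and zlib/deflate (`java.util.zip`). One way to structure the decoding step is sketched below, under stated assumptions: the class name `ContentDecoding`, the `decode` helper and the `ACCEPT_ENCODING` constant are hypothetical; it assumes a single coding in the header (not a list such as `gzip, br`); and it assumes that sending an explicit `Accept-Encoding` without `br` (via `conn.setRequestProperty("Accept-Encoding", ...)` before `getInputStream()`) leads a server that honors content negotiation not to pick Brotli. Actually decoding `br` would need a third-party Brotli decoder, which this sketch does not include. `readAllBytes` in the demo requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

public class ContentDecoding {
    // Hypothetical request header value listing only codings the JDK can decode;
    // set it with conn.setRequestProperty("Accept-Encoding", ACCEPT_ENCODING) before connecting
    static final String ACCEPT_ENCODING = "gzip, deflate";

    /** Wraps the raw body stream according to a single Content-Encoding value. */
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding == null || contentEncoding.trim().isEmpty()
                || "identity".equalsIgnoreCase(contentEncoding.trim())) {
            return raw; // body is not compressed
        }
        String coding = contentEncoding.trim();
        if ("gzip".equalsIgnoreCase(coding) || "x-gzip".equalsIgnoreCase(coding)) {
            return new GZIPInputStream(raw);
        }
        if ("deflate".equalsIgnoreCase(coding)) {
            // Assumes zlib-wrapped data, as HTTP "deflate" is defined; some servers send raw deflate
            return new InflaterInputStream(raw);
        }
        // Anything else (e.g. "br") needs a decoder that is not part of the JDK
        throw new IOException("no decoder for Content-Encoding: " + coding);
    }

    public static void main(String[] args) throws IOException {
        // Simulate a gzip-compressed response body
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write("<html></html>".getBytes(StandardCharsets.UTF_8));
        }
        InputStream in = decode(new ByteArrayInputStream(buf.toByteArray()), "gzip");
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

Failing loudly on an unknown coding is a design choice: silently returning the raw stream would hand compressed bytes to the HTML parsing step.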