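java - Scraped web page content contains garbled characters

I am using HttpClient 4.3.6 to fetch a Google Play web page and write its content to a local file, but some of the text comes out garbled, for example "鈥 ". Here is my code: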

//httpclient
....
HttpEntity entity = response.getEntity();
InputStream in = entity.getContent();
byte[] b = new byte[2048];
StringBuffer out = new StringBuffer();
for(int n;(n=in.read(b))!=-1;) {
    out.append(new String(b, 0, n, "utf-8"));
}
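
A side note on the loop above: decoding each 2048-byte chunk separately can damage a multi-byte UTF-8 character that happens to straddle two reads. The sketch below is only an illustration of that effect. The class and method names are made up, and a tiny buffer stands in for the 2048-byte one so the split is easy to trigger. This kind of damage normally appears as U+FFFD replacement characters rather than "鈥", so it may be a separate latent bug and not the cause of the symptom described here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of HttpClient or the original code.
class ChunkDecodingSketch {

    // Mirrors the loop in the question: every chunk is decoded on its own,
    // so a character split across two reads is decoded as two broken halves.
    static String decodePerChunk(InputStream in, int bufferSize) throws IOException {
        byte[] b = new byte[bufferSize];
        StringBuilder out = new StringBuilder();
        for (int n; (n = in.read(b)) != -1; ) {
            out.append(new String(b, 0, n, StandardCharsets.UTF_8));
        }
        return out.toString();
    }

    // Lets an InputStreamReader do the decoding; it keeps incomplete byte
    // sequences internally until the remaining bytes arrive.
    static String decodeWithReader(InputStream in, int bufferSize) throws IOException {
        char[] c = new char[bufferSize];
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            for (int n; (n = reader.read(c)) != -1; ) {
                out.append(c, 0, n);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "a\u2019b"; // U+2019 is three bytes in UTF-8: E2 80 99
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        // A 2-byte buffer forces the three-byte character to be split.
        String perChunk = decodePerChunk(new ByteArrayInputStream(utf8), 2);
        String viaReader = decodeWithReader(new ByteArrayInputStream(utf8), 2);

        System.out.println(perChunk.equals(text));  // prints false
        System.out.println(viaReader.equals(text)); // prints true
    }
}
```

With a real response, the same reader-based loop could wrap entity.getContent() in place of the ByteArrayInputStream.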

I decode as UTF-8 because the Google Play response header says "Content-Type: text/html; charset=utf-8".

Then I tried commons.io.IOUtils:

IOUtils.toString(in, "UTF-8");

That did not solve the problem either. What should I do?
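
One thing that may be worth checking, although the post does not show it, is how the string is written to the local file. "鈥" is typically what the UTF-8 bytes E2 80 (the start of punctuation such as ’ or “) look like when they are interpreted as GBK, which would fit a file that is written or opened with a platform default encoding instead of UTF-8. The following is a minimal sketch under that assumption. The class and method names and the temporary file are hypothetical, and it only shows writing and reading with an explicit charset.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; the original post does not show its file-writing code.
class ExplicitCharsetWriteSketch {

    // Encodes the text as UTF-8 explicitly instead of relying on the
    // platform default charset.
    static void writeUtf8(Path target, String html) throws IOException {
        Files.write(target, html.getBytes(StandardCharsets.UTF_8));
    }

    // Reads the file back, again naming the charset explicitly.
    static String readUtf8(Path source) throws IOException {
        return new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("page", ".html"); // placeholder file
        try {
            String html = "<p>It\u2019s a test</p>";
            writeUtf8(tmp, html);
            System.out.println(readUtf8(tmp).equals(html)); // prints true
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

If the file is written this way and still looks garbled, the editor or viewer used to open it may itself be assuming a different encoding.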

0 answers: