Question

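In Java, I want to read and save all the HTML from a URL (Instagram), but I get error 429 (Too Many Requests). I think this is because the number of lines I'm trying to read exceeds the request limit.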

StringBuilder contentBuilder = new StringBuilder();
try {
    URL url = new URL("https://www.instagram.com/username");
    URLConnection con = url.openConnection();
    InputStream is =con.getInputStream();
    BufferedReader in = new BufferedReader(new InputStreamReader(is));
    String str;
    while ((str = in.readLine()) != null) {
        contentBuilder.append(str);
    }
    in.close();
} catch (IOException e) {
    log.warn("Could not connect", e);
}
String html = contentBuilder.toString();

The error is:

Could not connect
java.io.IOException: Server returned HTTP response code: 429 for URL: https://www.instagram.com/username/

It also shows that the error is thrown at this line:

InputStream is =con.getInputStream();

Does anyone know why I am getting this error and/or how to fix it?

Answer 1

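The problem may be caused by not closing/disconnecting the connection. Use try-with-resources, which closes the input automatically, even when an exception is thrown or the method returns. Also, you construct an InputStreamReader that uses the default encoding of the machine the application runs on, whereas you need the charset of the URL's content. Finally, readLine returns the line without its line terminator (usually very useful), so add one back.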
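On the charset point: the code below hardcodes UTF-8. As a hedged sketch of taking the charset from the response instead, the hypothetical helper `charsetOf` parses the `charset=` parameter of a `Content-Type` value (the string `URLConnection.getContentType()` returns) and falls back to UTF-8 when the parameter is missing or names an unknown charset. The UTF-8 fallback is an assumption, not something the server guarantees.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetFromContentType {
    // Hypothetical helper: returns the charset named in a Content-Type value such as
    // "text/html; charset=ISO-8859-1", or UTF-8 (an assumed default) if none is usable.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    // Strip optional quotes, e.g. charset="utf-8".
                    String name = p.substring("charset=".length()).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Illegal or unsupported charset name: fall through to the default.
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1")); // ISO-8859-1
        System.out.println(charsetOf("text/html"));                     // UTF-8
    }
}
```

With an open connection `con`, the reader would then be built as `new InputStreamReader(con.getInputStream(), charsetOf(con.getContentType()))`.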

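StringBuilder contentBuilder = new StringBuilder();
try {
    URL url = new URL("https://www.instagram.com/username");
    // disconnect() is defined on HttpURLConnection, not on URLConnection, so cast.
    HttpURLConnection con = (HttpURLConnection) url.openConnection();
    // Assumes the page is UTF-8; ideally use the charset from the Content-Type header.
    try (BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8))) {
        String line;
        while ((line = in.readLine()) != null) {
            // readLine() strips the line terminator, so append one back.
            contentBuilder.append(line).append("\r\n");
        }
    } finally { // `in` has already been closed by try-with-resources at this point.
        con.disconnect();
    }
} catch (IOException e) {
    log.warn("Could not connect", e);
}
String html = contentBuilder.toString();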
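If the 429 persists after the connection is closed properly, the server is most likely rate-limiting the client: 429 Too Many Requests (RFC 6585) may come with a `Retry-After` header giving a delay in seconds or an HTTP date. A minimal sketch, assuming the server either sends `Retry-After` in seconds or tolerates a short exponential backoff, checks `getResponseCode()` first (it returns 4xx codes instead of throwing, unlike `getInputStream()`) and waits before retrying. `FetchWithRetry`, `fetch`, `waitMillis` and the 1 s/2 s/4 s schedule are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class FetchWithRetry {
    // Hypothetical helper: delay before the next attempt. Uses Retry-After when it is
    // a number of seconds, otherwise falls back to an assumed exponential backoff.
    static long waitMillis(String retryAfter, int attempt) {
        if (retryAfter != null) {
            try {
                return Math.max(0, Long.parseLong(retryAfter.trim())) * 1000L;
            } catch (NumberFormatException e) {
                // Retry-After may also be an HTTP date; not handled in this sketch.
            }
        }
        return 1000L << attempt; // 1 s, 2 s, 4 s, ...
    }

    static String fetch(String address, int maxAttempts) throws IOException, InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            HttpURLConnection con = (HttpURLConnection) new URL(address).openConnection();
            try {
                int status = con.getResponseCode(); // does not throw for 4xx/5xx
                if (status == 429) {
                    Thread.sleep(waitMillis(con.getHeaderField("Retry-After"), attempt));
                    continue; // the finally block still disconnects first
                }
                if (status != HttpURLConnection.HTTP_OK) {
                    throw new IOException("HTTP " + status + " for " + address);
                }
                StringBuilder sb = new StringBuilder();
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        sb.append(line).append("\r\n");
                    }
                }
                return sb.toString();
            } finally {
                con.disconnect();
            }
        }
        throw new IOException("Still rate-limited after " + maxAttempts + " attempts: " + address);
    }

    public static void main(String[] args) {
        System.out.println(waitMillis("5", 0));  // 5000
        System.out.println(waitMillis(null, 2)); // 4000
    }
}
```

A call such as `fetch("https://www.instagram.com/username", 3)` tries at most three times. Whether a given site sends `Retry-After` at all is not guaranteed, so the fallback delay is a guess rather than a known limit.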
