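In Java, I want to read and save all the HTML from a URL (Instagram), but I get error 429 (Too Many Requests). I think this is because the number of lines I am trying to read exceeds the request limit.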
StringBuilder contentBuilder = new StringBuilder();
try {
    URL url = new URL("https://www.instagram.com/username");
    URLConnection con = url.openConnection();
    InputStream is =con.getInputStream();
    BufferedReader in = new BufferedReader(new InputStreamReader(is));
    String str;
    while ((str = in.readLine()) != null) {
        contentBuilder.append(str);
    }
    in.close();
} catch (IOException e) {
    log.warn("Could not connect", e);
}
String html = contentBuilder.toString();
The error is:
Could not connect
java.io.IOException: Server returned HTTP response code: 429 for URL: https://www.instagram.com/username/
It also shows that the error occurs at this line:
InputStream is =con.getInputStream();
Does anyone know why I am getting this error and/or how to fix it?
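Answer 0 (score: 1)
The problem may be caused by the connection not being closed/disconnected.
Use try-with-resources for the input: it closes the reader automatically, even when an exception is thrown or the method returns early. Also, you construct an InputStreamReader that uses the default encoding of the machine the application runs on, but what you need is the charset of the URL's content.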
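To make the charset point concrete: the server normally declares the charset in the Content-Type header, which URLConnection.getContentType() returns. Here is a minimal sketch, assuming UTF-8 as the fallback when no charset is declared. CharsetSniffer and charsetFrom are hypothetical names for illustration, not part of any library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetSniffer {
    /**
     * Returns the charset named in a Content-Type header value such as
     * "text/html; charset=ISO-8859-1". Falls back to UTF-8 (an assumption)
     * when the header is missing, has no charset parameter, or names a
     * charset this JVM does not know.
     */
    static Charset charsetFrom(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                // Case-insensitive check for a leading "charset=".
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // malformed or unsupported charset name
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(charsetFrom("text/html; charset=ISO-8859-1"));
        System.out.println(charsetFrom("text/html"));
    }
}
```

You could then pass charsetFrom(con.getContentType()) to the InputStreamReader constructor instead of a hard-coded name.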
readLine returns the line without its line ending (which is usually very useful), so append one yourself.
StringBuilder contentBuilder = new StringBuilder();
try {
    URL url = new URL("https://www.instagram.com/username");
    // disconnect() is declared on HttpURLConnection, not URLConnection, hence the cast.
    HttpURLConnection con = (HttpURLConnection) url.openConnection();
    // try-with-resources closes the reader (and the underlying stream) automatically.
    try (BufferedReader in = new BufferedReader(
            new InputStreamReader(con.getInputStream(), "UTF-8"))) {
        String line;
        while ((line = in.readLine()) != null) {
            contentBuilder.append(line).append("\r\n");
        }
    } finally {
        con.disconnect();
    }
} catch (IOException e) {
    log.warn("Could not connect", e);
}
String html = contentBuilder.toString();
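Since the exception is thrown by getInputStream() when the server answers 429, it can also help to check the status code first and see whether the server says how long to wait. The following is only a sketch: it assumes the server sends a Retry-After header with a value in seconds, which is optional and which I have not verified for Instagram. RetryAfter, delaySeconds and waitHintSeconds are hypothetical names.

```java
import java.io.IOException;
import java.net.HttpURLConnection;

class RetryAfter {
    /**
     * Parses a Retry-After header value given as a number of seconds.
     * Returns fallbackSeconds when the header is missing, negative,
     * or not a plain number (for example an HTTP date).
     */
    static long delaySeconds(String retryAfter, long fallbackSeconds) {
        if (retryAfter == null) {
            return fallbackSeconds;
        }
        try {
            long seconds = Long.parseLong(retryAfter.trim());
            return seconds >= 0 ? seconds : fallbackSeconds;
        } catch (NumberFormatException e) {
            return fallbackSeconds;
        }
    }

    /**
     * Returns a suggested wait in seconds if the response status is 429,
     * or -1 for any other status. Calling getResponseCode() sends the request.
     */
    static long waitHintSeconds(HttpURLConnection con, long fallbackSeconds)
            throws IOException {
        if (con.getResponseCode() == 429) {
            return delaySeconds(con.getHeaderField("Retry-After"), fallbackSeconds);
        }
        return -1;
    }
}
```

If waitHintSeconds returns a non-negative value, you would skip getInputStream(), disconnect, and retry later instead of reading the body.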