So I'm trying to build a scraper as my first project. I'm very new, and I don't fully understand the code I wrote. Even though I don't understand it, Eclipse doesn't show any errors in it.
The code is supposed to read an HTML source file and add it line by line to an ArrayList until there is nothing left, then return the list. I don't know whether it's a simple problem, but I can't figure out why it doesn't work.
import java.util.ArrayList;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.List;
import java.io.BufferedReader;
public class Scraper {
    public static void main(String[] args) throws Exception {
        get_url_source("https://statsroyale.com/clan/99VUU8Y");
    }

    public static List<String> get_url_source(String URL) throws Exception {
        List<String> source = new ArrayList<>();
        URL stats = new URL("https://statsroyale.com/clan/99VUU8Y");
        BufferedReader in = new BufferedReader(new InputStreamReader(stats.openStream()));
        String inputLine;
        while ((inputLine = in.readLine()) != null)
            source.add(inputLine);
        return source;
    }
}
Sorry if the formatting is wrong; I'm still figuring out how formatting works here and what goes where. (It's not as simple as it looks.)
The error message is long, but here it is:
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: https://statsroyale.com/clan/99VUU8Y
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
at java.net.URL.openStream(Unknown Source)
at Scraper.get_url_source(Scraper.java:21)
at Scraper.main(Scraper.java:13)
Answer (score: 0):
The site is checking your User-Agent to detect bot traffic. To make the site treat you as a regular browser, you have to change the User-Agent, like this:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import javax.net.ssl.HttpsURLConnection; // needed for the cast below

URL stats = new URL("https://statsroyale.com/clan/99VUU8Y");
// Open the connection explicitly instead of calling openStream(),
// so we can set headers before the request is sent.
HttpsURLConnection statsConnection = (HttpsURLConnection) stats.openConnection();
statsConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2");
statsConnection.connect();
BufferedReader in = new BufferedReader(new InputStreamReader(statsConnection.getInputStream()));
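Putting it together, the original method can be rewritten to use the header-setting connection instead of `URL.openStream()` (which always sends Java's default User-Agent). This is only a sketch: the class name `ScraperFixed` and the helper `openWithUserAgent` are names I made up, and the try-with-resources block is an extra cleanup step not in the original code.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

class ScraperFixed {
    // Open a connection with a browser-like User-Agent header set
    // before any request is sent (openConnection() does not connect yet).
    static HttpURLConnection openWithUserAgent(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("User-Agent",
                "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) "
                + "Gecko/20100316 Firefox/3.6.2");
        return conn;
    }

    // Read the response body line by line into a list, as in the question.
    static List<String> getUrlSource(String url) throws Exception {
        List<String> source = new ArrayList<>();
        HttpURLConnection conn = openWithUserAgent(url);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                source.add(line);
            }
        }
        return source;
    }

    public static void main(String[] args) {
        try {
            List<String> source = getUrlSource("https://statsroyale.com/clan/99VUU8Y");
            System.out.println("Fetched " + source.size() + " lines");
        } catch (Exception e) {
            // Network errors (or another 403) land here instead of crashing.
            System.out.println("Fetch failed: " + e.getMessage());
        }
    }
}
```

Note that the method now actually uses its `url` parameter instead of a hard-coded address, which also fixes a separate bug in the original `get_url_source`.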