Question

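I'm trying to build a scraper as my first project. I'm very new to programming and don't fully understand the code I've written. Even so, Eclipse doesn't report any errors.

The code is meant to read a page's HTML source and add it line by line to an ArrayList until there are no more lines, then return the list. I don't know whether this is supposed to be simple, but I can't figure out why it isn't working.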

import java.util.ArrayList;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.List;
import java.io.BufferedReader;



public class Scraper {
    public static void main(String[] args) throws Exception {
        get_url_source("https://statsroyale.com/clan/99VUU8Y");
    }

    public static List<String> get_url_source(String url) throws Exception {
        List<String> source = new ArrayList<>();

        URL stats = new URL(url);
        BufferedReader in = new BufferedReader(new InputStreamReader(stats.openStream()));

        String inputLine;
        while ((inputLine = in.readLine()) != null)
            source.add(inputLine);

        return source;
    }
}

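Sorry if the formatting is off. I'm still figuring out how formatting works and what goes where. (It's not as simple as it looks.)

The error message is long, but here it is: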

Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: https://statsroyale.com/clan/99VUU8Y
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
    at java.net.URL.openStream(Unknown Source)
    at Scraper.get_url_source(Scraper.java:21)
    at Scraper.main(Scraper.java:13)

Answer 1

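The site checks your User-Agent to see whether the request comes from a bot. To make the site treat you as a regular user, change the User-Agent like this: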

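    // Requires: import javax.net.ssl.HttpsURLConnection;
    URL stats = new URL("https://statsroyale.com/clan/99VUU8Y");

    // Open the connection explicitly so a header can be set before the request is sent
    HttpsURLConnection statsConnection = (HttpsURLConnection) stats.openConnection();
    statsConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2");
    statsConnection.connect();

    // Read the response body through the configured connection instead of URL.openStream()
    BufferedReader in = new BufferedReader(new InputStreamReader(statsConnection.getInputStream()));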
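Put together with the reading loop from the question, the whole thing might look roughly like the sketch below. The class name `UserAgentScraper`, the helper names `open`, `readLines` and `getUrlSource`, and the UTF-8 charset are assumptions made for illustration, not part of the original code. The cast is to `HttpURLConnection` (the parent of `HttpsURLConnection`) so the same helper would also accept plain `http` addresses. Whether this particular User-Agent string is still accepted by the site is not something the sketch can guarantee.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class UserAgentScraper {

    // Browser-like value taken from the answer above; the site may or may not accept it.
    static final String USER_AGENT =
            "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2";

    // Prepares a request carrying the User-Agent header; nothing is sent yet.
    static HttpURLConnection open(String address) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(address).openConnection();
        connection.setRequestProperty("User-Agent", USER_AGENT);
        return connection;
    }

    // Reads every line from the reader into a list and closes the reader afterwards.
    static List<String> readLines(Reader reader) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Fetches the page at the given address and returns its source line by line.
    static List<String> getUrlSource(String address) throws IOException {
        HttpURLConnection connection = open(address);
        // getInputStream() sends the request and throws IOException on a 4xx/5xx response.
        // UTF-8 is an assumption here; the page may declare a different charset.
        return readLines(new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        List<String> source = getUrlSource("https://statsroyale.com/clan/99VUU8Y");
        System.out.println("Read " + source.size() + " lines");
    }
}
```

Keeping `readLines` separate from the network call means the line-reading part can be tried on any `Reader` (for example a `StringReader`) without touching the site, and the try-with-resources block closes the stream, which the original loop never did.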

Adding HTML code to a list
