Reading an online txt file

Time: 2016-11-25 04:54:17

Tags: java web-scraping

Consider the following code:

    private String url = "https://celestrak.com/NORAD/elements/resource.txt";

    @Override
    public Boolean crawl() {

        try {

            // Timeout is set to 20s
            Connection connection = Jsoup.connect(url).userAgent(USER_AGENT).timeout(20 * 1000);
            Document htmlDocument = connection.get();
            // 200 is the HTTP OK status code
            if (connection.response().statusCode() == 200) {
                System.out.println("\n**Visiting** Received web page at " + url);
            } else {
                System.out.println("\n**Failure** Web page not received at " + url);
                return Boolean.FALSE;
            }
            if (!connection.response().contentType().contains("text/plain")) {
                System.out.println("**Failure** Retrieved something other than plain text");
                return Boolean.FALSE;
            }

            System.out.println(htmlDocument.text()); // Here it prints the whole text file on one line

        } catch (IOException ioe) {
            // We were not successful in our HTTP request
            System.err.println(ioe);
            return Boolean.FALSE;
        }

        return Boolean.TRUE;
    }

Output:

SCD 1 1 22490U 93009B 16329.83043855 .00000228 00000-0 12801-4 0 9993 2 22490 24.9691 122.2579 0043025 337.9285 169.5838 14.44465946256021 TECHSAT 1B (GO-32) 1 25397U ....

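I am trying to read an online txt file (from https://celestrak.com/NORAD/elements/resource.txt). The problem is that when I print or save the text of the document body, the whole file comes out on a single line. I want to read it split on \n so that I can process it line by line. Am I doing something wrong in the way I read the online txt file?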

I am using JSoup.
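
The symptom is consistent with how Jsoup handles the response: connection.get() parses the body as an HTML Document, and Element.text() returns normalized text in which runs of whitespace, including line breaks, collapse into single spaces. The hypothetical helper below only approximates that normalization with a regex (it is not Jsoup's implementation), to show why the line structure disappears:

```java
class TextCollapseDemo {

    // Hypothetical helper: a rough approximation of Jsoup's Element.text()
    // normalization, not Jsoup's actual implementation. Every run of whitespace
    // (spaces, tabs, \r, \n) becomes a single space, and the result is trimmed.
    static String normalizeWhitespace(String raw) {
        return raw.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Three lines shaped like the start of the output above (shortened)
        String raw = "SCD 1\r\n1 22490U 93009B\r\n2 22490  24.9691\r\n";
        // prints: SCD 1 1 22490U 93009B 2 22490 24.9691
        System.out.println(normalizeWhitespace(raw));
    }
}
```

Within Jsoup itself, the unparsed response body is available as a String from Connection.Response.body() (for example connection.execute().body()), which should keep the original line breaks so it can be split like any other string.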

2 answers:

Answer 0 (score: 1)

You can do this without jsoup as follows:

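    import java.io.IOException;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import org.apache.commons.io.IOUtils;

    public class ReadOnlineText {
        public static void main(String[] args) {
            try {
                // IOUtils.toString(URL, Charset) reads the whole body into one String (UTF-8 assumed)
                String data = IOUtils.toString(
                        new URL("https://celestrak.com/NORAD/elements/resource.txt"),
                        StandardCharsets.UTF_8);
                // "\\R" matches \n, \r\n and \r, so no stray '\r' is left on a line
                for (String line : data.split("\\R")) {
                    System.out.println(line);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }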

The code above uses org.apache.commons.io.IOUtils from the Apache Commons IO library.
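
Splitting on "\n" alone leaves a trailing carriage return on every line if the server sends Windows-style CRLF (\r\n) line endings. Java's regex \R matcher (Java 8 and later) matches any line-break sequence, treating \r\n as a single unit. A small comparison on a made-up string (the class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.List;

class LineSplitDemo {

    // Splits only on '\n': each CRLF-terminated line keeps its trailing '\r'.
    static List<String> splitOnNewline(String data) {
        return Arrays.asList(data.split("\n"));
    }

    // Splits on any line-break sequence (\r\n, \n, \r, ...) via the \R matcher.
    static List<String> splitOnAnyLineBreak(String data) {
        return Arrays.asList(data.split("\\R"));
    }

    public static void main(String[] args) {
        String data = "SCD 1\r\n1 22490U\r\n";
        System.out.println(splitOnNewline(data).get(0).endsWith("\r"));      // true
        System.out.println(splitOnAnyLineBreak(data).get(0).endsWith("\r")); // false
    }
}
```

Note that String.split drops trailing empty strings, so a final line break does not produce an empty last element in either version.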

If adding the Commons library is a problem, you can use the following code instead:

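    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class ReadOnlineTextJdk {
        public static void main(String[] args) {
            try {
                URL url = new URL("https://celestrak.com/NORAD/elements/resource.txt");
                // Standard JDK classes only: wrap the URL's input stream in a reader
                // (UTF-8 assumed); try-with-resources closes it even if reading fails
                try (BufferedReader bufferedReader = new BufferedReader(
                        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                    String sCurrentLine;
                    // readLine() returns one line at a time without its terminator
                    while ((sCurrentLine = bufferedReader.readLine()) != null) {
                        System.out.println(sCurrentLine);
                    }
                }
            } catch (IOException e) {
                // MalformedURLException is a subclass of IOException, so it is caught here too
                e.printStackTrace();
            }
        }
    }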
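
The snippets in this answer drop the HTTP status and content-type checks that the question's Jsoup code performs. A JDK-only sketch that keeps those checks uses HttpURLConnection. The 20-second timeouts mirror the question, while the UTF-8 charset and the names PlainTextFetcher, isPlainText, readLines and fetchLines are assumptions for illustration:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class PlainTextFetcher {

    // Mirrors the question's check: the Content-Type must mention text/plain.
    static boolean isPlainText(String contentType) {
        return contentType != null && contentType.contains("text/plain");
    }

    // Reads every line from the stream (UTF-8 assumed); readLine() strips
    // \n, \r\n and \r terminators. The stream is closed when done.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Fetches the URL and returns its lines, failing on a non-200 status
    // or a non-text/plain content type, like the Jsoup version in the question.
    static List<String> fetchLines(String url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setConnectTimeout(20 * 1000);
        connection.setReadTimeout(20 * 1000);
        try {
            int status = connection.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " at " + url);
            }
            if (!isPlainText(connection.getContentType())) {
                throw new IOException("Expected text/plain but got " + connection.getContentType());
            }
            return readLines(connection.getInputStream());
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        for (String line : fetchLines("https://celestrak.com/NORAD/elements/resource.txt")) {
            System.out.println(line);
        }
    }
}
```

Keeping readLines separate from the network code means it works on any InputStream, so it can also be checked against an in-memory ByteArrayInputStream without touching the network.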

Answer 1 (score: 0)

Since the file is already split by line separators, we can simply read its content line by line from the URL's input stream:

    String url = "https://celestrak.com/NORAD/elements/resource.txt";
    List<String> text = new BufferedReader(new InputStreamReader(new URL(url).openStream())).lines().collect(Collectors.toList());

To convert it to a single string:

    String content = new BufferedReader(new InputStreamReader(new URL(url).openStream())).lines()
            .collect(Collectors.joining(System.getProperty("line.separator")));
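
Neither snippet in this answer closes the stream returned by openStream(). Because BufferedReader.lines() is lazy, the lines have to be collected inside a try-with-resources block, before the reader is closed. A sketch of that approach (UTF-8 charset assumed; the class and method names are illustrative):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

class StreamLines {

    // Collects every line while the reader is still open; try-with-resources
    // then closes the reader and the underlying stream.
    // I/O errors raised while the lines() stream is consumed surface as UncheckedIOException.
    static List<String> readAllLines(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.toList());
        }
    }

    // Same idea, joined back into one String with the platform line separator.
    static String readContent(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining(System.lineSeparator()));
        }
    }

    public static void main(String[] args) throws IOException {
        String url = "https://celestrak.com/NORAD/elements/resource.txt";
        List<String> text = readAllLines(new URL(url).openStream());
        System.out.println(text.size() + " lines read");
    }
}
```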