Consider the following code:
private String url = "https://celestrak.com/NORAD/elements/resource.txt";

@Override
public Boolean crawl() {
    try {
        // Timeout is set to 20 s
        Connection connection = Jsoup.connect(url).userAgent(USER_AGENT).timeout(20 * 1000);
        Document htmlDocument = connection.get();
        // 200 is the HTTP OK status code
        if (connection.response().statusCode() == 200) {
            System.out.println("\n**Visiting** Received web page at " + url);
        } else {
            System.out.println("\n**Failure** Web page not received at " + url);
            return Boolean.FALSE;
        }
        if (!connection.response().contentType().contains("text/plain")) {
            System.out.println("**Failure** Retrieved something other than plain text");
            return Boolean.FALSE;
        }
        System.out.println(htmlDocument.text()); // Here it prints the whole text file on one line
    } catch (IOException ioe) {
        // We were not successful in our HTTP request
        System.err.println(ioe);
        return Boolean.FALSE;
    }
    return Boolean.TRUE;
}
Output:
SCD 1 1 22490U 93009B 16329.83043855 .00000228 00000-0 12801-4 0 9993 2 22490 24.9691 122.2579 0043025 337.9285 169.5838 14.44465946256021 TECHSAT 1B (GO-32) 1 25397U ....
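I am trying to read an online text file (from https://celestrak.com/NORAD/elements/resource.txt). The problem is that when I print or save the body text, the whole file comes out on a single line. I want to read it split on \n so that I can process it line by line. Am I making a mistake in how I read the online text file?
I am using JSoup.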
Answer 0 (score: 1)
You can do this without jsoup as follows:
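import java.io.IOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.IOUtils;

public static void main(String[] args) {
    try {
        // Download the whole file as one string; the charset is passed explicitly
        // (assuming the file is plain ASCII/UTF-8)
        String data = IOUtils.toString(
                new URL("https://celestrak.com/NORAD/elements/resource.txt"),
                StandardCharsets.UTF_8);
        // Split the text on newlines and print it line by line
        for (String line : data.split("\n")) {
            System.out.println(line);
        }
    } catch (IOException e1) {
        e1.printStackTrace();
    }
}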
The code above uses org.apache.commons.io.IOUtils.
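One thing to watch with the split approach: if the server happens to send Windows-style \r\n line endings, splitting on "\n" alone leaves a trailing \r on every line. A minimal sketch of splitting on any line terminator, assuming Java 8 or later for the \R regex; the class name LineSplitSketch, the helper splitLines, and the sample string are made up for illustration and stand in for the downloaded text:

```java
import java.util.Arrays;
import java.util.List;

public class LineSplitSketch {
    // Hypothetical helper: splits on any line terminator (\n, \r\n, \r, ...)
    // using the \R linebreak matcher available since Java 8.
    static List<String> splitLines(String data) {
        return Arrays.asList(data.split("\\R"));
    }

    public static void main(String[] args) {
        // Stand-in for the downloaded text, using CRLF line endings.
        String sample = "SCD 1\r\nTECHSAT 1B (GO-32)\r\n";
        for (String line : splitLines(sample)) {
            // Brackets make any stray trailing character visible.
            System.out.println("[" + line + "]");
        }
    }
}
```

Because String.split drops trailing empty strings, a final line terminator does not produce an extra empty entry.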
If adding the Commons library is a problem, you can use the following code:
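import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public static void main(String[] args) {
    // Open the URL's stream and decode it as UTF-8 (assuming plain ASCII/UTF-8 content);
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(
            new URL("https://celestrak.com/NORAD/elements/resource.txt").openStream(),
            StandardCharsets.UTF_8))) {
        String sCurrentLine;
        // readLine() returns null at end of stream and strips the line terminator
        while ((sCurrentLine = bufferedReader.readLine()) != null) {
            System.out.println(sCurrentLine);
        }
    } catch (IOException e) {
        // MalformedURLException is a subclass of IOException, so this catch covers both
        e.printStackTrace();
    }
}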
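If the lines are needed for further processing rather than just printing, the same readLine loop can collect them into a list. A minimal sketch under that assumption; the class name ReadLinesSketch and the helper readLines are hypothetical, and a StringReader stands in for the reader opened on the URL so the example runs without network access:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadLinesSketch {
    // Hypothetical helper: reads every line from the given reader into a list.
    // The reader is closed when the method returns.
    static List<String> readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            // readLine() accepts \n, \r, or \r\n as a terminator and strips it.
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in input; in real use, pass an InputStreamReader over url.openStream().
        List<String> lines = readLines(new StringReader("SCD 1\nTECHSAT 1B (GO-32)\n"));
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
```

Taking a Reader as the parameter keeps the parsing separate from the download, so the helper can be exercised against a string or a local file as well.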
Answer 1 (score: 0)
Since the file is already delimited by line separators, we can simply read the content from the URL's input stream:
String url = "https://celestrak.com/NORAD/elements/resource.txt";
List<String> text = new BufferedReader(new InputStreamReader(new URL(url).openStream())).lines().collect(Collectors.toList());
To convert it to a single string:
String content = new BufferedReader(new InputStreamReader(new URL(url).openStream())).lines()
.collect(Collectors.joining(System.getProperty("line.separator")));
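The one-liners above never close the stream they open and rely on the platform default charset. A minimal sketch of the same lines()-and-join idea with both made explicit, assuming the content is UTF-8; the class name JoinLinesSketch and the helper readAll are hypothetical, and a ByteArrayInputStream stands in for new URL(url).openStream():

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class JoinLinesSketch {
    // Hypothetical helper: decodes the stream as UTF-8 and rejoins its lines
    // with the platform line separator. The stream is closed on return.
    static String readAll(InputStream in) {
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining(System.lineSeparator()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in bytes; in real use, pass new URL(url).openStream().
        byte[] sample = "SCD 1\nTECHSAT 1B (GO-32)\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(sample)));
    }
}
```

Note that joining places the separator only between lines, so a trailing newline in the source is not reproduced in the result.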