How do I get the text content of a website using Java?

Time: 2018-06-20 15:32:11

Tags: java

Based on the reference I found, something like this might work:

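import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// The URL needs an explicit scheme; without one, new URL(...) throws MalformedURLException.
URL url = new URL("https://www.gmail.com");
try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
    String str;
    // Call readLine() only once per iteration so that no line is skipped.
    while ((str = in.readLine()) != null) {
        System.out.println(str);
    }
}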

1 answer:

Answer 0: (score: 3)

You can use a library such as JSoup to extract the body text from HTML.

https://jsoup.org/cookbook/input/parse-body-fragment

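import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String html = "<div><p>Lorem ipsum.</p>";
Document doc = Jsoup.parseBodyFragment(html);  // tolerates the unclosed <div>
Element body = doc.body();
String text = body.text();  // visible text with the tags stripped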
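
The snippet above starts from an HTML string that is already in memory, while the question is about getting that string from a website in the first place. One way to bridge the two, using only the standard library, might look like the sketch below. The class name `PageFetcher` and the methods `readAll` and `fetch` are hypothetical names chosen for illustration, and the code assumes the page is encoded as UTF-8, which is not true of every site.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

// Hypothetical helper, not part of JSoup or the JDK.
class PageFetcher {

    // Reads the whole stream as UTF-8 text (an assumption about the page),
    // joining lines with '\n'. The caller is responsible for closing the stream.
    static String readAll(InputStream in) {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        return reader.lines().collect(Collectors.joining("\n"));
    }

    // Downloads the raw HTML at the given address. The address must include
    // a scheme such as "https://"; a bare host name is rejected.
    static String fetch(String address) {
        try (InputStream in = URI.create(address).toURL().openStream()) {
            return readAll(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The string returned by `PageFetcher.fetch("https://example.com")` could then be passed to `Jsoup.parseBodyFragment(...)` as in the answer, and `body.text()` would yield the visible text. Keeping `readAll` separate from `fetch` means the reading logic can be exercised on an in-memory stream without any network access.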