How do I get the page source from a specific link in Java?

Time: 2014-05-02 10:18:21

Tags: java xml

Is it possible to read the content of a specific website, or its page source, from an InputStream into a String?

For example, I want to download the entire HTML of a specific website into a String or an XML document. Is that possible?

2 answers:

Answer 0 (score: 1)

Yes, you can certainly do something like this:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

// The class name is illustrative; the original snippet showed only main()
public class SavePageSource {

    public static void main(String[] args) {
        try {
            // get URL content
            URL url = new URL("http://www.mkyong.com");
            URLConnection conn = url.openConnection();

            // save to this filename
            File file = new File("/users/mkyong/test.html");

            // try-with-resources closes both streams even if an exception is thrown;
            // FileWriter creates the file if it does not exist yet
            try (BufferedReader br = new BufferedReader(
                         new InputStreamReader(conn.getInputStream()));
                 BufferedWriter bw = new BufferedWriter(new FileWriter(file))) {

                String inputLine;
                while ((inputLine = br.readLine()) != null) {
                    bw.write(inputLine);
                    bw.newLine(); // readLine() strips the line terminator, so restore it
                }
            }

            System.out.println("Done");

        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Credit: mkyong
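
The code above saves the page to a file, while the question asks for a String. A minimal sketch of that variant follows, using only the standard library. The class name PageSource and the helper names readToString and fetch are made up for illustration, and decoding as UTF-8 is an assumption, since a real page may declare a different charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original answer
public class PageSource {

    // Reads everything from the stream, decodes it with the given charset,
    // and closes the stream when done.
    public static String readToString(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buffer = new char[8192];
            int n;
            // Copy in chunks so line terminators are preserved exactly
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    // Opens the URL and returns the page source as a String.
    // UTF-8 is an assumption; the server may use another encoding.
    public static String fetch(String address) throws IOException {
        URL url = new URL(address);
        return readToString(url.openStream(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String html = fetch("http://www.mkyong.com");
        System.out.println(html.length() + " characters downloaded");
    }
}
```

Reading in character chunks rather than with readLine() keeps the original line breaks, which matters if the String is later handed to an XML or HTML parser that reports line numbers.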

Answer 1 (score: 1)

You may want to take a look at Guava's CharStreams class.

CharStreams.toString(new InputStreamReader(..))

will save you from writing a lot of boilerplate code.

See the Guava documentation for CharStreams.
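
To show how the one-liner fits into a complete program, here is a rough sketch that assumes Guava is on the classpath. The class name GuavaPageSource and the helper readToString are made up for illustration, and UTF-8 is again an assumed encoding.

```java
import com.google.common.io.CharStreams;

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; requires the Guava library
public class GuavaPageSource {

    // CharStreams.toString drains the reader into a String;
    // it does not close the reader, so try-with-resources does that here.
    public static String readToString(InputStream in) throws IOException {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            return CharStreams.toString(reader);
        }
    }

    public static void main(String[] args) throws IOException {
        String html = readToString(new URL("http://www.mkyong.com").openStream());
        System.out.println(html.length() + " characters downloaded");
    }
}
```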