Open URL links of any protocol (not just http and https)

Time: 2019-03-28 12:39:28

Tags: java

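import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Html {

    // Fetches the page at url, prints each link together with the title of
    // the page it points to, and returns the absolute link URLs.
    public static List<String> extractLinks(String url) throws IOException {
        Document doc = Jsoup.connect(url).get();
        Elements links = doc.select("a[href]");
        List<String> result = new ArrayList<>();

        for (Element link : links) {
            String href = link.attr("abs:href");
            result.add(href);
            System.out.println(" Link   : " + href);
            try {
                // Jsoup.connect only handles http and https; other links fail here.
                Document doc1 = Jsoup.connect(href).get();
                System.out.println(" Title  :" + doc1.title());
                System.out.println("\n");
            } catch (IOException | IllegalArgumentException e) {
                System.out.println("Not found");
            }
        }

        return result;
    }

    public static void main(String[] args) {
        try {
            String site = "http://english.whut.edu.cn/";
            Html.extractLinks(site);
        } catch (Exception e) {
            System.out.println(e);
        }
    }

}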

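This code can only open links and read titles over the http and https protocols, but I also need to open and read links that use other protocols. Is there a specific way to do this?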

1 Answer:

Answer 0: (score: 0)

Maybe this can help you:

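String urlSource = getURLSource("YourURL");


// Needs imports from java.io (BufferedReader, IOException, InputStream,
// InputStreamReader) and java.net (URL, URLConnection).

// Opens the URL with the JVM's handler for its protocol and returns the content as text.
public static String getURLSource(String url) throws IOException{
    URL urlObject = new URL(url);
    URLConnection urlConnection = urlObject.openConnection();
    urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");

    return toString(urlConnection.getInputStream());
}

// Reads the whole stream as UTF-8 text; the try-with-resources block closes it afterwards.
public static String toString(InputStream inputStream) throws IOException{
    try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"))){
        String inputLine;
        StringBuilder stringBuilder = new StringBuilder();
        while ((inputLine = bufferedReader.readLine()) != null){
            // readLine() strips the line terminator, so add it back to keep lines separate.
            stringBuilder.append(inputLine).append('\n');
        }

        return stringBuilder.toString();
    }
}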

With this function you can get the source code of any website you want as a String.
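
As a rough sketch of how this could be tied back to the title lookup in the question: java.net.URL opens any scheme for which the JVM has a protocol handler (its documentation lists http, https, file, and jar as always available), so the same code path can also read, for example, a file: URL. The class name ProtocolFetcher, its method names, and the regex-based title extraction are assumptions made for illustration and are not part of the original answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original answer.
final class ProtocolFetcher {

    // Simplistic pattern for the first <title> element; an assumption that
    // is good enough for a sketch but not a substitute for an HTML parser.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Reads the content behind any URL whose scheme the JVM can handle.
    static String fetch(String url) throws IOException {
        URL urlObject = URI.create(url).toURL();
        URLConnection connection = urlObject.openConnection();
        try (InputStream in = connection.getInputStream();
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            StringBuilder text = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
            return text.toString();
        }
    }

    // Returns the trimmed title text, or null when no <title> is present.
    static String extractTitle(String html) {
        Matcher matcher = TITLE.matcher(html);
        return matcher.find() ? matcher.group(1).trim() : null;
    }

    public static void main(String[] args) {
        for (String url : args) {
            try {
                System.out.println(url + " -> " + extractTitle(fetch(url)));
            } catch (IOException | IllegalArgumentException e) {
                // Unknown schemes, malformed URLs, and unreachable targets end up here.
                System.out.println(url + " -> Not found (" + e + ")");
            }
        }
    }
}
```

Passing a mix of URLs on the command line, such as an https: address and a file: path, exercises more than one protocol handler. For real pages, handing the fetched string to Jsoup.parse(html, baseUri) would likely be more robust than the regex.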