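import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Html {
    // Fetches the page at url, prints each link with the title of the page it
    // points to, and returns the absolute link URLs.
    public static List<String> extractLinks(String url) throws IOException {
        List<String> result = new ArrayList<>();
        Document doc = Jsoup.connect(url).get();
        Elements links = doc.select("a[href]");
        for (Element link : links) {
            String href = link.attr("abs:href");
            System.out.println(" Link : " + href);
            result.add(href);
            // Jsoup.connect only accepts http and https URLs.
            Document doc1 = Jsoup.connect(href).get();
            // title() returns an empty string when the page has no <title>.
            String title = doc1.title();
            if (!title.isEmpty()) {
                System.out.println(" Title :" + title);
                System.out.println("\n");
            } else {
                System.out.println("Not found");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        try {
            String site = "http://english.whut.edu.cn/";
            Html.extractLinks(site);
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}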
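This code can only open and read the titles of links that use the http and https protocols, but I also need to open and read links that use other protocols. Is there a specific way to do this?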
Answer 0 (score: 0)
Maybe this can help you:
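import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Usage: String urlSource = getURLSource("YourURL");

// Opens the URL through java.net.URLConnection and returns the response body.
public static String getURLSource(String url) throws IOException {
    URL urlObject = new URL(url);
    URLConnection urlConnection = urlObject.openConnection();
    // Some servers reject requests that lack a browser-like User-Agent.
    urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
    return toString(urlConnection.getInputStream());
}

// Reads the whole stream as UTF-8 text.
public static String toString(InputStream inputStream) throws IOException {
    try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8))) {
        String inputLine;
        StringBuilder stringBuilder = new StringBuilder();
        while ((inputLine = bufferedReader.readLine()) != null) {
            // readLine() strips the line terminator, so put it back.
            stringBuilder.append(inputLine).append('\n');
        }
        return stringBuilder.toString();
    }
}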
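With this function you can get the source of any page you need as a String. Because it goes through java.net.URLConnection rather than Jsoup, it is not limited to http and https: it also works for other schemes the JVM has a protocol handler for (for example file:).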
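To tie this back to the question, here is a minimal sketch of reading a title from a non-http URL with the same URLConnection approach. The class name UrlTitleSketch, the helper names readUrl and extractTitle, and the temporary file used as a file: URL are all made up for illustration. The regex is only a stand-in to keep the example dependency-free; in the question's setup, passing the fetched string to Jsoup.parse(html) and calling title() would likely be more robust. It also uses URI.create(url).toURL() instead of new URL(url), which is an assumption made to avoid deprecation warnings on recent JDKs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlTitleSketch {
    // Hypothetical, simplified matcher for the first <title> element.
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Reads the resource behind any URL the JVM has a protocol handler for.
    static String readUrl(String url) throws IOException {
        URLConnection connection = URI.create(url).toURL().openConnection();
        StringBuilder source = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                source.append(line).append('\n');
            }
        }
        return source.toString();
    }

    // Returns the trimmed title text, or an empty string if there is none.
    static String extractTitle(String html) {
        Matcher matcher = TITLE.matcher(html);
        return matcher.find() ? matcher.group(1).trim() : "";
    }

    public static void main(String[] args) throws IOException {
        // Write a small page to a temp file so the demo needs no network.
        Path page = Files.createTempFile("page", ".html");
        Files.write(page, "<html><head><title>Local page</title></head></html>"
                .getBytes(StandardCharsets.UTF_8));
        String fileUrl = page.toUri().toString(); // a file: URL
        System.out.println(extractTitle(readUrl(fileUrl))); // prints "Local page"
        Files.delete(page);
    }
}
```

Schemes without a readable stream (mailto: links, for instance) will still fail in readUrl, so in a crawler loop it probably makes sense to wrap each call in its own try/catch and skip the links that cannot be opened.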