I have this method:
private static String parsePageHeaderInfo(String urlStr) throws Exception {
    String word_google = "google";
    String word_twitter = "twitter";
    String title, description, image, content;
    image = "";
    Document doc = Jsoup.connect(urlStr).userAgent("Mozilla").get();
    title = doc.title();
    if (title.equals(""))
    {
        title = doc.select("meta[property=og:title]").attr("content");
    }
    description = doc.select("meta[name=description]").attr("content");
    if (description.equals(""))
    {
        description = doc.select("meta[name=keywords]").attr("content");
    }
    if (description.equals(""))
    {
        description = doc.select("meta[property=og:description]").attr("content");
    }
    if (description.equals(""))
    {
        description = title;
    }
    Elements src_img = doc.select("img[src~=(?i)\\.(png|jpe?g)]");
    if (src_img.size() > 0)
    {
        image = src_img.first().attr("content");
    }
    if (image.equals(""))
    {
        image = doc.select("meta[property=og:image]").attr("content");
    }
    if (image.equals(""))
    {
        src_img = doc.select("link[href~=(?i)\\.(ico)]");
        if (src_img.size() > 0)
        {
            if (urlStr.contains(word_twitter) && image.equals(""))
            {
                image = src_img.first().attr("href");
            }
            else
            {
                image = urlStr + src_img.first().attr("href");
            }
        }
    }
    if (urlStr.contains(word_google) && image.equals(""))
    {
        image = urlStr + "/images/google_favicon_128.png";
    }
    return title + " \n a " + description + " \n b" + image;
}

String e = parsePageHeaderInfo("https://www.youtube.com/watch?v=HMUDVMiITOU");
System.out.println(e);
When I run this code in Android Studio, the output is:
title : YouTube.
description : YouTube.
image : https://www.youtube.com/watch?v=HMUDVMiITOU//s.ytimg.com/yts/favicon-vfldLzJxy.ico.
But in NetBeans the output is:
title : DJ Snake, Lil Jon - Turn Down for What - YouTube.
description : Download the single on iTunes: http://smarturl.it/TD4W Director- Daniels Producer- Judy Craig Co Producer- Jonathan Wang Executive Producer- Candice Ouaknine...
image : https://i.ytimg.com/vi/HMUDVMiITOU/hqdefault.jpg.
What causes the difference? The second output is the correct one.
Answer 0 (score: 0)
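Try a different user agent, for example Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36. Also include the referrer http://www.google.com.
The user agent you supplied ("Mozilla") may be insufficient or invalid, so the server may be returning a different page for it.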
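As a minimal sketch of that suggestion, the request could be built with Jsoup's userAgent() and referrer() methods. The class name UserAgentExample, the constants, and the prepare/fetch helpers are illustrative names, not part of the original code, and whether this changes what YouTube returns has not been verified here.

```java
import java.io.IOException;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class UserAgentExample {
    // Full desktop browser user agent string, as quoted in the answer.
    static final String DESKTOP_UA =
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36";
    // Referrer suggested in the answer.
    static final String REFERRER = "http://www.google.com";

    // Builds the request without sending it, so the headers can be inspected.
    static Connection prepare(String url) {
        return Jsoup.connect(url)
                .userAgent(DESKTOP_UA)
                .referrer(REFERRER);
    }

    // Sends the prepared GET request and parses the response as HTML.
    static Document fetch(String url) throws IOException {
        return prepare(url).get();
    }

    public static void main(String[] args) {
        String url = args.length > 0
                ? args[0]
                : "https://www.youtube.com/watch?v=HMUDVMiITOU";
        try {
            Document doc = fetch(url);
            System.out.println("title: " + doc.title());
        } catch (IOException ex) {
            System.out.println("Request failed: " + ex.getMessage());
        }
    }
}
```

In parsePageHeaderInfo this would amount to replacing the .userAgent("Mozilla") call with the two calls shown in prepare.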
You can find many user agents here.
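A separate point: the malformed image URL in the Android Studio output comes from the line image = urlStr + src_img.first().attr("href"), which glues a protocol-relative href (//s.ytimg.com/...) onto the full page URL. One possible fix, sketched here with the standard java.net.URI class, is to resolve the href against the page URL instead of concatenating. The class name ResolveHref, the resolve helper, and the /favicon.ico sample are illustrative and not from the original code.

```java
import java.net.URI;

public class ResolveHref {
    // Resolves an href (absolute, protocol-relative, or path-relative)
    // against the URL of the page it was found on.
    // URI.create throws IllegalArgumentException for malformed input,
    // for example an href containing unescaped spaces.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "https://www.youtube.com/watch?v=HMUDVMiITOU";
        // Protocol-relative href, as in the question's output.
        System.out.println(resolve(page, "//s.ytimg.com/yts/favicon-vfldLzJxy.ico"));
        // Root-relative href (hypothetical example).
        System.out.println(resolve(page, "/favicon.ico"));
    }
}
```

Jsoup also provides Element.absUrl("href"), which resolves against the document's base URI and may be the simpler choice inside the existing method.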