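Unable to get links from HTML - jsoup

Date: 2018-03-11 18:39:54

Tags: java web-scraping jsoup

With the code below I can get the text I need from the website, but I cannot get the links that go with that text. I have tried several permutations and combinations of methods. The most I have managed to get is the entire outer HTML, like this: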

<li class="list-item">
<h4><a class="bold" href="abacavir.htm">Abacavir </a>   </h4>

Abacavir is an antiviral drug that is effective against the HIV-1 virus.</li>

Here is the code:

    public static void main(String[] args) throws Exception {
        Map<String,String> drugLinks = new LinkedHashMap<String,String>();
        final int OK = 200;
        //String currentURL;
        //int page = 1;
        int status = OK;
        Connection.Response response = null;
        Document doc = null;
        String[] keywords = {"a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"};
        //String keyword = "a";
        for (String keyword : keywords){
            final String url = "https://www.medindia.net/doctors/drug_information/home.asp?alpha=" + keyword;
            response = Jsoup.connect(url)
                    .userAgent("Mozilla/5.0")
                    .execute();
            status = response.statusCode();

            doc = response.parse();

            Element tds = doc.select("div.related-links.top-gray.col-list.clear-fix").first();

            Elements links = tds.select("li[class=list-item]");

            for (Element link : links){
                System.out.println("generic::"+link.select("a[href]").text());
                System.out.println("link::"+link.attr("abs:a"));
            }
        }
    }

Output:

generic::Abacavir
link::
generic::Abacavir Sulfate and Lamivudine
link::
generic::Abacavir Sulfate, Lamivudine and Zidovudine
link::
generic::Abaloparatide
link::
generic::Abarelix
link::

How can I get the absolute links from the given HTML?

1 answer:

Answer 0: (score: 0)

To get the link from the element, you can use:

link.select("a").attr("href")

However, this only gives you the relative link. Your call link.attr("abs:a") prints nothing because the abs: prefix applies to an attribute name, and the li element has no attribute called a. The full (absolute) link is:

link.select("a").attr("abs:href")
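
To see what turning a relative href into an absolute link amounts to, here is a minimal sketch that uses only the standard library. It is not jsoup's implementation; `LinkResolver` and `resolve` are hypothetical names made up for this illustration, and the page URL is the one built in the question for `alpha=a`.

```java
import java.net.URI;

// Hypothetical helper (not part of jsoup): resolves an href against the
// URL of the page it was found on, using java.net.URI.
class LinkResolver {

    // Returns the href resolved against pageUrl; an href that is already
    // absolute is returned unchanged.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String pageUrl = "https://www.medindia.net/doctors/drug_information/home.asp?alpha=a";
        // Prints https://www.medindia.net/doctors/drug_information/abacavir.htm
        System.out.println(resolve(pageUrl, "abacavir.htm"));
    }
}
```

In jsoup the base for this kind of resolution is the document's base URI, which should normally be the fetched URL when the document comes from `Jsoup.connect(url)` as in the question.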