Unable to get text and links when parsing a website's HTML with jsoup

Asked: 2017-08-11 13:09:07

Tags: java jsoup html-parsing

My problem is that there is no output on screen. I cannot get the text and links from the tables with class="gallerybig". Please help.

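If you do not understand my question, you can contact me at rushangbhavani@gmail.com.

Please see the olx link in the code and view that page's source.

Please check the link and see the image link.

I want to get the text and link from each table with class="gallerybig". There are 41 such tables, so the code has to loop over them.

I am running this in NetBeans.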

    import java.io.IOException;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class OlxGallery {
        public static void main(String[] args) {
            try {
                Document doc = Jsoup.connect("https://www.olx.in/sale/?view=galleryBig&page=4")
                        .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0")
                        .get();
                String rootTable = "tbody > tr";
                Elements first = doc.select("body[class=offersview.standard.smallscreen.bodyIndent]");
                Elements tables = first.select("table[class=gallerybig]");

                // Walk every row of every gallery table
                for (Element base : tables) {
                    for (Element row : base.select(rootTable)) {
                        Elements td = row.select("td[valign=top]");
                        Elements div = td.select("div[class=item.rel]");
                        Elements divInner = div.select("div[class=inner.brkword]");
                        Elements divClr = divInner.select("div[class=clr]");
                        Elements h4 = divClr.select("h4[class=normal.large.lheight24]");

                        // Link and title of the listing
                        Elements link = h4.select("a");
                        Elements select = link.select("a[href]");
                        Elements title = link.select("span");
                        String titles = title.text();

                        // Price of the listing
                        Elements priceTag = divClr.select("p[class=price.x-large]");
                        Elements strong = priceTag.select("strong[class=c000]");
                        String price = strong.text();

                        // Type and place of the listing
                        Elements pTag = divClr.select("p[class=lheight18.color-1.margintop8]");
                        for (Element p : pTag) {
                            System.out.println(p.text());
                            System.out.println("fg");
                        }
                        Elements span = pTag.select("span");
                        String place = span.text();

                        System.out.println(place);
                        System.out.println(titles);
                        System.out.println(price);
                        System.out.println(select);
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

Output:

    run:
    BUILD SUCCESSFUL (total time: 2 seconds)

Nothing else is printed on screen.

1 Answer:

Answer 0: (score: 1)

I found the answer:

    import java.io.IOException;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class OlxScraper {
        public static void main(String[] args) {
            try {
                Document doc = Jsoup.connect("https://www.olx.in/lucknow/sale/?search%5Bdescription%5D=1&view=galleryBig").get();

                // Image URL of each listing
                Elements images = doc.select("div.mheight.tcenter img.fleft.rel.zi2");
                for (Element img : images) {
                    System.out.println("\nimg : " + img.attr("src"));
                }

                // Link and title of each listing
                Elements links = doc.select("div.clr > h4.normal.large.lheight24 > a[href]");
                for (Element link : links) {
                    String className = link.className();
                    // This class list appears to mark a promoted listing
                    if (className.contains("link linkWithHash detailsLinkPromoted linkWithHashPromoted")) {
                        System.out.println("pro: yes ");
                    }

                    System.out.println(className);
                    System.out.println("\nlink : " + link.attr("href"));
                    System.out.println("text : " + link.text());
                }

                // Price of each listing
                Elements prices = doc.select("div.clr > p.price.x-large strong.c000");
                for (Element price : prices) {
                    System.out.println("\nPrice : " + price.text());
                }

                // Type and place of each listing
                Elements typeAndPlace = doc.select("p.lheight18.color-1.margintop8");
                for (Element item : typeAndPlace) {
                    System.out.println(" \nPlace : " + item.text());
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
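
A likely reason the attempt in the question printed nothing: in jsoup, as in CSS, `[class=item.rel]` is an attribute-equals test against the whole `class` value, so it matches `class="item.rel"` but not `class="item rel"`, whereas `div.item.rel` matches a `div` that carries both classes. The sketch below is a small hypothetical helper (`toClassSelector` is not part of jsoup) that turns a `class` value copied from the page source into the class-selector form. It assumes the class names contain no characters that would need CSS escaping.

```java
class ClassSelectors {

    /**
     * Builds a class selector such as "h4.normal.large.lheight24" from a tag
     * name and the raw value of a class attribute ("normal large lheight24").
     * Hypothetical helper for illustration; it does not call jsoup itself.
     */
    public static String toClassSelector(String tag, String classAttr) {
        StringBuilder selector = new StringBuilder(tag);
        // A class attribute is a whitespace-separated list of class names
        for (String token : classAttr.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                selector.append('.').append(token);
            }
        }
        return selector.toString();
    }

    public static void main(String[] args) {
        System.out.println(toClassSelector("h4", "normal large lheight24"));
        System.out.println(toClassSelector("p", "price x-large"));
        System.out.println(toClassSelector("div", "  item   rel "));
    }
}
```

The returned string can be passed to `doc.select(...)`, for example `doc.select(ClassSelectors.toClassSelector("h4", "normal large lheight24"))`.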