jsoup: getting the text of a <span> inside an <a> tag

Date: 2016-03-06 17:12:54

Tags: java html parsing jsoup

I want to print the text inside a <span> tag that is nested inside an <a> tag. Specifically, I want to print 37, the content of <span class="rep-score">37</span>.

<a href="//stackoverflow.com"
       class="site-link js-gps-track"
       data-id="1"
       data-gps-track="
            site.switch({ target_site:1, item_type:3 }),
        site_switcher.click({ item_type:1 })">
        <div class="site-icon favicon favicon-stackoverflow" title="Stack Overflow"></div>
        Stack Overflow
            <span class="rep-score">37</span>
</a>

Below is the code I wrote to do this, but nothing gets printed.
Can somebody explain why it's not working?

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class Repoo
{
    public static void main(String[] args)
    {
        try {
            // Fetch the Stack Overflow home page with a browser-like user agent.
            Document doc = Jsoup
                .connect("http://www.stackoverflow.com")
                .userAgent("Google Chrome/48.0.2564.116 m")
                .get();

            // Try to select the rep-score span inside the site link.
            Elements select = doc.select("a.site-link js-gps-track > span.rep-score");

            // Print the text of every matched element (currently prints nothing).
            for (Element el : select) {
                System.out.println(el.text());
            }
        } catch (IOException e) {
            System.out.println(e);
        }
    }
}

1 answer:

Answer 0 (score: 2)

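Your code does not send any of the credentials needed to log in to Stack Overflow, so you receive the response page for an unregistered user, and that page does not contain any <span class="rep-score">37</span> tag.

You could try logging in through jsoup first.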

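By the way, if you want to select an <a ..> that has several classes, combine them as a.class1.class2, not as a.class1 class2: the second form makes the selector look for a.class1 and then for a <class2 ..> tag inside it.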
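The difference between the two selector forms can be checked offline. The following is a minimal sketch, assuming jsoup is on the classpath; the class name SelectorComparison, the helper countMatches, and the trimmed SAMPLE_HTML string are illustrative names that do not come from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorComparison {
    // Trimmed copy of the markup from the question (illustrative, not the live page).
    static final String SAMPLE_HTML =
        "<a href=\"//stackoverflow.com\" class=\"site-link js-gps-track\" data-id=\"1\">"
        + "Stack Overflow"
        + "<span class=\"rep-score\">37</span>"
        + "</a>";

    // Returns how many elements of SAMPLE_HTML a CSS selector matches.
    static int countMatches(String cssQuery) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // Space-separated form: looks for a <js-gps-track> tag inside a.site-link.
        System.out.println(countMatches("a.site-link js-gps-track > span.rep-score"));
        // Dot-joined form: requires both classes on the same <a> element.
        System.out.println(countMatches("a.site-link.js-gps-track > span.rep-score"));
    }
}
```

Running main prints the match count for each form, so the space-separated selector can be compared directly with the dot-joined one.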

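So if you manage to log in through jsoup and get a doc that really contains that <a ..> element, you should be able to select the span with the combined-class form of your selector, a.site-link.js-gps-track > span.rep-score.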
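As a rough sketch of that flow, one option is to post a login form with jsoup, reuse the returned cookies for the page request, and then run the selector. Everything site-specific here is an assumption: LOGIN_URL, PAGE_URL, and the form field names "email" and "password" are hypothetical placeholders, and a real login form may require extra hidden fields or tokens. The names RepScoreFetcher, fetchLoggedIn, and extractRepScore are also illustrative.

```java
import java.io.IOException;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RepScoreFetcher {
    // Hypothetical placeholders, not the real login endpoint or page.
    static final String LOGIN_URL = "https://example.com/login";
    static final String PAGE_URL = "https://example.com/";

    // Returns the rep-score text, or null if the page has no such span.
    static String extractRepScore(Document doc) {
        Element span = doc.select("a.site-link.js-gps-track > span.rep-score").first();
        return span == null ? null : span.text();
    }

    // Posts the (assumed) login form and fetches the page with the session cookies.
    static Document fetchLoggedIn(String email, String password) throws IOException {
        Connection.Response login = Jsoup.connect(LOGIN_URL)
            .data("email", email)        // assumed field name
            .data("password", password)  // assumed field name
            .method(Connection.Method.POST)
            .execute();
        Map<String, String> cookies = login.cookies();
        return Jsoup.connect(PAGE_URL).cookies(cookies).get();
    }

    public static void main(String[] args) throws IOException {
        Document doc = fetchLoggedIn(args[0], args[1]);
        System.out.println(extractRepScore(doc));
    }
}
```

Keeping extractRepScore separate from the network code makes it possible to check the selector against a saved copy of the page with Jsoup.parse before dealing with the login step.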