How to target a specific td in a tr using Java

Time: 2016-12-17 13:25:14

Tags: java html web-scraping jsoup

I want to target specific td elements inside a tr.

This is my code:

        private void fletch(String name) throws IOException, JSONException {
            final String iron = "img=2";
            final String ui = "img=3";
            final String hc = "img=10";
            String url = "http://services.runescape.com/m=hiscore_oldschool/hiscorepersonal.ws?user1=";

            // Pick the hiscore table that matches the account-type icon in the name.
            String lowerName = name.toLowerCase();
            if (lowerName.contains(iron)) {
                url = "http://services.runescape.com/m=hiscore_oldschool_ironman/hiscorepersonal.ws?user1=";
            } else if (lowerName.contains(ui)) {
                url = "http://services.runescape.com/m=hiscore_oldschool_ultimate/hiscorepersonal.ws?user1=";
            } else if (lowerName.contains(hc)) {
                url = "http://services.runescape.com/m=hiscore_oldschool_hardcore_ironman/hiscorepersonal.ws?user1=";
            }

            String[] parts = name.split(">");
            String part2 = parts[1];
            String fin = part2.replaceAll("\\s","+");
            url+=fin;

            Document doc = Jsoup.connect(url)
                    .data("query", "Java")
                    .userAgent("Mozilla")
                    .cookie("auth", "token")
                    .timeout(3000)
                    .post();

            // Core part: the first table on the page holds the scores.
            Element table1 = doc.select("table").first();
            String body = table1.toString();
            Document docb = Jsoup.parseBodyFragment(body);
            Element bbd = docb.body();
            String hhk = bbd.toString();

            // This is where I don't know how to target the td data. I tried this
            // (I can't run the code right now, so I'm asking here):
            String overall = bbd.getElementsByTag("td").get(4).text();
        }

Right now this gives me the following HTML:

<table cellpadding="3" cellspacing="0" border=0 style="max-width: 355px;">
<tr><td colspan="5" align="center"><b>Personal scores for big kurwaaa</b></td></tr>
<tr>
<td colspan="2" style="text-align:left;padding-left:24px;"><b>Skill</b></td><td align="right"><b>Rank</b></td><td align="right"><b>Level</b></td><td align="right"><b>XP</b></td>
</tr>
<tr><td width="35"></td><td width="100"></td><td width="75"></td><td width="40"></td><td width="75"></td></tr>
<tr>

<td></td>
<td align="left"><a href="overall.ws?table=0&user=big+kurwaaa">
Overall
</a></td>
<td align="right">7,430</td>
<td align="right">466</td>
<td align="right">6,164,312</td>

</tr>
<tr>
<td align="right"><img class="miniimg" src="http://www.runescape.com/img/rsp777/hiscores/skill_icon_attack1.gif"></td>
<td align="left"><a href="overall.ws?table=1&user=big+kurwaaa">
Attack
</a></td>
<td align="right">14,475</td>
<td align="right">19</td>
<td align="right">4,304</td>

</tr>

I want to target the data in the three td cells inside each tr. For example:

<td align="right">7,430</td>
<td align="right">466</td>
<td align="right">6,164,312</td>

and so on, from the "Overall" tr down to the last one. Is there a simple way to loop over this data and build a JSON object or a map from it?

PS: I'm new to Java.

2 answers:

Answer 0: (score: 0)

        String url = "yourUrl";
        Document doc = Jsoup.connect(url).get();
        Element table = doc.select("table[class=tableClass]").first();
        Iterator<Element> iterator = table.select("td[align=right]").iterator();
        iterator.next();//skip first
        iterator.next();//skip second
        System.out.println(iterator.next().text());
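
One caveat when applying this to the HTML in the question: that table has no class attribute, and both the header row (Rank, Level, XP) and the skill icon cells also carry align="right", so a table-wide td[align=right] iterator does not line up with the data cells. A minimal sketch that scopes the selection to each data row instead, assuming data rows are exactly the rows that contain a link (tr:has(a)) and that the three values are always the last three cells. The class name RowSelectorSketch, the method extract and the trimmed inline sample HTML are illustrative, not part of the original code:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class RowSelectorSketch {

    // Trimmed copy of the table from the question so the sketch runs offline.
    // The image URL is shortened to a placeholder.
    public static final String SAMPLE_HTML =
        "<table>"
        + "<tr><td colspan=\"5\" align=\"center\"><b>Personal scores</b></td></tr>"
        + "<tr><td colspan=\"2\"><b>Skill</b></td><td align=\"right\"><b>Rank</b></td>"
        + "<td align=\"right\"><b>Level</b></td><td align=\"right\"><b>XP</b></td></tr>"
        + "<tr><td></td><td align=\"left\"><a href=\"overall.ws?table=0\"> Overall </a></td>"
        + "<td align=\"right\">7,430</td><td align=\"right\">466</td>"
        + "<td align=\"right\">6,164,312</td></tr>"
        + "<tr><td align=\"right\"><img class=\"miniimg\" src=\"icon.gif\"></td>"
        + "<td align=\"left\"><a href=\"overall.ws?table=1\"> Attack </a></td>"
        + "<td align=\"right\">14,475</td><td align=\"right\">19</td>"
        + "<td align=\"right\">4,304</td></tr>"
        + "</table>";

    // Returns one "Skill: rank | level | xp" line per data row.
    public static List<String> extract(String html) {
        Document doc = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        // tr:has(a) keeps only rows containing a skill link, which skips the header rows.
        for (Element row : doc.select("table tr:has(a)")) {
            Elements cells = row.select("td");
            int n = cells.size();
            if (n < 3) {
                continue; // not a data row in the assumed layout
            }
            String skill = row.select("a").text();
            // Take the last three cells so the leading icon cell is never picked up.
            lines.add(skill + ": " + cells.get(n - 3).text()
                + " | " + cells.get(n - 2).text()
                + " | " + cells.get(n - 1).text());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : extract(SAMPLE_HTML)) {
            System.out.println(line);
        }
    }
}
```

For the live page, the same extract method could be given the result of doc.html() from the Jsoup.connect(url).get() call above.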

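Answer 1: (score: 0)

If you want all the tr tags inside bbd, use getElementsByTag. It returns an Elements collection, which you can walk by 0-based index. To skip the first 3 tr tags, start the loop at index 3; the td tags can be handled the same way. Here is some demo code: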

Elements trList = bbd.getElementsByTag("tr");

for (int i = 3; i < trList.size(); i++) {
    System.out.println("----------------- TR START -----------------");
    Elements tdList = trList.get(i).getElementsByTag("td");
    for (int j = 2; j < tdList.size(); j++) {
        System.out.println(tdList.get(j));
    }
    System.out.println("------------------ TR END ------------------");
}
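
The loop above only prints the cells. To get the map the question asks for, the cell texts can be collected per row and parsed into numbers. A minimal sketch, assuming each row has already been reduced to four strings (the skill name from cell 1 plus the texts of cells 2 to 4) and that every value is a number with comma thousands separators, as in the sample HTML. The class HiscoreMapSketch and its methods parseNumber and toMap are hypothetical helpers, not part of jsoup:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HiscoreMapSketch {

    // Parses a cell such as "6,164,312" into a long by dropping the thousands separators.
    // Throws NumberFormatException if the remaining text is not a number.
    public static long parseNumber(String cellText) {
        return Long.parseLong(cellText.trim().replace(",", ""));
    }

    // Builds skill -> {rank, level, xp} from rows of cell texts.
    // Each row is expected to be {skill, rank, level, xp}; rows of any other length are skipped.
    public static Map<String, Map<String, Long>> toMap(List<String[]> rows) {
        // LinkedHashMap keeps the skills in the order they appear in the table.
        Map<String, Map<String, Long>> result = new LinkedHashMap<>();
        for (String[] row : rows) {
            if (row.length != 4) {
                continue;
            }
            Map<String, Long> stats = new LinkedHashMap<>();
            stats.put("rank", parseNumber(row[1]));
            stats.put("level", parseNumber(row[2]));
            stats.put("xp", parseNumber(row[3]));
            result.put(row[0].trim(), stats);
        }
        return result;
    }

    public static void main(String[] args) {
        // Values copied from the sample HTML in the question.
        List<String[]> rows = Arrays.asList(
            new String[] {"Overall", "7,430", "466", "6,164,312"},
            new String[] {"Attack", "14,475", "19", "4,304"});
        System.out.println(toMap(rows));
        // prints {Overall={rank=7430, level=466, xp=6164312}, Attack={rank=14475, level=19, xp=4304}}
    }
}
```

Inside the loop above, a row could be built as new String[] { tdList.get(1).text(), tdList.get(2).text(), tdList.get(3).text(), tdList.get(4).text() }. The nested map can then be handed to a JSON library for serialization; since the question's method already declares JSONException, org.json is presumably available for that.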