How do I get the value of this class in jsoup?

Asked: 2016-07-17 11:51:49

Tags: android html jsoup

I want to get the pricecell / WebRupee values from this HTML document.

An excerpt of the document is shown below.

<tr prodid="143012" class="tablerow style2">
            <td class="pricecell"><span class="WebRupee">Rs.</span> 29 <br><font style="font-size:smaller;font-weight:normal"> 3 days </font></td>
            <td class="spacer"></td>
            <td class="detailcell"><span><span class="label label-default" style="background-color:#3cb521;color:#fff;border:1px solid #3cb521">FULL TT</span>&nbsp; </span><span><span class="label label-default" style="background-color:#fff;color:#0c7abc;border:1px solid #0c7abc">SMS</span>&nbsp; </span>
             <div style="padding-top:5px">
               29 Full Talktime 
             </div>
             <div class="detailtext">
               5 Local A2A SMS valid for 1 day 
             </div></td>
           </tr>
           <tr prodid="127535" class="tablerow style2">
            <td class="pricecell"><span class="WebRupee">Rs.</span> 59 <br><font style="font-size:smaller;font-weight:normal"> 7 days </font></td>
            <td class="spacer"></td>
            <td class="detailcell"><span><span class="label label-default" style="background-color:#3cb521;color:#fff;border:1px solid #3cb521">FULL TT</span>&nbsp; </span><span><span class="label label-default" style="background-color:#fff;color:#0c7abc;border:1px solid #0c7abc">SMS</span>&nbsp; </span>
             <div style="padding-top:5px">
               59 Full Talktime 
             </div>
             <div class="detailtext">
               10 A2A SMS valid for 2 days 
             </div></td>
           </tr>
           <tr prodid="143025" class="tablerow style2">
            <td class="pricecell"><span class="WebRupee">Rs.</span> 99 <br><font style="font-size:smaller;font-weight:normal"> 12 days </font></td>
            <td class="spacer"></td>
            <td class="detailcell"><span><span class="label label-default" style="background-color:#3cb521;color:#fff;border:1px solid #3cb521">FULL TT</span>&nbsp; </span>
             <div style="padding-top:5px">
               99 Full Talktime 
             </div>
             <div class="detailtext">
               10 Local A2A SMS for 2 days only 
             </div></td>
           </tr>

Specifically, I want the values 29, 59, and 99 found under pricecell -> WebRupee, and I need jsoup to parse them.

The code I tried:

 class kp extends AsyncTask<Void,Void,Void> {
            ArrayList<HashMap<String, String>> arraylist2 = new ArrayList<>();
            @Override
            protected void onPreExecute() {
                super.onPreExecute();

            }
            @Override
            protected Void doInBackground(Void... voids) {
                try {
                    Document doc = Jsoup.connect("http://www.ireff.in/plans/" + operator+"/" + state).userAgent("Mozilla/5.0 " +
                            "(Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36").get();
                    int count = 0, j = 0, i = 0;    
                    String TopupTable="";

                    for (Element table : doc.select("div[id=Topup]")) {
                        for (Element row : table.select("tr")) {
                            count++;

                            TopupTable=TopupTable+row.toString();//has all the values of topup category

                            System.out.print(TopupTable+"TopupTable row string here");
                        }

                    }

    ....
    ....
    ....
                    Elements r2;
                    String temp;
                    Document doc2 = Jsoup.parse(TopupTable, "",Parser.xmlParser());//doc2 has the TopupTable string converted to a "Document" type variable
                    for (Element table : doc.select("div[id=Topup]")) {
                        for (Element row : table.select("tr")) {
                            i++;
                            j++;
                            k++;


                            try {
                                Elements tds = row.select("td:not([rowspan])");
                                if(tds.contains("tr[id=download]"))
                                    continue;

                                Elements tds2 = doc2.getElementsByClass("td[class=pricecell]");
                                temp = doc2.getElementsByClass("span[class=WebRupee]").toString();//trying to get those numeric values and store it in temp variable
                                System.out.print(temp+"temp var");

The temp variable comes out blank. Please tell me where I went wrong.

Thanks for your time :-)

3 answers:

Answer 0 (score: 1)

I tried it like this and it worked for me:

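import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Whitelist;
import org.jsoup.select.Elements;

public class Test {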
    public static void main(String[] args) {
        String parseText = "<table><tr prodid=\"143012\" class=\"tablerow style2\">\n" +
                "            <td class=\"pricecell\"><span class=\"WebRupee\">Rs.</span> 29 <br><font style=\"font-size:smaller;font-weight:normal\"> 3 days </font></td>\n" +
                "            <td class=\"spacer\"></td>\n" +
                "            <td class=\"detailcell\"><span><span class=\"label label-default\" style=\"background-color:#3cb521;color:#fff;border:1px solid #3cb521\">FULL TT</span>&nbsp; </span><span><span class=\"label label-default\" style=\"background-color:#fff;color:#0c7abc;border:1px solid #0c7abc\">SMS</span>&nbsp; </span>\n" +
                "             <div style=\"padding-top:5px\">\n" +
                "               29 Full Talktime \n" +
                "             </div>\n" +
                "             <div class=\"detailtext\">\n" +
                "               5 Local A2A SMS valid for 1 day \n" +
                "             </div></td>\n" +
                "           </tr>\n" +
                "           <tr prodid=\"127535\" class=\"tablerow style2\">\n" +
                "            <td class=\"pricecell\"><span class=\"WebRupee\">Rs.</span> 59 <br><font style=\"font-size:smaller;font-weight:normal\"> 7 days </font></td>\n" +
                "            <td class=\"spacer\"></td>\n" +
                "            <td class=\"detailcell\"><span><span class=\"label label-default\" style=\"background-color:#3cb521;color:#fff;border:1px solid #3cb521\">FULL TT</span>&nbsp; </span><span><span class=\"label label-default\" style=\"background-color:#fff;color:#0c7abc;border:1px solid #0c7abc\">SMS</span>&nbsp; </span>\n" +
                "             <div style=\"padding-top:5px\">\n" +
                "               59 Full Talktime \n" +
                "             </div>\n" +
                "             <div class=\"detailtext\">\n" +
                "               10 A2A SMS valid for 2 days \n" +
                "             </div></td>\n" +
                "           </tr>\n" +
                "           <tr prodid=\"143025\" class=\"tablerow style2\">\n" +
                "            <td class=\"pricecell\"><span class=\"WebRupee\">Rs.</span> 99 <br><font style=\"font-size:smaller;font-weight:normal\"> 12 days </font></td>\n" +
                "            <td class=\"spacer\"></td>\n" +
                "            <td class=\"detailcell\"><span><span class=\"label label-default\" style=\"background-color:#3cb521;color:#fff;border:1px solid #3cb521\">FULL TT</span>&nbsp; </span>\n" +
                "             <div style=\"padding-top:5px\">\n" +
                "               99 Full Talktime \n" +
                "             </div>\n" +
                "             <div class=\"detailtext\">\n" +
                "               10 Local A2A SMS for 2 days only \n" +
                "             </div></td>\n" +
                "           </tr></table>";

        Document doc = Jsoup.parse(parseText);
        // Drop the child elements so only the bare price text remains in each cell.
        doc.select("font").remove();
        doc.select("span").remove();
        Whitelist wl = Whitelist.basic();
        for (Element row : doc.select("tr")) {
            Elements tds = row.select("td.pricecell");
            // clean() strips any leftover markup from the cell text.
            String value = Jsoup.clean(tds.get(0).text(), wl);
            System.out.println(value);
        }

    }
}

Output:

29
59
99

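Answer 1 (score: 1)

Just use a selector together with org.jsoup.nodes.Element.ownText() to extract the cell's text without the text of its children. From the Javadoc:

"Gets the text owned by this element only; does not get the combined text of all children."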

     Document doc = Jsoup
            .connect(url)
            .userAgent(userAgent)
            .get();

     Elements cells = doc.select("td.pricecell");

     ListIterator<Element> itr = cells.listIterator();
     while (itr.hasNext()) {
         Element cell = itr.next();
         System.out.println(cell.ownText());
     }

Output:

29
59
99
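
As a self-contained sketch of the ownText() approach, the following runs against an inline HTML string instead of the live page. The class name PriceCellDemo, the helper names, and the trimmed sample markup are illustrative and not part of the original post; it assumes jsoup is on the classpath. It also contrasts getElementsByClass called with a bare class name against the selector-style string used in the question, which is probably why temp came back empty: getElementsByClass expects a class name, not a CSS query.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class PriceCellDemo {
    // Sample markup trimmed from the question (illustrative); wrapped in <table>
    // so the HTML parser keeps the <tr>/<td> elements.
    static final String HTML =
            "<table>"
            + "<tr class=\"tablerow style2\"><td class=\"pricecell\"><span class=\"WebRupee\">Rs.</span> 29 <br><font> 3 days </font></td></tr>"
            + "<tr class=\"tablerow style2\"><td class=\"pricecell\"><span class=\"WebRupee\">Rs.</span> 59 <br><font> 7 days </font></td></tr>"
            + "<tr class=\"tablerow style2\"><td class=\"pricecell\"><span class=\"WebRupee\">Rs.</span> 99 <br><font> 12 days </font></td></tr>"
            + "</table>";

    // Collects the text each price cell owns directly, ignoring the <span> and <font> children.
    static List<String> extractPrices(String html) {
        Document doc = Jsoup.parse(html);
        List<String> prices = new ArrayList<>();
        for (Element cell : doc.select("td.pricecell")) {
            prices.add(cell.ownText().trim());
        }
        return prices;
    }

    // Counts the elements whose class attribute contains the given class name.
    static int countByClass(String html, String className) {
        return Jsoup.parse(html).getElementsByClass(className).size();
    }

    public static void main(String[] args) {
        // A selector string is treated as a literal class name, so nothing matches.
        System.out.println(countByClass(HTML, "td[class=pricecell]")); // 0
        // A bare class name matches the three price cells.
        System.out.println(countByClass(HTML, "pricecell"));           // 3
        System.out.println(extractPrices(HTML));                       // [29, 59, 99]
    }
}
```

If a CSS query is preferred, doc.select("td.pricecell") or doc.select("td[class=pricecell]") accepts the selector syntax that getElementsByClass does not.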

Answer 2 (score: 0)

You can use Node.childNodes to retrieve a List of Node objects and check the instance of each object (TextNode in your case):

    Document doc = Jsoup.parse(html);
    Elements trs = doc.select("table tr");
    for (Element tr : trs) {
        Element priceCell = tr.select(".pricecell").first();
        for (Node child : priceCell.childNodes()) {
            if (child instanceof TextNode) {
                System.out.println(((TextNode) child).text().trim());
            }
        }
    }
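
A self-contained variant of this TextNode approach might look like the sketch below. It makes two assumptions beyond the answer: rows without a .pricecell cell (for example a header row) are skipped instead of causing a NullPointerException, and only purely numeric text nodes are kept and parsed as integers. The class name PriceTextNodes and the sample markup are illustrative, and jsoup is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

import java.util.ArrayList;
import java.util.List;

public class PriceTextNodes {
    // Illustrative markup: a header row without a price cell, then two rows shaped like the question's.
    static final String HTML =
            "<table>"
            + "<tr><th>Price</th></tr>"
            + "<tr><td class=\"pricecell\"><span class=\"WebRupee\">Rs.</span> 29 <br><font> 3 days </font></td></tr>"
            + "<tr><td class=\"pricecell\"><span class=\"WebRupee\">Rs.</span> 59 <br><font> 7 days </font></td></tr>"
            + "</table>";

    // Walks the direct child nodes of each price cell and keeps the numeric text nodes.
    static List<Integer> extractPrices(String html) {
        Document doc = Jsoup.parse(html);
        List<Integer> prices = new ArrayList<>();
        for (Element row : doc.select("table tr")) {
            Element priceCell = row.select(".pricecell").first();
            if (priceCell == null) {
                continue; // row has no price cell, e.g. a header row
            }
            for (Node child : priceCell.childNodes()) {
                if (!(child instanceof TextNode)) {
                    continue; // skip <span>, <br>, <font>
                }
                String text = ((TextNode) child).text().trim();
                if (text.matches("\\d+")) {
                    prices.add(Integer.parseInt(text));
                }
            }
        }
        return prices;
    }

    public static void main(String[] args) {
        System.out.println(extractPrices(HTML)); // [29, 59]
    }
}
```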
