Incorrect parsing with Jsoup

Time: 2013-01-21 07:03:32

Tags: android jsoup

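I am parsing the site http://animecalendar.net with Jsoup. Everything parses fine, but I have one problem: the URLs themselves are parsed correctly (see the log), yet the list I end up with has them mixed up.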

Code:

@Override
    protected ArrayList<Order> doInBackground(String... urls) {

        listItems.clear();
        myAdapter.notifyDataSetChanged();
        String dates = null;
        String url = null;

        try {
            Document doc = Jsoup.connect(URL).get();

            Elements main = doc.select("div.day");
            for (Element m : main) {
                titles = m.select("div.tooltip");
                for (Element tts : titles) {

                    title = tts.select("td.tooltip_title h4").text();
                    time = tts.select("td.tooltip_info h4").text();
                    img = tts.select("td.tooltip_desc img[src]");

                    Order o = new Order();
                    o.setLink(URL + img.attr("src"));
                    o.setTextName(title);
                    o.setTextTime(time);
                    o.setTextDate(dates);
                    o.setDetailsUrl(URL + url);  // the URL list shown on the device is mixed up
                    listItems.add(o);
                }

                Elements date = m.select("h2");                 
                for (Element m1 : date) {
                    dates = m1.select("a").attr("href");                        
                }

                Elements links = m.select("h3");
                for (Element link : links) {
                    url = link.select("a").attr("href");  // URL parsed from the site
                    System.out.println(url);  // LogCat shows the URLs in the correct order
                }                   
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return listItems;
    }

LogCat:

01-21 12:55:55.429: I/System.out(8036): /show/596/Cardfight%21%21_Vanguard%3A_Asia_Circuit_Hen
01-21 12:55:55.429: I/System.out(8036): /show/583/Inazuma_Eleven_GO_2%3A_Chrono_Stone
01-21 12:55:55.445: I/System.out(8036): /show/671/Ai_Mai_Mi_
01-21 12:55:55.445: I/System.out(8036): /show/697/Mangirl%21
etc...

As a result, I get a mixed-up list of URLs on the screen.

How can I fix this? Thanks.

1 answer:

Answer 0 (score: 0)

This problem has been solved with the following code:

// Select every episode heading in document order
Elements epBox = doc.select("div.ep_box h3");
int urlcount = 0;
for (Element ep : epBox) {
    url = ep.select("a").attr("href");

    // Pair the n-th link with the n-th Order that was already parsed
    if (urlcount < listItems.size()) {
        Order o = (Order) listItems.get(urlcount);
        o.setDetailsUrl(URL + url);
        newarraylist.add(o);
    }
    urlcount++;
}
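
For context: in the question's loop, `url` is read when each Order is created but is only assigned further down in the same iteration, so every Order receives a value left over from an earlier pass (or null on the first one). The answer avoids this by pairing links and items by position. Below is a minimal, Jsoup-free sketch of that index-based pairing. The `Order` class here is a hypothetical stand-in for the one in the question, the method name `attachUrls` and the titles are made up for illustration, and the base URL is assumed to be the value of the `URL` constant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PairUrlsByIndex {

    // Hypothetical stand-in for the Order class used in the question.
    static class Order {
        private final String title;
        private String detailsUrl;

        Order(String title) { this.title = title; }
        String getTitle() { return title; }
        String getDetailsUrl() { return detailsUrl; }
        void setDetailsUrl(String detailsUrl) { this.detailsUrl = detailsUrl; }
    }

    // Assigns hrefs.get(i) to items.get(i). Items or hrefs beyond the
    // shorter list's length are left out of the returned list.
    static List<Order> attachUrls(List<Order> items, List<String> hrefs, String baseUrl) {
        List<Order> result = new ArrayList<>();
        int count = Math.min(items.size(), hrefs.size());
        for (int i = 0; i < count; i++) {
            Order o = items.get(i);
            o.setDetailsUrl(baseUrl + hrefs.get(i));
            result.add(o);
        }
        return result;
    }

    public static void main(String[] args) {
        // Titles are illustrative; the hrefs are taken from the LogCat output.
        List<Order> items = Arrays.asList(new Order("Ai Mai Mi"), new Order("Mangirl!"));
        List<String> hrefs = Arrays.asList("/show/671/Ai_Mai_Mi_", "/show/697/Mangirl%21");

        // Assumed base URL (the question's URL constant is not shown).
        for (Order o : attachUrls(items, hrefs, "http://animecalendar.net")) {
            System.out.println(o.getTitle() + " -> " + o.getDetailsUrl());
        }
    }
}
```

Pairing by position only works if both lists come out of the page in the same order and with the same length. If that cannot be guaranteed, reading the link from the same element that supplies the title is likely to be more robust.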