Parsing images from a DOM parser using jsoup on Android

Date: 2016-01-16 06:59:43

Tags: java android dom jsoup

I am trying to fetch the RSS feed of this site:

http://www.phonearena.com/feed

This is my DOMParser class:

public class DOMParser {
private RSSFeed _feed = new RSSFeed();

public RSSFeed parseXml(String xml) {

    URL url = null;
    try {
        url = new URL(xml);
    } catch (MalformedURLException e1) {
        e1.printStackTrace();
        // Without a valid URL there is nothing to parse; return the empty feed
        return _feed;
    }

    try {

        DocumentBuilderFactory dbf;
        dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();


        Document doc = db.parse(new InputSource(url.openStream()));
        doc.getDocumentElement().normalize();

        NodeList nl = doc.getElementsByTagName("item");
        NodeList itemChildren = null;
        Node currentItem = null;
        Node currentChild = null;
        int length = nl.getLength();

        for (int i = 0; i < length; i++) {
             currentItem = nl.item(i);
            RSSItem _item = new RSSItem();

            NodeList nchild = currentItem.getChildNodes();
            int clength = nchild.getLength();


            for (int j = 0; j < clength; j++) {

                currentChild = nchild.item(j);
                String theString = null;
                String nodeName = currentChild.getNodeName();

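                // Skip nodes without a text child (e.g. whitespace or empty elements)
                Node firstChild = currentChild.getFirstChild();
                if (firstChild != null) {
                    theString = firstChild.getNodeValue();
                }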

                if (theString != null) {
                    if ("title".equals(nodeName)) {

                        _item.setTitle(theString);
                    }

                    else if ("description".equals(nodeName)) {

                        _item.setDescription(theString);

                        // Parse the html description to get the image url
                        String html = theString;
                        org.jsoup.nodes.Document docHtml = Jsoup
                                .parse(html);
                        Elements imgEle = docHtml.select("img");
                        _item.setImage(imgEle.attr("src"));
                    }

                    else if ("pubDate".equals(nodeName)) {


                        String formatedDate = theString.replace(" +0000",
                                "");
                        _item.setDate(formatedDate);
                    }

                }
            }


            _feed.addItem(_item);
        }

    } catch (Exception e) {
        // Log the failure instead of swallowing it silently
        e.printStackTrace();
    }


    return _feed;
}
}     

Everything works except for the image I am trying to extract with jsoup.

Can anyone tell me what I am doing wrong or what I am missing?

1 Answer:

Answer 0: (score: 0)

The variable theString needs to be unescaped before it is passed to Jsoup.

else if ("description".equals(nodeName)) {
    _item.setDescription(theString);

    // Unescape then Parse the html description to get the image url
    Element imgEle = Jsoup.parse( //
            Parser.unescapeEntities( //
                  Parser.xmlParser().parseInput(theString, "").outerHtml(), //
                  true //
            )) //
            .select("img").first();

    if (imgEle != null) {
        _item.setImage(imgEle.attr("src"));
    }
}
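
To see why unescaping can matter, here is a minimal sketch that uses only the JDK's DOM parser. It assumes, hypothetically, that the markup inside the description is escaped twice; the sample XML, the example.com URL, and the class name EscapeDemo are illustrative and not taken from the actual feed. The DOM parser undoes only one level of escaping, so the text it returns still contains `&lt;img ...&gt;` rather than a literal `<img>` tag, and an HTML parser would see that as plain text with no image element to select.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class EscapeDemo {

    // Hypothetical sample: the markup inside <description> is escaped twice.
    static final String SAMPLE =
            "<rss><channel><item>"
            + "<description>&amp;lt;img src=\"http://example.com/a.jpg\"&amp;gt;</description>"
            + "</item></channel></rss>";

    // Returns the text of the first <description> element as the DOM parser reports it.
    static String readDescription(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        Node description = doc.getElementsByTagName("description").item(0);
        return description.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String text = readDescription(SAMPLE);
        // The XML parser has removed one level of escaping only.
        System.out.println(text);
        // false: there is no literal "<img" in the text yet.
        System.out.println(text.contains("<img"));
    }
}
```

If the feed instead wraps the description in CDATA or escapes it only once, the text returned by the DOM parser should already contain a literal `<img>` tag, so it is worth printing theString first to check which case applies.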