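Extracting text from XML in Java

Date: 2012-10-25 13:52:50

Tags: java android xml parsing sax

I am trying to create an RSS reader for Android. I connect to the RSS URL and fetch some XML. This is the link - http://www.bulgarianhistory.org/feed/ If you open it and view the source, you will notice there is a tag named content:encoded. How do I get the information inside this tag? My code just skips it! I am using SAX. Here is my parser class: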

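import java.util.ArrayList;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class RSSParser extends DefaultHandler {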
    private final static String TAG_ITEM = "item";
    private final static String[] xmltags = { "title", "link", "pubDate", "description", "content" };

    private RSSItem currentitem = null;
    private ArrayList<RSSItem> itemarray = null;
    private int currentindex = -1;
    private boolean isParsing = false;
    private StringBuilder builder = new StringBuilder();

    public RSSParser(ArrayList<RSSItem> itemarray) {
        super();

        this.itemarray = itemarray;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        super.characters(ch, start, length);

        if(isParsing && -1 != currentindex && null != builder)
        {
            builder.append(ch,start,length);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        super.startElement(uri, localName, qName, attributes);

        if(localName.equalsIgnoreCase(TAG_ITEM))
        {
            currentitem = new RSSItem();
            currentindex = -1;
            isParsing = true;

            itemarray.add(currentitem);
        }
        else
        {
            currentindex = itemIndexFromString(localName);

            builder = null;

            if(-1 != currentindex)
                builder = new StringBuilder();
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName, qName);

        if(localName.equalsIgnoreCase(TAG_ITEM))
        {
            isParsing = false;
        }
        else if(currentindex != -1)
        {
            if(isParsing)
            {
                switch(currentindex)
                {
                    case 0: currentitem.title = builder.toString();       break;
                    case 1: currentitem.link = builder.toString();        break;
                    case 2: currentitem.date = builder.toString();        break;
                    case 3: currentitem.description = builder.toString(); break;
                    case 4: currentitem.content = builder.toString();     break;
                }
            }
        }
    }

    private int itemIndexFromString(String tagname){
        int itemindex = -1;

        for(int index= 0; index<xmltags.length; ++index)
        {
            if(tagname.equalsIgnoreCase(xmltags[index]))
            {
                itemindex = index;

                break;
            }
        }

        return itemindex;
    }
}

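2 Answers:

Answer 0 (score: 1)

I would really advise against using SAX on Android, because as far as I remember it performs very slowly there. You should use the XmlPullParser that ships with Android, so there is no need to add any external JARs, and you will also get a significant performance boost. I just tested it, and XmlPullParser parses the <content:encoded> tag like any other tag, so it works fine.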

Answer 1 (score: 1)

Your if ignores tags that carry a namespace prefix. Just try printing the tag name here:

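    private int itemIndexFromString(String tagname){
        // Temporary debug output: shows the name the parser actually passes in.
        System.out.println(tagname);
        int itemindex = -1;
        for(int index= 0; index<xmltags.length; ++index)
        {
            if(tagname.equalsIgnoreCase(xmltags[index]))
            {
                itemindex = index;
                break;
            }
        }
        return itemindex;
    }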

If it prints content:encoded, you need to change xmltags[] to contain content:encoded instead of content.
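
To see which names SAX actually hands to the handler, here is a minimal desktop-Java sketch. It assumes the feed binds the `content` prefix to the usual RSS content-module URI `http://purl.org/rss/1.0/modules/content/` (not confirmed for the feed in the question). `TagNameDemo`, `NameRecorder` and the inline sample XML are made up for illustration. With a namespace-aware parser the local name of `<content:encoded>` is `encoded`, not `content`, so the sketch matches on namespace URI plus local name instead of relying on the prefix. What a given parser reports as `qName` can vary, so treat this as a sketch under those assumptions, not a definitive fix.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TagNameDemo {
    // Assumed namespace URI of the RSS content module (hypothetical for this feed).
    static final String CONTENT_NS = "http://purl.org/rss/1.0/modules/content/";

    /** Records "localName|qName" for every start tag and collects the text of content:encoded. */
    static class NameRecorder extends DefaultHandler {
        final List<String> names = new ArrayList<>();
        final StringBuilder encoded = new StringBuilder();
        private boolean inEncoded = false;

        private static boolean isEncoded(String uri, String localName) {
            return CONTENT_NS.equals(uri) && "encoded".equals(localName);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            names.add(localName + "|" + qName);
            if (isEncoded(uri, localName)) {
                inEncoded = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so append rather than assign.
            if (inEncoded) {
                encoded.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (isEncoded(uri, localName)) {
                inEncoded = false;
            }
        }
    }

    /** Parses the given XML string with a namespace-aware SAX parser. */
    static NameRecorder parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            NameRecorder handler = new NameRecorder();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<rss xmlns:content=\"" + CONTENT_NS + "\">"
                + "<channel><item><title>Hello</title>"
                + "<content:encoded><![CDATA[<p>Full text</p>]]></content:encoded>"
                + "</item></channel></rss>";
        NameRecorder result = parse(xml);
        for (String name : result.names) {
            System.out.println(name);
        }
        System.out.println(result.encoded);
    }
}
```

In the RSSParser from the question, the same idea would mean checking for the local name `encoded` (ideally together with the namespace URI) rather than `content`.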