我想在Andriod应用程序中使用SAX解析器读取wordpress网站生成的一些提要。我没有在CDATA部分获得全文,文本在中间某处被分割。我读到SAX解析器有时会以块的形式分割文本,但是如何在阅读器中合并所有这些内容?任何帮助都会优雅!这是我的RSSHandler.java和来自wordpress rss的xml的一部分:
RSSHandler.java:
package com.example.vlada.vlada;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class RSSHandler extends DefaultHandler {
final int state_unknown = 0;
final int state_title = 1;
final int state_sadrzaj_posta = 2;
final int state_link = 3;
final int state_pubdate = 4;
int currentState = state_unknown;
RSSFeed feed;
RSSItem item;
boolean itemFound = false;
RSSHandler(){
}
RSSFeed getFeed(){
return feed;
}
@Override
public void startDocument() throws SAXException {
// TODO Auto-generated method stub
feed = new RSSFeed();
item = new RSSItem();
}
@Override
public void endDocument() throws SAXException {
// TODO Auto-generated method stub
}
@Override
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
// TODO Auto-generated method stub
if (localName.equalsIgnoreCase("item")){
itemFound = true;
item = new RSSItem();
currentState = state_unknown;
}
else if (localName.equalsIgnoreCase("title")){
currentState = state_title;
}
else if (localName.equalsIgnoreCase("sadrzaj_posta")){
currentState = state_sadrzaj_posta;
}
else if (localName.equalsIgnoreCase("link")){
currentState = state_link;
}
else if (localName.equalsIgnoreCase("pubdate")){
currentState = state_pubdate;
}
else{
currentState = state_unknown;
}
}
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
// TODO Auto-generated method stub
if (localName.equalsIgnoreCase("item")){
feed.addItem(item);
}
}
@Override
public void characters(char[] ch, int start, int length)
throws SAXException {
// TODO Auto-generated method stub
String strCharacters = new String(ch,start,length);
if (itemFound==true){
// "item" tag found, it's item's parameter
switch(currentState){
case state_title:
item.setTitle(strCharacters);
break;
case state_sadrzaj_posta:
item.setSadrzaj_posta(strCharacters);
break;
case state_link:
item.setLink(strCharacters);
break;
case state_pubdate:
item.setPubdate(strCharacters);
break;
default:
break;
}
}
else{
// not "item" tag found, it's feed's parameter
switch(currentState){
case state_title:
feed.setTitle(strCharacters);
break;
case state_sadrzaj_posta:
feed.setSadrzajPosta(strCharacters);
break;
case state_link:
feed.setLink(strCharacters);
break;
case state_pubdate:
feed.setPubdate(strCharacters);
break;
default:
break;
}
}
currentState = state_unknown;
}
}
xml的一部分:
<sadrzaj_posta>
<![CDATA[
<p>Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industrys standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>
]]>
</sadrzaj_posta>
我刚才:
Lorem Ipsum只是印刷和排版行业的虚拟文本。自16世纪以来,Lorem Ipsum一直是业界标准的虚拟文本,当时一台未知的打印机占用了厨房
答案 0 :(得分:0)
characters()
可以在元素中多次调用,以构建文本节点的整个文本。请勿尝试使用characters()
中的文字。相反,将它们附加到StringBuilder
或其他内容,然后在endElement()
或类似的点处处理结果,您知道自己拥有所有文本。