In my project I need to parse XML. Some items in the XML contain HTML tags. I tried to strip those tags, but without success. The code in the Activity is:
private NewsFeedItemList parseNewsContent() {
NewsParserHandler newsParserHandler = null;
Log.i("NewsList", "Starting to parse XML...");
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
XMLReader xr = parser.getXMLReader();
newsParserHandler = new NewsParserHandler();
xr.setContentHandler(newsParserHandler);
ByteArrayInputStream is = new ByteArrayInputStream(strServerResponseMsg.getBytes());
xr.parse(new InputSource(is));
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
NewsFeedItemList itemList = newsParserHandler.getNewsList();
// checkLog(itemList);
Log.i("NewsList", "Parsing XML finished. Sending result back to caller...");
return itemList;
}
"strServerResponseMsg" contains the XML data (http://www.mania.com.my/rss/ManiaTopStoriesFeedFull.aspx?catid=146).
All items are parsed, but the ones that contain HTML tags are not parsed completely.
Here is my parser handler:
public class NewsParserHandler extends DefaultHandler {
private NewsFeedItemList newsFeedItemList;
private boolean current = false;
private String currentValue = null;
/* The feed root also contains "title", "link" and "pubDate" elements.
 * Those must not be stored with the items, so we skip them by counting
 * the elements seen so far. */
private int count = 0;
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
super.characters(ch, start, length);
if(current) {
currentValue = new String(ch, start, length);
if(currentValue==null || currentValue=="" || currentValue==" ")
currentValue = "-";
current = false;
}
}
@Override
public void startDocument() throws SAXException {
super.startDocument();
newsFeedItemList = new NewsFeedItemList();
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
super.startElement(uri, localName, qName, attributes);
current = true;
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
super.endElement(uri, localName, qName);
current = false;
if(localName.equals("title")) {
if(count >= 1)
newsFeedItemList.setTitle(currentValue);
}
if(localName.equals("description")) {
newsFeedItemList.setDescription(currentValue);
}
if(localName.equals("fullbody")) {
newsFeedItemList.setFullbody(currentValue);
}
if(localName.equals("link")) {
if(count >= 4)
newsFeedItemList.setLink(currentValue);
}
if(localName.equals("pubDate")) {
if(count >= 5)
newsFeedItemList.setPubDate(currentValue);
}
if(localName.equals("image")) {
newsFeedItemList.setImage(currentValue);
}
count++;
}
@Override
public void endDocument() throws SAXException {
super.endDocument();
}
public NewsFeedItemList getNewsList() {
return newsFeedItemList;
}
}
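I tried putting currentValue = Html.fromHtml(currentValue).toString();
in the characters() method, but it had no effect. I also tried converting "strServerResponseMsg" to HTML before passing it on, but then the parser did not parse anything.
I found these threads, but their solutions did not work for me: "How to strip or escape html tags in Android" and "Display HTML Formatted String".
I would be very grateful for any help. Thanks.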
Answer 0 (score: 0)
Use the following method to remove all HTML tags from the currentValue variable.
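public static String removeHtmlTag(String htmlString) {
    // Remove anything that looks like a tag. [^>]* also matches tags that
    // span several lines, which ".*?" does not do by default.
    return htmlString.replaceAll("<[^>]*>", "").trim();
}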
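Stripping tags only helps if currentValue holds the complete element text. A SAX parser is allowed to deliver the text of one element through several characters() calls, and escaped markup such as &lt;p&gt; is a common place for such splits. The handler in the question keeps only the first chunk, which may be why items with HTML come out truncated. That cause is an assumption and is not confirmed in the thread. Below is a minimal sketch of accumulating the text in a StringBuilder and stripping tags in endElement(). The class DescriptionHandler and its parse() helper are hypothetical names for illustration. It matches on qName because the default JDK factory is not namespace-aware, whereas the question's code uses localName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not from the original post): collects the complete
// text of every <description> element and strips HTML tags from it.
class DescriptionHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> descriptions = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        buffer.setLength(0); // start with an empty buffer for each element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so append instead of overwrite.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Assumption: the parser is not namespace-aware, so the name is in qName.
        if ("description".equals(qName)) {
            descriptions.add(removeHtmlTag(buffer.toString()));
        }
        buffer.setLength(0);
    }

    List<String> getDescriptions() {
        return descriptions;
    }

    // Same idea as the answer's helper: drop everything between '<' and '>'.
    static String removeHtmlTag(String htmlString) {
        return htmlString.replaceAll("<[^>]*>", "").trim();
    }

    // Convenience wrapper for the sketch: parse an XML string, return the cleaned texts.
    static List<String> parse(String xml) {
        try {
            DescriptionHandler handler = new DescriptionHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getDescriptions();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

For example, DescriptionHandler.parse("<rss><item><description>&lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;</description></item></rss>") returns a list with the single entry "Hello world". The same pattern should carry over to the title, fullbody and other fields in NewsParserHandler, with the buffer replacing the currentValue and current fields.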