Question

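Another question. This time I'm parsing XML messages that I receive from a server. Someone thought they were being clever and decided to put an HTML page inside an XML message. Now I have a problem, because I want to extract that HTML page from the XML message as a string.

OK, this is the XML message I'm parsing: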

<AmigoRequest> <From></From> <To></To> <MessageType>showMessage</MessageType> <Param0>general message</Param0> <Param1><html><head>test</head><body>Testhtml</body></html></Param1> </AmigoRequest>

As you can see, an HTML page is specified in Param1. I tried to extract the message like this:

public String getParam1(Document d) {
        if (d.getDocumentElement().getTagName().equals("AmigoRequest")) {
            NodeList results = d.getElementsByTagName("Param1");
            // Messagetype depends on what message we are reading.           
            if (results.getLength() > 0 && results != null) {                
                return results.item(0).getFirstChild().getNodeValue();
            }
        }
        return "";
    }

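where d is the XML message as a Document. It always returns an empty value, because getNodeValue() returns null. When I try results.item(0).getFirstChild().hasChildNodes(), it returns true, because the parser sees that there is a tag inside the message.

How can I extract the HTML message <html><head>test</head><body>Testhtml</body></html> from Param1 as a string?

I'm using Android SDK 1.5 (so almost Java) and a DOM parser.

Thanks for your time and replies.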

ANTEK

Answer 1

You can get the content of Param1 like this:

public String getParam1(Document d) {
        if (d.getDocumentElement().getTagName().equals("AmigoRequest")) {
            NodeList results = d.getElementsByTagName("Param1");
            // Messagetype depends on what message we are reading.           
            if (results != null && results.getLength() > 0) {

                // String extractHTMLTags(String s) is a function that you have
                // to implement so that it removes all the HTML tags from a string.
                return extractHTMLTags(results.item(0).getTextContent());
            }
        }
        return "";
    }

All you have to do is implement a function:

String extractHTMLTags(String s)

that removes every HTML tag occurrence from a string. For that, you can take a look at this post: Remove HTML tags from a String
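
For illustration only, here is one way extractHTMLTags could be written. The answer names the function but does not define it, so the class name HtmlTagStripper and the regex are my assumptions. It is a naive sketch: it will misbehave on comments, on a '>' inside an attribute value, and it does not decode entities.

```java
import java.util.regex.Pattern;

// Hypothetical helper: the answer only names extractHTMLTags, it does not define it.
final class HtmlTagStripper {
    // Anything that looks like a tag: '<', a run of non-'>' characters, then '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    private HtmlTagStripper() {}

    // Returns s with every tag-like span removed; null is treated as empty.
    static String extractHTMLTags(String s) {
        if (s == null) {
            return "";
        }
        return TAG.matcher(s).replaceAll("");
    }
}
```

Called on the markup from the question, this leaves only the text between the tags, so it fits when you want the readable text rather than the HTML itself.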

Answer 2

After a lot of checking and head-scratching, I came up with a simple change: you need to change your API level to 8.

Answer 3

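Edit: I just saw your comment above about getTextContent() not being supported on Android. I'll leave this answer up in case it's useful to people on other platforms.

If your DOM API supports it, you can call getTextContent(), like this: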

public String getParam1(Document d) {
        if (d.getDocumentElement().getTagName().equals("AmigoRequest")) {
            NodeList results = d.getElementsByTagName("Param1");
            // Messagetype depends on what message we are reading.           
            if (results != null && results.getLength() > 0) {
                return results.item(0).getTextContent();
            }
        }
        return "";
    }

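However, getTextContent() is a DOM Level 3 API call; not every parser is guaranteed to support it. Xerces-J does.

By the way, in your original example the null check is in the wrong place; it should be: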

        if (results != null && results.getLength() > 0) {

Otherwise, if results really does come back as null, you'll get an NPE.
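
To make the behaviour concrete on a platform that does have DOM Level 3, here is a small self-contained sketch that parses the message from the question and calls getTextContent() on Param1. The class name TextContentDemo and the parse helper are mine, not part of the question. Note that the result is the concatenated text only, without the HTML tags, because the parser has already turned <html>, <head> and <body> into child elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; assumes a DOM Level 3 parser such as the one in desktop Java.
final class TextContentDemo {
    static final String SAMPLE =
            "<AmigoRequest> <From></From> <To></To> <MessageType>showMessage</MessageType> "
            + "<Param0>general message</Param0> "
            + "<Param1><html><head>test</head><body>Testhtml</body></html></Param1> </AmigoRequest>";

    // Parses an XML string; wraps checked exceptions so callers stay short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    static String getParam1(Document d) {
        if (d.getDocumentElement().getTagName().equals("AmigoRequest")) {
            NodeList results = d.getElementsByTagName("Param1");
            // Null check first, then the length check.
            if (results != null && results.getLength() > 0) {
                return results.item(0).getTextContent();
            }
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(getParam1(parse(SAMPLE))); // prints "testTesthtml"
    }
}
```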

Answer 4

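Since you can't use getTextContent(), another option is to write it yourself; it isn't hard. In fact, if you're writing this purely for your own use, or your employer doesn't have overly strict rules about open source, you could take Apache's implementation as a starting point; lines 610-646 seem to contain most of what you need. (Please respect Apache's copyright and license.)

Otherwise, some rough pseudocode for the method would be: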

String getTextContent(Node node) {
    if (node is a text node)
        return node's text;

    if (node has no children) 
        return "";

    if (node has 1 child)
        return getTextContent(node.getFirstChild());

    return getTextContent(node, new StringBuffer()).toString();
}

StringBuffer getTextContent(Node node, StringBuffer sb) {
    for each child of node {
        if (child is a text node) sb.append(child's text)
        else getTextContent(child, sb);
    }
    return sb;
}
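
The pseudocode above could be turned into real Java roughly as follows. This is a sketch rather than a copy of the Apache code: the class name TextContentFallback and the parse helper are made up for the example, and it only relies on basic accessors (getNodeType, getNodeValue, getFirstChild, getNextSibling), not on DOM Level 3. Comments and processing instructions are skipped.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical stand-in for Node.getTextContent() on parsers that lack it.
final class TextContentFallback {
    private TextContentFallback() {}

    // Returns the concatenated text of node and all of its descendants.
    static String getTextContent(Node node) {
        StringBuilder sb = new StringBuilder();
        appendText(node, sb);
        return sb.toString();
    }

    private static void appendText(Node node, StringBuilder sb) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            sb.append(node.getNodeValue());
            return;
        }
        // Elements (and the document itself): walk the children in document order.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            appendText(child, sb);
        }
    }

    // Test helper only: parses an XML string, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

In getParam1 you would then call TextContentFallback.getTextContent(results.item(0)) instead of results.item(0).getTextContent().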

Answer 5

Well, I'm almost there with the code...

public String getParam1(Document d) {
    if (d.getDocumentElement().getTagName().equals("AmigoRequest")) {
        NodeList results = d.getElementsByTagName("Param1");
        // Messagetype depends on what message we are reading.           
        if (results != null && results.getLength() > 0) {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db;
            Element node = (Element) results.item(0); // get the value of Param1
            Document doc2 = null;
            try {

                db = dbf.newDocumentBuilder();
                doc2 = db.newDocument(); //create new document
                doc2.appendChild(doc2.importNode(node, true)); //import the <html>...</html> result in doc2

            } catch (ParserConfigurationException e) {
                // TODO Auto-generated catch block
                Log.d(TAG, " Exception ", e);
            } catch (DOMException e) {
                // TODO: handle exception
                Log.d(TAG, " Exception ", e);
            } catch (Exception e) {
                // TODO: handle exception
                e.printStackTrace();
            }


            return doc2. .....// All I'm missing is something to convert a Document to a string.
        }
    }
    return "";

}

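As explained in the comment in my code, all I'm missing is a way to create a String from a Document. You can't use the Transform classes on Android... and doc2.toString() only gives you a serialization of the object.

But my next step is to write my own parser if this doesn't work ;)

Not the best code, but a temporary solution.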
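
For what it's worth, the missing step (turning a node back into markup without the Transform classes) could be sketched as a small hand-written serializer. Everything here is my own naming (NodeSerializer, innerXml, serialize, escape, parse), and I'm assuming only the basic DOM accessors are available; I have not run it on Android 1.5. It handles elements, attributes and text, and silently drops comments, processing instructions and the doctype.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper that writes DOM nodes back out as markup.
final class NodeSerializer {
    private NodeSerializer() {}

    // Serializes the children of parent, i.e. everything between its tags.
    static String innerXml(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            serialize(c, sb);
        }
        return sb.toString();
    }

    static void serialize(Node node, StringBuilder sb) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                sb.append('<').append(node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node a = attrs.item(i);
                    sb.append(' ').append(a.getNodeName())
                      .append("=\"").append(escape(a.getNodeValue())).append('"');
                }
                sb.append('>');
                for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                    serialize(c, sb);
                }
                sb.append("</").append(node.getNodeName()).append('>');
                break;
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                sb.append(escape(node.getNodeValue()));
                break;
            default:
                // Comments, processing instructions etc. are skipped in this sketch.
                break;
        }
    }

    // Re-escapes the characters that are special in XML text and attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Demo helper only: parses an XML string, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document d = parse("<AmigoRequest><Param1><html><head>test</head>"
                + "<body>Testhtml</body></html></Param1></AmigoRequest>");
        Node param1 = d.getElementsByTagName("Param1").item(0);
        // prints "<html><head>test</head><body>Testhtml</body></html>"
        System.out.println(innerXml(param1));
    }
}
```

With something like this, getParam1 could return NodeSerializer.innerXml(results.item(0)) directly, and the second document and importNode would no longer be needed.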

public String getParam1(String b) {
        return b
                .substring(b.indexOf("<Param1>") + "<Param1>".length(), b.indexOf("</Param1>"));
    }

where String b is the XML document as a string.

Android: parsing XML with the DOM parser. Converting child nodes to a string
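
One caveat with the substring approach: if either tag is missing, indexOf returns -1 and substring can throw. A slightly more defensive variant might look like this (the class name Param1Substring is just for the example, and returning "" on a missing tag is my choice, mirroring the DOM version above):

```java
// Hypothetical, more defensive version of the substring workaround.
final class Param1Substring {
    private static final String OPEN = "<Param1>";
    private static final String CLOSE = "</Param1>";

    private Param1Substring() {}

    // Returns the raw text between <Param1> and </Param1>, or "" if either tag is missing.
    static String getParam1(String xml) {
        int start = xml.indexOf(OPEN);
        if (start < 0) {
            return "";
        }
        start += OPEN.length();
        // Search for the closing tag only after the opening one.
        int end = xml.indexOf(CLOSE, start);
        if (end < 0) {
            return "";
        }
        return xml.substring(start, end);
    }
}
```

It still assumes the opening tag is written exactly as <Param1>, with no attributes or extra whitespace.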
