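Getting the name of a child node in XML returns #text (C++)

Time: 2017-08-22 19:05:37

Tags: c++ visual-c++ msxml msxml6

I am trying to retrieve the tag names of the child nodes in an XML document. My XML document looks like this: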

<?xml version="1.0" encoding="utf-8"?>
<Parent>
  <child1>
    <grandchild1>someinfo1</grandchild1>
    <grandchild2>someinfo2</grandchild2>
  </child1>
  <child2>
    <grandchild3>someinfo3</grandchild3>
    <grandchild4>someinfo4</grandchild4>
  </child2>
</Parent>

I need to loop over the document and find the tag names, such as child1, grandchild1, and so on.

My code for doing this is as follows:

IXMLDOMDocument *pXMLDom = NULL;
IXMLDOMNodeList *pNodes = NULL;
IXMLDOMNode *pNode = NULL;

pXMLDom->put_async(VARIANT_FALSE);
pXMLDom->put_validateOnParse(VARIANT_TRUE);
pXMLDom->put_resolveExternals(VARIANT_FALSE);
pXMLDom->put_preserveWhiteSpace(VARIANT_TRUE);

BSTR parentNode = SysAllocString(L"//Parent/*");

pXMLDom->selectNodes(parentNode, &pNodes); 
pNodes->get_length(&length);

for (int i = 0; i < length; i++)
{
    pNodes->get_item(i, &pNode);
    BSTR temp = NULL;
    pNode->get_xml(&temp);
    printf("Node (%d), <%S>:\n", i, temp); // works fine until this point

    IXMLDOMNode *firstChild;
    pNode->get_firstChild(&firstChild);

    IXMLDOMNodeList *childNodes;
    pNode->get_childNodes(&childNodes);

    firstChild->get_nodeName(&temp); // Does not work
    firstChild->get_baseName(&temp); // Does not work
}

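Please note that I have provided only a minimal version of the code for simplicity. If any further clarification or code is needed, I will be happy to provide it. Any pointers in the right direction would be helpful. Most of the code was written with the help of MSDN.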

2 answers:

Answer 0 (score: 0)

Right after posting the question, I found what I was looking for!

Do not preserve whitespace:

pXMLDom->put_preserveWhiteSpace(VARIANT_FALSE);

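Answer 1 (score: 0)

XML is made up of nodes, and there are many different types of nodes (element, attribute, text, namespace, processing instruction, comment, document, and so on).

An XML element node that contains text content has a child node named #text. This comes from the DOM rather than from anything specific to MSXML: the DOM specification gives every text node the nodeName #text. So in your example, grandchild1, grandchild2, grandchild3, and grandchild4 each have a child #text node, for example: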

Document
|
|_ PI: <?xml version="1.0" encoding="utf-8"?>
|
|_ Element: "Parent"
    |
    |_ Element: "child1"
    |   |
    |   |_ Element: "grandchild1"
    |   |   |
    |   |   |_ #text "someinfo1"
    |   |
    |   |_ Element: "grandchild2"
    |       |
    |       |_ #text "someinfo2"
    |
    |_ Element: "child2"
        |
        |_ Element: "grandchild3"
        |   |
        |   |_ #text "someinfo3"
        |
        |_ Element: "grandchild4"
            |
            |_ #text "someinfo4"

Whitespace between elements, even just line breaks, is also stored as extra text nodes (because you set the preserveWhiteSpace option to true), for example:

Document
|
|_ PI: <?xml version="1.0" encoding="utf-8"?>
|
|_ #text "\r\n"
|
|_ Element: "Parent"
    |
    |_ #text "\r\n  "
    |
    |_ Element: "child1"
    |   |
    |   |_ #text "\r\n    "
    |   |
    |   |_ Element: "grandchild1"
    |   |   |
    |   |   |_ #text "someinfo1"
    |   |
    |   |_ #text "\r\n    "
    |   |
    |   |_ Element: "grandchild2"
    |   |   |
    |   |   |_ #text "someinfo2"
    |   |
    |   |_ #text "\r\n  "
    |
    |_ #text "\r\n  "
    |
    |_ Element: "child2"
    |   |
    |   |_ #text "\r\n    "
    |   |
    |   |_ Element: "grandchild3"
    |   |   |
    |   |   |_ #text "someinfo3"
    |   |
    |   |_ #text "\r\n    "
    |   |
    |   |_ Element: "grandchild4"
    |   |   |
    |   |   |_ #text "someinfo4"
    |   |
    |   |_ #text "\r\n  "
    |
    |_ #text "\r\n"
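To make the effect of those extra text nodes concrete, here is a minimal sketch in portable C++. It does not use MSXML at all: Node, NodeKind, and the helper functions are hypothetical stand-ins that only model the child lists drawn in the two trees above. It shows why asking for the first child of child1 yields #text when whitespace is preserved, and how skipping non-element children gets to grandchild1.

```cpp
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a DOM node. This is NOT the MSXML API; every
// name here (Node, NodeKind, firstChildName, ...) is hypothetical.
enum class NodeKind { Element, Text };

struct Node {
    NodeKind kind;
    std::string name;            // tag name, or "#text" for a text node
    std::vector<Node> children;  // child nodes in document order
};

// Build a text node (its content does not matter for this sketch).
Node text() { return Node{NodeKind::Text, "#text", {}}; }

// Build an element node with the given children.
Node element(std::string name, std::vector<Node> children = {}) {
    return Node{NodeKind::Element, std::move(name), std::move(children)};
}

// Name of the very first child, whatever its kind. This mirrors what
// get_firstChild followed by get_nodeName reports.
std::string firstChildName(const Node& n) {
    return n.children.empty() ? std::string() : n.children.front().name;
}

// Name of the first child that is an element, skipping text nodes.
std::string firstElementChildName(const Node& n) {
    for (const Node& c : n.children)
        if (c.kind == NodeKind::Element) return c.name;
    return std::string();
}

// <child1> as modelled with whitespace preserved: the indentation shows
// up as extra text nodes around the grandchild elements.
Node child1Preserved() {
    return element("child1", {text(),
                              element("grandchild1", {text()}),
                              text(),
                              element("grandchild2", {text()}),
                              text()});
}

// <child1> as modelled with whitespace discarded.
Node child1Stripped() {
    return element("child1", {element("grandchild1", {text()}),
                              element("grandchild2", {text()})});
}
```

With this model, firstChildName(child1Preserved()) gives "#text", while firstElementChildName(child1Preserved()) and firstChildName(child1Stripped()) both give "grandchild1". Those are the same two remedies that apply to the real DOM: filter by node type, or stop preserving whitespace.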

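XPath searches all nodes, but the * wildcard matches only element nodes. However, you then drill into the children of the matched elements manually, so you run into the #text nodes. For what you are attempting, turn off whitespace preservation to drop the unwanted whitespace text nodes, and then look only at element children, for example: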

IXMLDOMDocument *pXMLDom = NULL;
IXMLDOMNodeList *pNodes = NULL;
IXMLDOMNode *pNode = NULL;
long length = 0;

// create pXMLDom as needed ...
pXMLDom->put_async(VARIANT_FALSE);
pXMLDom->put_validateOnParse(VARIANT_TRUE);
pXMLDom->put_resolveExternals(VARIANT_FALSE);
pXMLDom->put_preserveWhiteSpace(VARIANT_FALSE); // <--

BSTR parentNode = SysAllocString(L"//Parent/*");
HRESULT hRes = pXMLDom->selectNodes(parentNode, &pNodes); 
SysFreeString(parentNode);

if (SUCCEEDED(hRes))
{
    pNodes->get_length(&length);

    for (int i = 0; i < length; ++i)
    {
        hRes = pNodes->get_item(i, &pNode);
        if (SUCCEEDED(hRes))
        {
            BSTR name = NULL;
            hRes = pNode->get_nodeName(&name);
            if (SUCCEEDED(hRes))
            {
                printf("Node (%d), <%S>:\n", i, name);
                SysFreeString(name);
            }

            IXMLDOMNode *pChild = NULL;
            hRes = pNode->get_firstChild(&pChild);
            if (hRes == S_OK)
            {
                do
                {
                    DOMNodeType type;
                    hRes = pChild->get_nodeType(&type);  
                    if (SUCCEEDED(hRes) && (type == NODE_ELEMENT))
                    {
                        hRes = pChild->get_nodeName(&name); // name of the child, not of pNode
                        if (SUCCEEDED(hRes))
                        {
                            printf("  %S\n", name);
                            SysFreeString(name);
                        }
                    }

                    IXMLDOMNode *pSibling = NULL;
                    hRes = pChild->get_nextSibling(&pSibling);
                    if (hRes != S_OK) break;

                    pChild->Release();
                    pChild = pSibling;
                }
                while (true);

                pChild->Release();
            }

            pNode->Release();
        }
    }

    pNodes->Release();
}

...

pXMLDom->Release();

If you need more than 2 levels, you should set up a recursive function, for example:

void processNode(IXMLDOMNode *pNode)
{
    BSTR name = NULL;
    HRESULT hRes = pNode->get_nodeName(&name); // hRes must be declared locally
    if (SUCCEEDED(hRes))
    {
        printf("%S\n", name);
        SysFreeString(name);
    }

    IXMLDOMNode *pChild = NULL;
    hRes = pNode->get_firstChild(&pChild);
    if (hRes == S_OK)
    {
        do
        {
            DOMNodeType type;
            hRes = pChild->get_nodeType(&type);  
            if (SUCCEEDED(hRes) && (type == NODE_ELEMENT))
                processNode(pChild);

            IXMLDOMNode *pSibling = NULL;
            hRes = pChild->get_nextSibling(&pSibling);
            if (hRes != S_OK) break;

            pChild->Release();
            pChild = pSibling;
        }
        while (true);

        pChild->Release();
    }
}

...

IXMLDOMDocument *pXMLDom = NULL;
IXMLDOMNodeList *pNodes = NULL;
IXMLDOMNode *pNode = NULL;
long length = 0;

// create pXMLDom as needed ...
pXMLDom->put_async(VARIANT_FALSE);
pXMLDom->put_validateOnParse(VARIANT_TRUE);
pXMLDom->put_resolveExternals(VARIANT_FALSE);
pXMLDom->put_preserveWhiteSpace(VARIANT_FALSE); // <--

BSTR parentNode = SysAllocString(L"//Parent/*");
HRESULT hRes = pXMLDom->selectNodes(parentNode, &pNodes); 
SysFreeString(parentNode);

if (SUCCEEDED(hRes))
{
    pNodes->get_length(&length);

    for (int i = 0; i < length; ++i)
    {
        hRes = pNodes->get_item(i, &pNode);
        if (SUCCEEDED(hRes))
        {
            processNode(pNode);
            pNode->Release();
        }
    }

    pNodes->Release();
}

...

pXMLDom->Release();
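The recursive processNode above prints every name flush left, so the nesting is not visible in the output. One possible variation is to pass the current depth down the recursion and indent by it. The sketch below shows that idea in portable C++ on a hypothetical in-memory model (Node, NodeKind, collectElementNames, and sampleChild1 are illustrative names, not part of MSXML); the same depth parameter could be added to processNode.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for a DOM node. This is NOT the MSXML API.
enum class NodeKind { Element, Text };

struct Node {
    NodeKind kind;
    std::string name;            // tag name, or "#text" for a text node
    std::vector<Node> children;  // child nodes in document order
};

// Append the name of n and of all its element descendants to out,
// indented by two spaces per nesting level. Text nodes are ignored.
void collectElementNames(const Node& n, int depth,
                         std::vector<std::string>& out) {
    if (n.kind != NodeKind::Element) return;
    out.push_back(std::string(static_cast<std::size_t>(depth) * 2, ' ') + n.name);
    for (const Node& c : n.children)
        collectElementNames(c, depth + 1, out);
}

// Model of <child1> including whitespace text nodes, to show that the
// traversal skips them regardless of the whitespace setting.
Node sampleChild1() {
    Node t{NodeKind::Text, "#text", {}};
    Node g1{NodeKind::Element, "grandchild1", {t}};
    Node g2{NodeKind::Element, "grandchild2", {t}};
    return Node{NodeKind::Element, "child1", {t, g1, t, g2, t}};
}
```

Calling collectElementNames(sampleChild1(), 0, out) fills out with three entries: child1 at depth 0, then grandchild1 and grandchild2 each indented by two spaces.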