TinyXML2 C++ - Extracting specific data from old/malformed XML files

Posted: 2018-02-02 14:55:50

Tags: c++ xml parsing tinyxml tinyxml2

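I'm trying to search through some fairly old XML (documents from 1999), and I'm having trouble getting TinyXML2 to behave as expected. I can grab some pieces, but I run into problems when an element has other elements inside it. Take this sample: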


This is what I'm working with:

  <SUBJECT><TITLE>Mathematics</TITLE></SUBJECT>
     <AREA><TITLE>Arithmetic</TITLE></AREA>
     <SECTION><TITLE>Whole Numbers</TITLE></SECTION> 
        <TOPIC GRADELEVEL="4"><TITLE>Introduction to Numbers</TITLE></TOPIC> 
          <DESCRIPTION><TITLE>Description</TITLE></DESCRIPTION>  
             <FIELDSPACE>
                <PARA>To represent each conceivable number by means of a separate
                  little picture or number symbol is impossible. Therefore the civilizations of
                  the past all developed a certain pattern whereby they could write down numbers,
                  by making use of a small number of symbols. </PARA>
             </FIELDSPACE> 
             <FIELDSPACE>
                <PARA>Today, we use the Hindu-Arabic system, which first of all is
                  decimal, because we make use of only 10 different symbols, namely,</PARA>
                <LITERALLAYOUT>     0, 1, 2, 3, 4, 5, 6, 7, 8, and 9.</LITERALLAYOUT>
             </FIELDSPACE>
             <FIELDSPACE>
                <PARA>Secondly, a place value applies. This means that if only 1
                  digit is written down then it is that number, such as a 3, a 6, or an 8.</PARA>
             </FIELDSPACE>
             <FIELDSPACE>
                <PARA>Thirdly, only the addition principle is built into our number
                  symbols.</PARA>
                <PARA>In other words,</PARA>
                <LITERALLAYOUT>     135 means 100 + 300 + 5</LITERALLAYOUT>
                <LITERALLAYOUT>     6.3 means 6 + three tenths = 6 + <EQUATION>
<INLINEGRAPHIC FILEREF="Mathematics/Arithmetic/WholeNumbers/IntroductionNumbers/eq.png" />
</EQUATION></LITERALLAYOUT>
                <LITERALLAYOUT>     and two and a quarter = <EQUATION>
<INLINEGRAPHIC FILEREF="Mathematics/Arithmetic/WholeNumbers/IntroductionNumbers/eq2.png" />
</EQUATION></LITERALLAYOUT>
                <PARA>means</PARA>
                <LITERALLAYOUT>     two plus a quarter = <EQUATION>
<INLINEGRAPHIC FILEREF="Mathematics/Arithmetic/WholeNumbers/IntroductionNumbers/eq3.png" />
</EQUATION></LITERALLAYOUT>
             </FIELDSPACE>

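I need code that is generic rather than specific to this one file: there are hundreds of XML files, and I need to write something that can parse all of them. How do I get at the information inside LITERALLAYOUT / EQUATION / INLINEGRAPHIC?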

Thanks in advance!

2 answers:

Answer 0 (score: 0)

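EQUATION has no string value here: there is no text between its tags, only a child element, so GetText() on it gives you nothing back. What you need is the attribute on the INLINEGRAPHIC element inside EQUATION, e.g. ig->Attribute("FILEREF"), where ig is the XMLElement pointer for the INLINEGRAPHIC element.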

Answer 1 (score: 0)

Just building on the previous answer. This is what you have:

<LITERALLAYOUT>xxxxxxxxx
    <EQUATION>
        <INLINEGRAPHIC FILEREF="Mathematics/Arithmetic/WholeNumbers/IntroductionNumbers/eq.png" />
    </EQUATION>
</LITERALLAYOUT>

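There are two things going on here. When you reach LITERALLAYOUT, you can call GetText(), which returns xxxxxxxxx. Note that GetText() only returns the element's first child, and only if that child is a text node, so any text that follows the EQUATION element is not included.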

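But then you have a choice. If you want it to be generic, you have to iterate over all the children of the LITERALLAYOUT element. If you don't want to do that, you have to pull out the first child explicitly, for example: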

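#include "tinyxml2.h"
using namespace tinyxml2;

// pLITERALLAYOUT is the LITERALLAYOUT element you have already navigated to.
// Returns the FILEREF of the first EQUATION/INLINEGRAPHIC only (later ones are
// ignored), or nullptr. The string is owned by the XMLDocument.
const char* FirstFileRef(const XMLElement* pLITERALLAYOUT)
{
    const XMLElement* pEQUATION = pLITERALLAYOUT->FirstChildElement("EQUATION");
    if (pEQUATION != nullptr)
    {
        // Now get the INLINEGRAPHIC element
        const XMLElement* pINLINEGRAPHIC = pEQUATION->FirstChildElement("INLINEGRAPHIC");
        if (pINLINEGRAPHIC != nullptr)
        {
            // Attribute() returns nullptr if FILEREF is missing
            return pINLINEGRAPHIC->Attribute("FILEREF");
        }
    }
    return nullptr;
}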

See? You have to know the correct way to navigate the XML file.