Can someone explain why my XML parser is skipping these HTML elements?

Asked: 2014-12-09 06:45:06

Tags: html c++ css xml tree

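I'm using an XML parser that I found on the internet: http://www.applied-mathematics.net/tools/xmlParser.html. My intention is to use it in a C++ program that reads HTML files.

Note: I don't expect you to read the related documentation or understand it.

What I'd like to know is whether, based on your knowledge of XML/HTML, you can guess why it counts the a tags as the children of the outer div below. For some reason, it is skipping the ul and li elements.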

  <div id="navrow1" class="tabs">
    <ul class="tablist">
      <li class="current"><a href="index.html"><span>Main&#160;Page</span></a></li>
      <li><a href="modules.html"><span>Modules</span></a></li>
      <li><a href="annotated.html"><span>Classes</span></a></li>
      <li><a href="files.html"><span>Files</span></a></li>
      <li>
        <div id="MSearchBox" class="MSearchBoxInactive">
        <span class="left">
          <img id="MSearchSelect" src="search/mag_sel.png"
               onmouseover="return searchBox.OnSearchSelectShow()"
               onmouseout="return searchBox.OnSearchSelectHide()"
               alt=""/>
          <input type="text" id="MSearchField" value="Search" accesskey="S"
               onfocus="searchBox.OnSearchFieldFocus(true)" 
               onblur="searchBox.OnSearchFieldFocus(false)" 
               onkeyup="searchBox.OnSearchFieldChange(event)"/>
          </span><span class="right">
            <a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
          </span>
        </div>
      </li>
    </ul>
  </div>
</div><!-- top -->
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
     onmouseover="return searchBox.OnSearchSelectShow()"
     onmouseout="return searchBox.OnSearchSelectHide()"
     onkeydown="return searchBox.OnSearchSelectKey(event)">
<a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(0)"><span class="SelectionMark">&#160;</span>All</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(1)"><span class="SelectionMark">&#160;</span>Classes</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(2)"><span class="SelectionMark">&#160;</span>Files</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(3)"><span class="SelectionMark">&#160;</span>Functions</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(4)"><span class="SelectionMark">&#160;</span>Variables</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(5)"><span class="SelectionMark">&#160;</span>Typedefs</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(6)"><span class="SelectionMark">&#160;</span>Enumerations</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(7)"><span class="SelectionMark">&#160;</span>Enumerator</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(8)"><span class="SelectionMark">&#160;</span>Macros</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(9)"><span class="SelectionMark">&#160;</span>Groups</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(10)"><span class="SelectionMark">&#160;</span>Pages</a></div>

<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="javascript:void(0)" frameborder="0" 
        name="MSearchResults" id="MSearchResults">
</iframe>
</div>

<div class="header">
  <div class="headertitle">
<div class="title">XMLParser library </div>  </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><h1><a class="anchor" id="intro_sec"></a>
Introduction</h1>
<p>This is a basic XML parser written in ANSI C++ for portability. It works by using recursion and a node tree for breaking down the elements of an XML document.</p>
<dl class="section version"><dt>Version</dt><dd>V2.44 </dd></dl>
<dl class="section author"><dt>Author</dt><dd>Frank Vanden Berghen</dd></dl>
<p>Copyright (c) 2002, Frank Vanden Berghen - All rights reserved.<br/>
 Commercialized by <a href="http://www.Business-Insight.com">Business-Insight</a><br/>
 See the file <a href="../../AFPL-license.txt">AFPL-license.txt</a> about the licensing terms</p>
<h1><a class="anchor" id="tutorial"></a>
First Tutorial</h1>
<p>You can follow a simple <a href="../../xmlParser.html">Tutorial</a> to know the basics...</p>
<h1><a class="anchor" id="usage"></a>
General usage: How to include the XMLParser library inside your project.</h1>
<p>The library is composed of two files: <a href="../../xmlParser.cpp">xmlParser.cpp</a> and <a href="../../xmlParser.h">xmlParser.h</a>. These are the ONLY 2 files that you need when using the library inside your own projects.</p>
<p>All the functions of the library are documented inside the comments of the file <a href="../../xmlParser.h">xmlParser.h</a>. These comments can be transformed in full-fledged HTML documentation using the DOXYGEN software: simply type: "doxygen doxy.cfg"</p>
<p>By default, the XMLParser library uses (char*) for string representation.To use the (wchar_t*) version of the library, you need to define the "_UNICODE" preprocessor definition variable (this is usually done inside your project definition file) (This is done automatically for you when using Visual Studio).</p>
<h1><a class="anchor" id="example"></a>
Advanced Tutorial and Many Examples of usage.</h1>
<p>Some very small introductory examples are described inside the Tutorial file <a href="../../xmlParser.html">xmlParser.html</a></p>
<p>Some additional small examples are also inside the file <a href="../../xmlTest.cpp">xmlTest.cpp</a> (for the "char*" version of the library) and inside the file <a href="../../xmlTestUnicode.cpp">xmlTestUnicode.cpp</a> (for the "wchar_t*" version of the library). If you have a question, please review these additionnal examples before sending an e-mail to the author.</p>
<p>To build the examples:</p>
<ul>
<li>linux/unix: type "make"</li>
<li>solaris: type "make -f makefile.solaris"</li>
<li>windows: Visual Studio: double-click on xmlParser.dsw (under Visual Studio .NET, the .dsp and .dsw files will be automatically converted to .vcproj and .sln files)</li>
</ul>
<p>In order to build the examples you need some additional files:</p>
<ul>
<li>linux/unix: makefile</li>
<li>solaris: makefile.solaris</li>
<li>windows: Visual Studio: *.dsp, xmlParser.dsw and also xmlParser.lib and xmlParser.dll</li>
</ul>
<h1><a class="anchor" id="debugging"></a>
Debugging with the XMLParser library</h1>
<h2><a class="anchor" id="debugwin"></a>
Debugging under WINDOWS</h2>
<p>Inside Visual C++, the "debug versions" of the memory allocation functions are very slow: Do not forget to compile in "release mode" to get maximum speed. When I had to debug a software that was using the XMLParser Library, it was usually a nightmare because the library was sooOOOoooo slow in debug mode (because of the slow memory allocations in Debug mode). To solve this problem, during all the debugging session, I am now using a very fast DLL version of the XMLParser Library (the DLL is compiled in release mode). Using the DLL version of the XMLParser Library allows me to have lightening XML parsing speed even in debug! Other than that, the DLL version is useless: In the release version of my tool, I always use the normal, ".cpp"-based, XMLParser Library (I simply include the <a href="../../xmlParser.cpp">xmlParser.cpp</a> and <a href="../../xmlParser.h">xmlParser.h</a> files into the project).</p>
<p>The file <a href="../../XMLNodeAutoexp.txt">XMLNodeAutoexp.txt</a> contains some "tweaks" that improve substancially the display of the content of the <a class="el" href="structXMLNode.html" title="Main Class representing a XML node.">XMLNode</a> objects inside the Visual Studio Debugger. Believe me, once you have seen inside the debugger the "smooth" display of the <a class="el" href="structXMLNode.html" title="Main Class representing a XML node.">XMLNode</a> objects, you cannot live without it anymore!</p>
<h2><a class="anchor" id="debuglinux"></a>
Debugging under LINUX/UNIX</h2>
<p>The speed of the debug version of the XMLParser library is tolerable so no extra work.has been done. </p>
</div></div><!-- contents -->
<!-- start footer part -->
<hr class="footer"/><address class="footer"><small>
Generated on Thu May 30 2013 23:07:18 for xmlParser by &#160;<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/>
</a> 1.8.3.1
</small></address>

When I run the following recursive procedure

void tree2CSS(XMLNode & thisNode, unsigned depth, std::string thisLine, std::string & accumCSS)
{
    int i = 1;
    for (XMLNode childNode = thisNode.getChildNode(i); !childNode.isEmpty(); childNode = thisNode.getChildNode(++i))
    {
        std::string newLine;
        std::string tabs(depth, '\t');
        newLine.append(tabs + thisLine);
        if (depth > 0) newLine.append(" > ");
        newLine.append((std::string)childNode.getName());
        accumCSS.append(newLine + "\n");
        tree2CSS(childNode, depth + 1, newLine, accumCSS);
    }
}
std::string CSS;
tree2CSS(bodyNode, 0, "", CSS);
std::cout << CSS;

to print CSS formatted as a document tree, I get

div
    div > a
    div > a
    div > a
    div > a
    div > a
    div > a
    div > a
    div > a
    div > a
    div > a
div
div
div
hr
address

A lot seems to be missing entirely. Any idea why this happens?

1 answer:

Answer 0 (score: 2)

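The code never outputs thisNode itself. Apart from retrieving its children, you do nothing with thisNode. To make it show all nodes, you have to put something like

accumCSS.append(thisNode.getName());

before the for loop, plus the appropriate indentation.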
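A minimal sketch of that suggestion, for illustration only. To keep it compilable without xmlParser.h, it uses a hypothetical stand-in type `SketchNode` (a name plus a list of children) instead of the library's XMLNode, and the function name `appendTree` is also made up here. It shows the node writing its own name, with indentation, before recursing into its children.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the parser's node type (an assumption, not the
// library's API): only a tag name and an ordered list of children.
struct SketchNode {
    std::string name;
    std::vector<SketchNode> children;
};

// Appends one indented line for thisNode itself, then recurses into each child.
void appendTree(const SketchNode& thisNode, unsigned depth, std::string& accum)
{
    accum.append(std::string(depth, '\t'));  // indentation for this level
    accum.append(thisNode.name);             // the node's own name
    accum.append("\n");
    for (const SketchNode& child : thisNode.children)
        appendTree(child, depth + 1, accum);
}
```

Called on a root node with depth 0 and an empty string, this produces one line per node, each indented by its depth. Because every node prints itself, the loop body no longer needs to print the child separately.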

Since depth does not change during the loop, I would also move

std::string tabs(depth, '\t');

in front of the loop.
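One more thing that may be worth checking, offered as a hypothesis rather than a confirmed diagnosis: the loop in the question starts at `i = 1`. If `getChildNode(i)` is zero-based (which is my understanding of xmlParser.h, but please verify it against your copy), the first child of every node is never visited. That would be consistent with the output shown: a div whose only child is a ul would print no children at all, and a div with eleven a children would print ten. The sketch below reproduces the effect with a hypothetical `StubNode` type that imitates, by assumption, the `getChildNode`/`isEmpty` behaviour used in the question.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the parser's node type (not the real XMLNode).
struct StubNode {
    std::string name;
    std::vector<StubNode> children;

    // Zero-based access; an out-of-range index yields an empty node.
    // This mirrors the assumed behaviour of getChildNode()/isEmpty().
    StubNode getChildNode(std::size_t i) const {
        return i < children.size() ? children[i] : StubNode{};
    }
    bool isEmpty() const { return name.empty(); }
};

// Collects the names of the children visited by a loop that starts at `first`,
// using the same loop shape as the question's tree2CSS.
std::vector<std::string> visitedChildren(const StubNode& parent, std::size_t first)
{
    std::vector<std::string> names;
    std::size_t i = first;
    for (StubNode child = parent.getChildNode(i); !child.isEmpty();
         child = parent.getChildNode(++i))
        names.push_back(child.name);
    return names;
}
```

With `first` set to 0 every child is visited. With `first` set to 1 the first child is silently dropped, and a node with a single child appears to have none. If the assumption about zero-based indexing holds, changing `int i = 1;` to `int i = 0;` in tree2CSS would be the corresponding fix.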