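In PHP or Python, how would you turn an HTML document from the wild into a tree structure, where each node's properties are a heading from the document and the paragraphs beneath it, and each node's parent is the node holding the next higher-order heading?
For example, given the hierarchy h1 > h2 > h3 > h4 > h5 > h6 > ul
and the following document: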
<h1>This is heading 1</h1>
<h2>This is heading 2</h2>
<h3>This is heading 3</h3>
<h4>This is heading 4</h4>
<p>This is a paragraph.</p>
<p>This is another paragraph.</p>
<h5>This is heading 5</h5>
<h6>This is heading 6</h6>
<h3>Another heading 3</h3>
<h2>Another heading 2</h2>
<ul>
<li>Coffee</li>
<li>Tea</li>
<li>Milk</li>
</ul>
<p>One more paragraph.</p>
The output tree would look something like this:
<root>
<node>
<element>This is heading 1</element>
<body></body>
<node>
<element>This is heading 2</element>
<body></body>
<node>
<element>This is heading 3</element>
<body></body>
<node>
<element>This is heading 4</element>
<body><![CDATA[
<p>This is a paragraph.</p>
<p>This is another paragraph.</p>
]]>
</body>
<node>
<element>This is heading 5</element>
<body></body>
<node>
<element>This is heading 6</element>
<body></body>
</node>
</node>
</node>
</node>
<node>
<element>Another heading 3</element>
<body></body>
</node>
</node>
<node>
<element>Another heading 2</element>
<body></body>
<node>
<element></element>
<body><![CDATA[
<li>Coffee</li>
<li>Tea</li>
<li>Milk</li>
]]>
</body>
<node>
<element></element>
<body><![CDATA[
<p>One more paragraph.</p>
]]>
</body>
</node>
</node>
</node>
</node>
</root>
The output doesn't have to be XML; it could be an object (PHP or Python) that can be navigated with next(), previous(), children() and { {1}}
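To make the idea concrete, here is a rough Python sketch of one way this could work, using only the standard library's html.parser. The names Node, OutlineBuilder, build_outline and RANKS are made up for illustration. It assumes the headings sit at the top level of a reasonably well-nested fragment, as in the example document; truly messy markup would probably need to go through a more lenient parser first. It also appends any content that follows a ul to that ul node's body, which is a simplification of the sample tree.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical ranking taken from the hierarchy in the question.
RANKS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6, "ul": 7}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
VOID = {"br", "hr", "img", "input", "meta", "link"}  # tags with no end tag


class Node:
    def __init__(self, level=0, parent=None):
        self.title = ""   # heading text (empty for ul nodes and the root)
        self.body = ""    # markup that follows the heading
        self.level = level
        self.parent = parent
        self._children = []
        if parent is not None:
            parent._children.append(self)

    def children(self):
        return list(self._children)

    def _sibling(self, offset):
        if self.parent is None:
            return None
        siblings = self.parent._children
        i = siblings.index(self) + offset
        return siblings[i] if 0 <= i < len(siblings) else None

    def next(self):
        return self._sibling(1)

    def previous(self):
        return self._sibling(-1)


class OutlineBuilder(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node()
        self.current = self.root
        self.rank_tag = None  # ranked element currently open, if any
        self.depth = 0        # open non-ranked elements around the cursor

    def handle_starttag(self, tag, attrs):
        if self.rank_tag is None and self.depth == 0 and tag in RANKS:
            # Climb until we find a node that outranks the new one.
            parent = self.current
            while parent.level >= RANKS[tag]:
                parent = parent.parent
            self.current = Node(level=RANKS[tag], parent=parent)
            self.rank_tag = tag
            return
        if tag not in VOID:
            self.depth += 1
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.rank_tag == tag and self.depth == 0:
            self.rank_tag = None  # the ranked element itself is closed
            return
        self.depth = max(0, self.depth - 1)
        self._emit(f"</{tag}>")

    def handle_data(self, data):
        if self.rank_tag in HEADINGS:
            self.current.title += data
        elif data.strip() or self.depth > 0:
            # Skip whitespace between top-level elements; keep the rest.
            self.current.body += escape(data, quote=False)

    def _emit(self, markup):
        # Markup inside a heading is dropped; only its text is kept.
        if self.rank_tag not in HEADINGS:
            self.current.body += markup


def build_outline(markup_text):
    builder = OutlineBuilder()
    builder.feed(markup_text)
    builder.close()
    return builder.root
```

With the example document, build_outline(doc).children()[0] would be the "This is heading 1" node, its children() the two h2 nodes, and next() / previous() move between siblings under the same parent. Serializing the result to XML would then be a simple recursive walk over children().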