I have an HTML page with the following structure:
<div id="content">
<h2><span class="heading">Section A</span></h2>
<p>Content of the section</p>
<p>More content in the same section</p>
<div>We can also have divs</div>
<ul><li>And</li><li>Lists</li><li>Too</li></ul>
<h3><span class="heading">Sub-section heading</span></h3>
<p>The content here can be a mixture of divs, ps, lists, etc too</p>
<h2><span class="heading">Section B</span></h2>
<p>This is section B's content</p>
and so on
</div>
I want to create the following XML structure:
<sections>
<section>
<heading>Section A</heading>
<content>
<p>Content of the section</p>
<p>More content in the same section</p>
<div>We can also have divs</div>
<ul><li>And</li><li>Lists</li><li>Too</li></ul>
</content>
<sub-sections>
<section>
<heading>Section B</heading>
<content>
<p>This is section B's content</p>
</content>
</section>
</sub-sections>
</section>
</sections>
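The part I am struggling with is creating the <sub-sections> tags. This is what I have so far, but Section B ends up inside the <content> node of Section A. I also get a <section> node for Section B, but it has no content.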
let $content := //div[@id="content"]
let $headings := $content/(h2|h3|h4|h5|h6)[span[@class="heading"]]
return
<sections>
{
for $heading in $headings
return
<section>
<heading>{$heading/span/text()}</heading>
<content>
{
for $paragraph in $heading/following-sibling::*[preceding-sibling::h2[1] = $heading]
return
$paragraph
}
</content>
</section>
}
</sections>
Thanks in advance for any help or pointers.
Answer 0 (score: 2)
I would first isolate the data belonging to one section in a variable, and then carry on processing from there:
let $content := //div[@id="content"]
return
<sections>
{
for $heading in $content//h2[span[@class='heading'] ]
let $nextHeading := $heading/following-sibling::h2
let $sectionContent := $heading/following-sibling::* except ($nextHeading, $nextHeading/following-sibling::*)
return
<section>
{$sectionContent}
</section>
}
</sections>
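Here I have handled only the sections. You can then process the sub-sections by doing something similar again on the $sectionContent variable, except that you now need a slightly unusual selection to get the first part of the section, that is, everything before its first h3 (the remaining part can be handled in a similar way):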
$sectionContent except ($sectionContent[self::h3], $sectionContent[self::h3]/following-sibling::*)
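Putting both steps together, a complete query might look like the sketch below. It is only one possible way to finish the approach, written for XQuery 1.0. The helper `local:until-next` is a hypothetical name introduced here for illustration, and the `<content>` and `<sub-sections>` wrappers are taken from the target structure in the question.

```xquery
(: Hypothetical helper: all following siblings of $start up to, but not
   including, the next sibling whose local name is listed in $stop-names. :)
declare function local:until-next(
  $start as element(),
  $stop-names as xs:string*
) as element()* {
  let $next := $start/following-sibling::*[local-name() = $stop-names][1]
  return $start/following-sibling::*[empty($next) or . << $next]
};

let $content := //div[@id = "content"]
return
<sections>{
  for $h2 in $content/h2[span[@class = "heading"]]
  (: everything that belongs to this h2 section :)
  let $body := local:until-next($h2, "h2")
  let $first-h3 := $body[self::h3][1]
  return
  <section>
    <heading>{ string($h2/span) }</heading>
    <content>{
      (: only the part of the section before its first h3 :)
      $body[empty($first-h3) or . << $first-h3]
    }</content>
    <sub-sections>{
      for $h3 in $body[self::h3]
      return
      <section>
        <heading>{ string($h3/span) }</heading>
        <content>{ local:until-next($h3, ("h2", "h3")) }</content>
      </section>
    }</sub-sections>
  </section>
}</sections>
```

The node-order comparison `<<` keeps the selection local to one section, so elements from an earlier or later section are not picked up by accident. Deeper heading levels (h4 and below) are left inside the sub-section's `<content>` in this sketch.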
Answer 1 (score: 2)
In XQuery 3.0, you can use window clauses to group your sections and sub-sections quite elegantly:
<sections>{
for tumbling window $section in //div[@id = 'content']/*
start $h2 when $h2 instance of element(h2)
return <section>{
<heading>{$h2//text()}</heading>,
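(: content of this section that comes before its first h3, if any :)
tail($section)[not(self::h3) and empty(preceding-sibling::h3 intersect $section)],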
<sub-sections>{
for tumbling window $sub-section in $section
start $h3 when $h3 instance of element(h3)
return <section>{
<heading>{$h3//text()}</heading>,
tail($sub-section)
}</section>
}</sub-sections>
}</section>
}</sections>
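The grouping above depends on how a tumbling window behaves when only a start condition is given: each window runs from one matching item up to just before the next matching item, and any items before the first match are not placed in a window. A minimal sketch on plain integers, unrelated to the HTML input and assuming an XQuery 3.0 processor, illustrates this:

```xquery
(: A new window starts at every item where $s mod 3 = 1, i.e. at 1 and 4. :)
for tumbling window $w in (1, 2, 3, 4, 5, 6)
  start $s when $s mod 3 = 1
return <group>{ $w }</group>
(: yields <group>1 2 3</group> followed by <group>4 5 6</group> :)
```

In the query above, the h2 and h3 elements play the role of the start items, which is why `tail($sub-section)` drops the heading element itself and keeps the rest of the window.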