How to use XQuery to convert consecutive tags into nested tags or a table

Asked: 2017-10-11 18:25:14

Tags: xml xpath xquery xquery-3.0

I have an XML file that uses consecutive (sibling) tags instead of nested tags, like this:

<title>
    <subtitle>
        <topic att="TopicTitle">Topic title 1</topic>
        <content att="TopicSubtitle">topic subtitle 1</content>
        <content att="Paragraph">paragraph text 1</content>
        <content att="Paragraph">paragraph text 2</content>
        <content att="TopicSubtitle">topic subtitle 2</content>
        <content att="Paragraph">paragraph text 1</content>
        <content att="Paragraph">paragraph text 2</content>

        <topic att="TopicTitle">Topic title 2</topic>
        <content att="TopicSubtitle">topic subtitle 1</content>
        <content att="Paragraph">paragraph text 1</content>
        <content att="Paragraph">paragraph text 2</content>
        <content att="TopicSubtitle">topic subtitle 2</content>
        <content att="Paragraph">paragraph text 1</content>
        <content att="Paragraph">paragraph text 2</content>
    </subtitle>
</title>

I am using XQuery in BaseX, and I would like to convert this into a table with the following columns:

Title      Subtitle      TopicTitle      TopicSubtitle      Paragraph
Irrelevant Irrelevant    Topic title 1   Topic Subtitle 1   paragraph text 1
Irrelevant Irrelevant    Topic title 1   Topic Subtitle 1   paragraph text 2
Irrelevant Irrelevant    Topic title 1   Topic Subtitle 2   paragraph text 1
Irrelevant Irrelevant    Topic title 1   Topic Subtitle 2   paragraph text 2
Irrelevant Irrelevant    Topic title 2   Topic Subtitle 1   paragraph text 1
Irrelevant Irrelevant    Topic title 2   Topic Subtitle 1   paragraph text 2
Irrelevant Irrelevant    Topic title 2   Topic Subtitle 2   paragraph text 1
Irrelevant Irrelevant    Topic title 2   Topic Subtitle 2   paragraph text 2

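I am new to XQuery and XPath, but I have learned the basics of navigating nodes and selecting what I need. What I do not yet know is how to take consecutive data like this and convert it into nested XML or into a table (CSV?). Can anyone help?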

2 answers:

Answer 0 (score: 5)

You can use a tumbling window (https://www.w3.org/TR/xquery-30/#id-windows) to convert the flat XML into nested XML, for example:

for tumbling window $w in title/subtitle/*
    start $t when $t instance of element(topic)
return
    <topic
        title="{$t/@att}">
        {
            for tumbling window $content in tail($w)
                start $c when $c/@att = 'TopicSubtitle'
            return
                <subtopic
                    title="{$c/@att}">
                    {
                        tail($content) ! <para>{node()}</para>
                    }
                </subtopic>
        }
    </topic>

which gives

<topic title="TopicTitle">
    <subtopic title="TopicSubtitle">
        <para>paragraph text 1</para>
        <para>paragraph text 2</para>
    </subtopic>
    <subtopic title="TopicSubtitle">
        <para>paragraph text 1</para>
        <para>paragraph text 2</para>
    </subtopic>
</topic><topic title="TopicTitle">
    <subtopic title="TopicSubtitle">
        <para>paragraph text 1</para>
        <para>paragraph text 2</para>
    </subtopic>
    <subtopic title="TopicSubtitle">
        <para>paragraph text 1</para>
        <para>paragraph text 2</para>
    </subtopic>
</topic>
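
To see where the window boundaries fall when only a start condition is given, here is a minimal sketch on a made-up sequence (the strings and numbers are illustrative and not part of the question's data). With no end condition, a tumbling window runs from one start item up to just before the next start item, or to the end of the sequence.

```xquery
xquery version "3.0";

(: Made-up input: each string opens a new group;
   the numbers that follow it belong to that group. :)
for tumbling window $w in ('A', 1, 2, 'B', 3)
    start $s when $s instance of xs:string
return
    (: $s is the start item; tail($w) is everything after it in the window :)
    <group name="{$s}">{ tail($w) ! <item>{.}</item> }</group>
```

This should produce two `group` elements: one named `A` containing the items 1 and 2, and one named `B` containing the item 3. The nested query above applies the same idea twice, first splitting on `topic` elements and then on `TopicSubtitle` entries.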

Based on that nested structure, I think you can convert the whole thing to semicolon-separated data using
string-join(
<title>
    <subtitle>
        {
            for tumbling window $w in title/subtitle/*
                start $t when $t instance of element(topic)
            return
                <topic
                    title="{$t/@att}"
                    value="{$t}">
                    {
                        for tumbling window $content in tail($w)
                            start $c when $c/@att = 'TopicSubtitle'
                        return
                            <subtopic
                                title="{$c/@att}"
                                value="{$c}">
                                {
                                    tail($content) ! <para>{node()}</para>
                                }
                            </subtopic>
                    }
                </topic>
        }
    </subtitle>
</title>//para ! string-join(ancestor-or-self::* ! (text(), @value, 'Irrelevant')[1], ';'), '&#10;')
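
As a variant of the same idea, the intermediate XML can be skipped by chaining the two window clauses in a single FLWOR expression and emitting one line per paragraph. The following is only a sketch under a few assumptions: the variable name `$doc` and the header line are introduced here for illustration, and the sample input is abbreviated from the question so that the query runs without a context item.

```xquery
xquery version "3.0";

(: Sample input, abbreviated from the question. $doc is an illustrative name. :)
declare variable $doc :=
  <title>
    <subtitle>
      <topic att="TopicTitle">Topic title 1</topic>
      <content att="TopicSubtitle">topic subtitle 1</content>
      <content att="Paragraph">paragraph text 1</content>
      <content att="Paragraph">paragraph text 2</content>
      <topic att="TopicTitle">Topic title 2</topic>
      <content att="TopicSubtitle">topic subtitle 1</content>
      <content att="Paragraph">paragraph text 1</content>
    </subtitle>
  </title>;

string-join(
  (
    (: header row, assumed here; drop it if not wanted :)
    'Title;Subtitle;TopicTitle;TopicSubtitle;Paragraph',
    (: outer window: one window per topic :)
    for tumbling window $w in $doc/subtitle/*
        start $t when $t instance of element(topic)
    (: inner window: one window per TopicSubtitle within the topic :)
    for tumbling window $s in tail($w)
        start $c when $c/@att = 'TopicSubtitle'
    (: remaining items of the inner window are the paragraphs :)
    for $p in tail($s)
    return string-join(
      ('Irrelevant', 'Irrelevant', string($t), string($c), string($p)),
      ';'
    )
  ),
  '&#10;'
)
```

For this sample the result should be a header line followed by three data lines, two for "Topic title 1" and one for "Topic title 2". The values are joined as-is, so text that itself contains a semicolon or a line break would need quoting or escaping.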

Answer 1 (score: 1)

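While positional grouping is the most usual way to solve this kind of problem (that is, tumbling windows in XQuery 3.0+, or for-each-group/@group-starting-with in XSLT 2.0+, as described by Martin Honnen), I don't think it is strictly necessary here, because you are not actually trying to exploit the hierarchy that is implicit in the data.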

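Specifically, you are converting one flat structure with an implicit hierarchy into another flat structure with an implicit hierarchy, and you can do that with something along these lines: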

<table>{
    for $para in title/subtitle/content[@att='Paragraph']
    return <row>
      <cell>irrelevant</cell>
      <cell>irrelevant</cell>
      <cell>{$para/preceding-sibling::topic[1]/string()}</cell>
      <cell>{$para/preceding-sibling::content[@att='TopicSubtitle'][1]/string()}</cell>
      <cell>{$para/string()}</cell>
    </row>
}</table>
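
If plain text is wanted rather than a `table` element, the same lookups can feed `string-join` directly. This is a sketch under stated assumptions: `$doc` is an illustrative variable holding input abbreviated from the question, and the semicolon separator is chosen to match the other answer. Note that the comparison against `@att` is case-sensitive, so the value has to be spelled `Paragraph` exactly as in the input.

```xquery
xquery version "3.0";

(: Sample input, abbreviated from the question. $doc is an illustrative name. :)
declare variable $doc :=
  <title>
    <subtitle>
      <topic att="TopicTitle">Topic title 1</topic>
      <content att="TopicSubtitle">topic subtitle 1</content>
      <content att="Paragraph">paragraph text 1</content>
      <content att="TopicSubtitle">topic subtitle 2</content>
      <content att="Paragraph">paragraph text 1</content>
    </subtitle>
  </title>;

string-join(
  for $para in $doc/subtitle/content[@att = 'Paragraph']
  return string-join(
    (
      'Irrelevant',
      'Irrelevant',
      (: nearest preceding topic; string() yields '' if there is none :)
      string($para/preceding-sibling::topic[1]),
      (: nearest preceding TopicSubtitle entry :)
      string($para/preceding-sibling::content[@att = 'TopicSubtitle'][1]),
      string($para)
    ),
    ';'
  ),
  '&#10;'
)
```

Wrapping each lookup in `string(...)` keeps the column count stable: a paragraph with no preceding topic or subtitle gets an empty field instead of silently dropping a column.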