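XQuery: changing the node hierarchy (removing child nodes from an element and returning them as siblings)

Date: 2014-06-16 00:31:52

Tags: xml xpath recursion xml-parsing xquery

I have an XML document that looks like this: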

<dict>
    <word>
        <sense>
            <definition> This is the text of the definition. 
                <example>
                    <quote>This is the text of an example.</quote>
                </example>
                <source>
                    <place>This is the name of the place recorded</place>
                </source>. 
            </definition>
        </sense>
    </word>
</dict>

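I need to transform it with XQuery so that <example> and its children become a sibling of <definition>, and <source> and its children become children of <example>. In other words, I need this as output: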

<word>
    <sense>
        <definition> This is the text of the definition. </definition>
        <example>
            <quote>This is the text of an example.</quote>
            <source>
                <place>This is the name of the place recorded.</place>
            </source>
        </example>
    </sense>
</word>

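As you can see, there is also the issue of the full stop that follows the original <source> element: it needs to become the last character of the text in <place>, before that element closes.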

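I have created an XQuery file and figured out how to remove elements from the hierarchy, but I am having trouble processing the nodes recursively and adding new elements within the same function.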

xquery version "3.0";
declare namespace saxon="http://saxon.sf.net/";
declare option saxon:output "indent=yes";
declare option saxon:output "saxon:indent-spaces=3";


declare function local:test($node as item()*) as item()* {
    typeswitch($node)
        case text() return normalize-space($node)
        case element(word) return <word>{local:recurse($node)}</word>
        case element(dict) return <dict>{local:recurse($node)}</dict>
        case element(sense) return <sense>{local:recurse($node)}</sense>
        case element(definition) return local:definition($node)
        case element(example) return local:example($node)
        case element(source) return local:source($node)
        case element(place) return <place>{local:recurse($node)}</place>
        default return local:recurse($node)
};

declare function local:definition($nodes as item()*) as item()*{

(: here I need to process children of definition - except <source> and its
children will become children of <example>; and <example> should be returned 
as a next sibling of definition. THIS IS THE PART THAT I DON'T KNOW HOW TO DO :)

<definition>
{
 for $node in $nodes/node()
    return
        local:test($node)
}
</definition>

};

declare function local:example($node as item()*) as item()* {
(: here i am removing <example> because I don't want it to be a child
of <definition> any more. THIS BIT WORKS AS IT SHOULD :)

if ($node/parent::definition) then ()
   else <example>{$node/@*}{local:recurse($node)}</example>
};

declare function local:source($node as item()*) as item()* {
(: here i am removing <source> because I don't want it to be a child
of <definition> any more.  :)

if ($node/parent::definition) then ()
   else <source>{$node/@*}{local:recurse($node)}</source>
};


declare function local:recurse($nodes as item()*) as item()* {
    for $node in $nodes/node()
    return
        local:test($node)
};


local:test(doc("file:test.xml"))

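This should not be very hard to do, but I am having conceptual difficulties with how XQuery handles this kind of problem. I would greatly appreciate your help.

XSLT is not an option.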

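2 answers:

Answer 0: (score: 1)

I would go with XQuery Update, which Saxon also supports and which makes this much easier. The query below works on a copy of the input file, but with small modifications you could also change the original document directly.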

(: Copy the input file :)
copy $result := doc("file:test.xml")
modify (
  for $definition in $result//definition
  return (
    (: Create new example element, and add it after the definition :)
    insert node element example {
      $definition/example/quote,
      $definition/source
    } after $definition,
    (: Throw away the old elements :)
    delete nodes $definition/(example, source)
  )
)
return $result/dict/word

Note that this does not repair the broken input with the misplaced full stop, but I don't see any way your code does that either.
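
As a rough sketch of the "change the original document directly" variant mentioned above: the copy/modify wrapper is dropped and the same update expressions are applied to the nodes of the source document. This assumes an XQuery Update-capable processor with updates enabled; whether and how the changes are written back to file:test.xml depends on the processor and its configuration, which is not covered here.

```xquery
(: Sketch only: an updating query against the original document, not a copy. :)
for $definition in doc("file:test.xml")//definition
return (
  (: Build the new <example> and place it right after <definition> :)
  insert node element example {
    $definition/example/quote,
    $definition/source
  } after $definition,
  (: Remove the old children from <definition> :)
  delete nodes $definition/(example, source)
)
```

Because this is an updating query, it returns nothing; the result is the modified document itself.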

If you prefer a version without update statements, there is still no need for the complex approach with recursive functions:

for $word in doc("file:test.xml")/dict/word
return element word {
  for $sense in $word/sense
  return element sense {
    for $definition in $sense/definition
    return (
      element definition { $definition/text() },
      element example { $definition/(example/quote, source) }
    )
  }
}
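
Neither query above moves the stray full stop into <place>. A minimal sketch of one way to do that without update statements follows. It assumes each <definition> has at most one <source>, and that whatever text follows <source> inside <definition> (here ".") should be appended to every <place>. The helper local:source-with-suffix is a hypothetical name introduced for illustration, not part of the original answer.

```xquery
(: Hypothetical helper: rebuilds <source> and appends $suffix
   to the text of each <place> child. :)
declare function local:source-with-suffix(
  $source as element(source),
  $suffix as xs:string
) as element(source) {
  element source {
    $source/@*,
    for $place in $source/place
    return element place { $place/@*, concat(string($place), $suffix) }
  }
};

for $word in doc("file:test.xml")/dict/word
return element word {
  for $sense in $word/sense
  return element sense {
    for $definition in $sense/definition
    (: Text that follows <source> inside <definition>, e.g. "." :)
    let $trailing := normalize-space(
      string-join($definition/source/following-sibling::text(), ""))
    return (
      element definition {
        (: Keep only the text that precedes the first child element :)
        normalize-space(
          string-join($definition/text()[not(preceding-sibling::*)], ""))
      },
      element example {
        $definition/example/quote,
        for $source in $definition/source
        return local:source-with-suffix($source, $trailing)
      }
    )
  }
}
```

Unlike the desired output in the question, this sketch trims the leading and trailing whitespace of the definition text; drop the normalize-space call around the definition text if that whitespace matters.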

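Answer 1: (score: 1)

Just for completeness, here is a recursive XQuery 1.0 solution that uses only a single recursive function. I agree with Jens that the given example can easily be handled without recursion, but if the real data is larger and you don't have XQuery Update available, you could try something like this: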

declare function local:recurse($nodes as item()*) as item()* {
    (: Iterate explicitly: each case below matches a single item only, so a
       multi-node sequence would otherwise fall through to the default branch. :)
    for $node in $nodes
    return
        typeswitch($node)
            case text()
                return normalize-space($node)
            case element(definition)
                return element {node-name($node)} {
                    $node/@*,
                    local:recurse($node/node() except $node/(example|source))
                }
            case element(sense)
                return element {node-name($node)} {
                    $node/@*,
                    local:recurse($node/node()),
                    <example>{
                        $node/definition/example/@*,
                        $node/definition/example/node(),
                        $node/definition/source
                    }</example>
                }
            case element()
                return element {node-name($node)} {
                    $node/@*,
                    local:recurse($node/node())
                }
            default return $node
};


let $xml :=
<dict>
    <word>
        <sense>
            <definition> This is the text of the definition. 
                <example>
                    <quote>This is the text of an example.</quote>
                </example>
                <source>
                    <place>This is the name of the place recorded.</place>
                </source>
            </definition>
        </sense>
    </word>
</dict>
return local:recurse($xml)

HTH!