XML to CSV conversion using XQuery

Time: 2012-12-30 06:46:37

Tags: xml xquery marklogic xpath-2.0 altova

I have an XML file that I need to convert to CSV using XQuery. Consider a simple XML structure:

books[book]
book[@isbn, title, description]

For example:

<books>
    <book isbn="1590593049">
        <title>Extending Flash MX 2004</title>
        <description>
        Using javascript alongwith actionscript 3.0 and mxml.</description>
    </book>
    <book isbn="0132149184">
        <title>Java Software Solutions</title>
        <description>
            Complete book full of case studies on business solutions and design concepts while building mission critical
            business applications.
        </description>
    </book>

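How can I convert this to CSV format using XQuery? The CSV will be opened in Microsoft Excel, so fields should be separated by the comma (,) character and special characters should be escaped.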

2 answers:

Answer 0: (score: 4)

Assuming your XML is in the variable $books, you can use the following to create CSV output with one line per book node:

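(: Wrap each value in double quotes, doubling any embedded double quote. :)
declare function local:my-replace($input) {
  for $i in $input
  return '"' || replace($i, '"', '""') || '"'
};
(: One comma-separated line per book, terminated by a newline. :)
for $book in $books//book
return string-join(local:my-replace(($book/@isbn, $book/title, $book/description)), ",") || '&#xa;'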

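string-join joins the individual strings with commas, and the local function my-replace wraps each value in the sequence in double quotes and doubles any embedded double quote, per your specification.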
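To check the expected rows outside an XQuery processor, the same quote-every-field rule can be sketched in Python. This is only an illustration, not part of the answer: the names quote_field, book_rows and SAMPLE are hypothetical, and the sample book is an assumption chosen to exercise both a comma and embedded quotes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input: one book whose description contains a comma and quotes.
SAMPLE = """<books>
    <book isbn="XX1234567">
        <title>Quotes and comma</title>
        <description>Hello, World from "Ms-Excel"</description>
    </book>
</books>"""


def quote_field(value):
    # Mirror local:my-replace: double embedded quotes, then wrap in quotes.
    return '"' + value.replace('"', '""') + '"'


def book_rows(xml_text):
    # Build one comma-separated, fully quoted row per <book> element.
    root = ET.fromstring(xml_text)
    rows = []
    for book in root.iter("book"):
        # Unlike the XQuery sequence, a missing child becomes an empty field here.
        fields = [
            book.get("isbn", ""),
            book.findtext("title", ""),
            book.findtext("description", ""),
        ]
        rows.append(",".join(quote_field(f) for f in fields))
    return rows


if __name__ == "__main__":
    for row in book_rows(SAMPLE):
        print(row)
```

For the sample above this prints "XX1234567","Quotes and comma","Hello, World from ""Ms-Excel""", which is the shape the XQuery should produce for a book with the same content.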

Answer 1: (score: 4)

A pure XPath 2.0 expression:

for $b in /*/book
    return
      concat(escape-html-uri(string-join(($b/@isbn,
                                          $b/title,
                                          $b/description
                                          )
                                           /normalize-space(),
                                        ",")
                             ),
             codepoints-to-string(10))

XSLT 2.0-based verification:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:sequence select=
   "for $b in /*/book
       return
         concat(escape-html-uri(string-join(($b/@isbn,
                                             $b/title,
                                             $b/description
                                             )
                                              /normalize-space(),
                                           ',')
                                ),
                codepoints-to-string(10))"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document (corrected to be well-formed):

<books>
    <book isbn="1590593049">
        <title>Extending Flash MX 2004</title>
        <description>
        Using javascript alongwith actionscript 3.0 and mxml.</description>
    </book>
    <book isbn="0132149184">
        <title>Java Software Solutions</title>
        <description>
            Complete book full of case studies on business solutions and design concepts while building mission critical
            business applications.
        </description>
    </book>
</books>

the wanted, correct result is produced:

1590593049,Extending Flash MX 2004,Using javascript alongwith actionscript 3.0 and mxml.
 0132149184,Java Software Solutions,Complete book full of case studies on business solutions and design concepts while building mission critical business applications.

Update:

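In a comment the OP has asked that any comma inside the text be surrounded by quotes, that (after that) any quote be replaced by two quotes, and finally that, if the whole result contains a quote, it be surrounded by (a single pair of) quotes.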
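The three steps just described can be sketched in Python to see what a line looks like once they are applied. This is an illustration under stated assumptions, not the answer's own code: line_for_book and normalize_space are hypothetical helpers, normalize_space only approximates XPath's normalize-space(), and the escape-html-uri step used in the answer's expressions is left out.

```python
import csv

Q = '"'


def normalize_space(text):
    # Rough stand-in for XPath normalize-space(): trim and collapse whitespace.
    return " ".join(text.split())


def line_for_book(isbn, title, description):
    # Step 1: surround every in-text comma with quotes.
    fields = [
        normalize_space(f.replace(",", Q + "," + Q))
        for f in (isbn, title, description)
    ]
    line = ",".join(fields)
    # Step 2: replace every quote with two quotes.
    line = line.replace(Q, Q + Q)
    # Step 3: if the line contains a quote, wrap the whole line in quotes.
    return Q + line + Q if Q in line else line


if __name__ == "__main__":
    line = line_for_book(
        "XX1234567", "Quotes and comma", ' Hello, World from "Ms-Excel" '
    )
    print(line)
    # A parser that applies quoting per field reads the wrapped line as one field.
    print(next(csv.reader([line])))
```

Because the whole line is wrapped, Python's csv.reader returns a single field for it. If Excel should split such a row into three columns, quoting each field separately may be the safer choice.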

Here is a pure XPath 2.0 expression that implements the OP's request:

for $b in /*/book,
    $q in codepoints-to-string(34),
    $NL in codepoints-to-string(10),
    $isbn in normalize-space(replace($b/@isbn, ',', concat($q,',',$q))),
    $t in normalize-space(replace($b/title, ',', concat($q,',',$q))),
    $d in normalize-space(replace($b/description, ',', concat($q,',',$q))),
    $res in
     escape-html-uri(string-join(($isbn,$t,$d), ',')),
    $res2 in replace($res, $q, concat($q,$q))
   return
    if(contains($res2, $q))
       then concat($q, $res2, $q, $NL)
       else concat($res2, $NL)

When this XPath expression is evaluated against the following XML document (extended with a new test case):

<books>
    <book isbn="1590593049">
        <title>Extending Flash MX 2004</title>
        <description>
        Using javascript alongwith actionscript 3.0 and mxml.</description>
    </book>
    <book isbn="0132149184">
        <title>Java Software Solutions</title>
        <description>
            Complete book full of case studies on business solutions and design concepts while building mission critical
            business applications.
        </description>
    </book>
    <book isbn="XX1234567">
        <title>Quotes and comma</title>
        <description>
            Hello, World from "Ms-Excel"
        </description>
    </book>
</books>

the wanted, correct result is produced:

1590593049,Extending Flash MX 2004,Using javascript alongwith actionscript 3.0 and mxml.
0132149184,Java Software Solutions,Complete book full of case studies on business solutions and design concepts while building mission critical business applications.
"XX1234567,Quotes and comma,Hello"","" World from ""Ms-Excel"""