How do I extract multiple XML elements in CSV format using XQuery?

Time: 2018-04-10 14:31:05

Tags: xml xpath xquery

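I am trying to extract multiple elements from an XML file using the string-join function, which works fine for a single element. However, when I add another element to my code, the data I get back is incorrect. I suspect I am missing something simple, but I can't seem to find it.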

Sample XML data:

<books>
  <book id="6636551">
    <master_information>
      <book_xref>
        <xref type="Fiction" type_id="1">72771KAM3</xref>
        <xref type="Non_Fiction" type_id="2">US72771KAM36</xref>
      </book_xref>
    </master_information>
    <book_details>
      <price>24.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
    </book_details>
    <global_information>
      <ratings>
        <rating agency="ABC Agency" type="Author Rating">A++</rating>
        <rating agency="DEF Agency" type="Author Rating">A+</rating>
        <rating agency="DEF Agency" type="Book Rating">A</rating>
      </ratings>
    </global_information>
    <country_info>
      <country_code>US</country_code>
    </country_info>
  </book>
  <book id="119818569">
    <master_information>
      <book_xref>
        <xref type="Fiction" type_id="1">070185UL5</xref>
        <xref type="Non_Fiction" type_id="2">US070185UL50</xref>
      </book_xref>
    </master_information>
    <book_details>
      <price>19.25</price>
      <publish_date>2002-11-01</publish_date>
      <description>A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.</description>
    </book_details>
    <global_information>
      <ratings>
        <rating agency="ABC Agency" type="Author Rating">A+</rating>
        <rating agency="ABC Agency" type="Book Rating">A</rating>
        <rating agency="DEF Agency" type="Author Rating">A</rating>
        <rating agency="DEF Agency" type="Book Rating">B+</rating>
      </ratings>
    </global_information>
    <country_info>
      <country_code>CA</country_code>
    </country_info>
  </book>
</books>

XQuery used to extract a single element:

for $x in string-join(('book_id,book_price', //book/book_details/price/string-join((ancestor::book/@id, .), ',')), '&#10;')
return $x

This works fine and produces the following sample output:

book_id,book_price
6636551,24.95
119818569,19.25

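The question is: how can I extract multiple elements, or a combination of elements and attributes, from a single XML file, preferably still using string-join?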

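I tried the following, which mostly works, but on larger data sets I noticed that values end up in the wrong columns, seemingly at random. For example, with the code below, when ./publish_date is missing from the data, the ./description value ends up in the ./publish_date column.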

for $x in string-join(('book_id,book_price,book_pub_date,book_desc', //book/book_details/string-join((ancestor::book/@id, ./price, ./publish_date, ./description), ',')), '&#10;')
return $x

FYI, I am still learning XQuery. Thanks for any insights, comments, or help!

1 Answer:

Answer 0 (score: 4)

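Sequences in XQuery are flattened: the expressions (1, (2, 3), ((4)), (), 5) and (1, 2, 3, 4, 5) are equivalent. This means that the length of the sequence (ancestor::book/@id, ./price, ./publish_date, ./description) varies whenever one of the XPath subexpressions returns no result. Because fn:string-join($strings, $sep) simply places the separator between each pair of adjacent items in the (flattened) $strings, the resulting rows can contain different numbers of commas.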
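As a minimal sketch of this effect, the query below counts the items in a nested sequence and joins a row with one missing value. The variable names $nested and $row and the literal values are illustrative assumptions, not part of the original query.

```xquery
(: Nested and empty sequences disappear when a sequence is constructed. :)
let $nested := (1, (2, 3), ((4)), (), 5)
(: Hypothetical row in which the third value (the publish date) is missing. :)
let $row    := ('6636551', '24.95', (), 'An in-depth look')
return (
  count($nested),          (: 5 :)
  count($row),             (: 3, not 4: the empty slot is gone :)
  string-join($row, ',')   (: 6636551,24.95,An in-depth look :)
)
```

The joined row has only two commas, so the description lands in the third column instead of the fourth.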

To keep the CSV columns aligned, insert an empty string wherever a value is missing. A simple way to do this takes advantage of flattening: ($possibly-empty, '')[1]

  • If $possibly-empty contains one item (e.g. 'foo'), this evaluates to ('foo', '')[1] -> 'foo'
  • If it is the empty sequence (), the expression evaluates to ((), '')[1] -> ('')[1] (flattening) -> ''
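The two cases can be tried in isolation with a small sketch like the one below. The variables $present and $missing and the trailing 'bar' value are made up for illustration.

```xquery
let $present := 'foo'
let $missing := ()
return string-join(
  (
    ($present, '')[1],   (: 'foo' :)
    ($missing, '')[1],   (: '' :)
    'bar'
  ),
  ','
)
(: evaluates to foo,,bar :)
```

The missing value still contributes an (empty) field, so the number of commas stays constant.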

Working example (your enclosing FLWOR expression (for / return) is completely redundant, since it iterates over a single string item, so I left it out):

string-join(
  (
    'book_id,book_price,book_pub_date,book_desc',
    //book/book_details/string-join(
      (
        (ancestor::book/@id, '')[1],
        (./price, '')[1],
        (./publish_date, '')[1],
        (./description, '')[1]
      ),
      ','
    )
  ),
  '&#10;'
)

You can also abstract this into its own function:

declare function local:non-empty($possibly-empty) {
  ($possibly-empty, '')[1]
};

string-join(
  (
    'book_id,book_price,book_pub_date,book_desc',
    //book/book_details/string-join(
      (
        local:non-empty(ancestor::book/@id),
        local:non-empty(./price),
        local:non-empty(./publish_date),
        local:non-empty(./description)
      ),
      ','
    )
  ),
  '&#10;'
)
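One further caveat, not covered above: some description values in the sample data contain commas themselves, which would also shift columns for a CSV reader. The sketch below is one possible way to handle that, assuming the consumer accepts the common convention of wrapping such fields in double quotes and doubling any embedded quotes. The function local:csv-field and the inline $books document are hypothetical and only serve to make the example self-contained.

```xquery
(: Hypothetical helper: returns '' for a missing value and quotes
   fields that contain a comma, a double quote, or a line break. :)
declare function local:csv-field($possibly-empty) {
  let $value := string(($possibly-empty, '')[1])
  return
    if (matches($value, '[",\n\r]'))
    then concat('"', replace($value, '"', '""'), '"')
    else $value
};

(: Made-up inline data; publish_date is deliberately missing. :)
let $books :=
  <books>
    <book id="1">
      <book_details>
        <price>19.25</price>
        <description>Zombies, a sorceress, and a queen.</description>
      </book_details>
    </book>
  </books>
return string-join(
  (
    'book_id,book_price,book_pub_date,book_desc',
    $books/book/book_details/string-join(
      (
        local:csv-field(ancestor::book/@id),
        local:csv-field(price),
        local:csv-field(publish_date),
        local:csv-field(description)
      ),
      ','
    )
  ),
  '&#10;'
)
```

For this input the data row should read 1,19.25,,"Zombies, a sorceress, and a queen." with the empty third field preserved and the description kept as a single quoted field.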