Question

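I am trying to convert an XML document that is organized by divisions and paragraphs, with page breaks and line breaks marked as milestones, into an XML document that wraps pages and lines in page and line elements.

To do this, I am trying to use util:get-fragment-between.

First I extract all the lines on a page into one fragment, and then I turn each line into a fragment of its own.

The first step works, but in the second step I get the following error, which I do not understand: org.exist.dom.memtree.ElementImpl cannot be cast to org.exist.dom.persistent.StoredNode.

The xquery file is below, followed by an excerpt of the xml file I am trying to convert.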

xquery version "3.1";

let $doc := doc($docpath)

(: Build the first fragment, containing only the lines on the page :)
let $begp-node := $doc//tei:pb[@n="15-v"]
let $endp-node := $doc//tei:pb[@n="16-r"]
let $p-fragment := util:get-fragment-between($begp-node, $endp-node, $make-fragment, $display-root-namespace)
let $p-node := util:parse($p-fragment)

(: so far so good, print out of p-node gives me an xml document with just the text on page 15-v :)

(: Next step. Here I try to build a fragment for each line in the newly created page fragment :)

let $lines := $p-node//tei:lb

        for $line at $pos in $lines
            let $make-fragment1 := true()
            let $display-root-namespace1 := true()
            let $beginning-node := $line
            let $ending-node := $line/following::tei:lb[1]
            let $fragment := util:get-fragment-between($beginning-node, $ending-node, $make-fragment1, $display-root-namespace1)

            let $node := util:parse($fragment)
            return $node

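I expected $node to be a new xml document containing only the line fragment. Instead, I get the error:

org.exist.dom.memtree.ElementImpl cannot be cast to org.exist.dom.persistent.StoredNode

Here is an excerpt from the original document: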

<p>
      <lb ed="#L"/>dilectio <choice>
      <orig>dependant</orig>
      <reg>dependant</reg>
    </choice> causaliter a cognitione tamen quaelibet obiecti apprehensio vel cognitio
    <lb ed="#L"/>cum voluntatis libertate sufficit dilectionem causare <g ref="#slash"/> prima
    probatur quia si non sequitur quod dilec
    <lb ed="#L"/>tio
    <lb ed="#L"/>posset poni seu elici naturaliter a voluntate seclusa omni cognitione consequens
    est falsum
    <pb ed="#L" n="15-v"/>
    <lb ed="#L" n="1"/> quia tunc voluntas posset diligere in infinitum contra <ref>
      <name ref="#Augustine">augustinum</name> in libro 8 2 10 <title ref="#deTrinitate">de
        trinitate</title>
    </ref> patet consequentia quia positis omnibus causis ad productionem <sic>ad productionem</sic>
    alicuius effectus re
    <lb ed="#L" n="2"/>quisitis
    <lb ed="#L" n="3"/>omni alio secluso talis effectus posset naturaliter poni in esse <g
      ref="#slash"/>2a pars probatur quia
    <lb ed="#L" n="4"/>quia si sola obiecti cognitio etc sequitur quod stante iudicio vel
    apprehensione alicuius
    <lb ed="#L" n="5"/>obiecti sub ratione <corr>
      <del rend="strikethrough">boni</del>
      <add place="inLine">mali</add>
    </corr> seclusa omnia existentia vel apparentia bonitatis
    <lb ed="#L" n="6"/>voluntas posset tale obiectum velle vel diligere consequentia nota sed
    consequens est contra <ref>
      <name ref="#Aristotle">philosophum</name>
    </ref> et <ref>
      <name ref="#Averroes">commentatorem</name>
      <lb ed="#L" n="7"/>primo <name ref="#Ethics">ethicorum</name>
    </ref> quia omnia bonum appetunt
  <p xml:id="pgb1q2-d1e3692">
    <g ref="#pilcrow"/>primum corollarium 
    <lb ed="#L" n="8"/>

Any suggestions are appreciated.

Answer 1

This algorithm works in memory, although it is about 3 times slower than the Java code:

(:~  Trim the XML tree to the part between the nodes $start and $end.
 :   The algorithm is 
 : 1) find all the ancestors of the start node - $startParents
 : 2) find all the ancestors of the end node - $endParents
 : 3) recursively, starting with the common top, create a new element which is a copy of the element being trimmed by 
 :    3.1 copying all attributes 
 :    3.2 handling one of four cases, depending on the node and the start and end edge nodes of the tree
 :     a) left and right nodes are the same - nothing else to copy
 :     b) both nodes are in the node's children - trim the start one, copy the intervening children and trim the end one
 :     c) only the start node is in the node's children - trim this node and copy the following siblings
 :     d) only the end node is in the node's children - copy the preceding siblings and trim the node
 :    attributes (currently in the fb namespace since it is not a TEI attribute) are added to trimmed nodes  
 : @param start - the element bounding the start of the subtree
 : @param end - the element bounding the end of the subtree
:)

declare function fb:trim-node($start as node() ,$end as node()) {
let $startParents := $start/ancestor-or-self::*
let $endParents := $end/ancestor-or-self::*
let $top := $startParents[1]
return
   fb:trim-node($top,subsequence($startParents,2),subsequence($endParents,2))
};

declare function fb:trim-node($node as node(), $start as node()*, $end as node()*) {
       if (empty($start) and empty($end)) 
       then $node                                                       (: leaf is untrimmed :)
       else 
          let $startNode := $start[1]
          let $endNode:= $end[1]
          let $children := $node/node()
          return
             element {QName (namespace-uri($node), name($node))} {       (: preserve the namespace :)
              $node/@* ,                                                 (: copy all the attributes :)
              if ($startNode is $endNode)                                (: edge node  is common :)
              then fb:trim-node($startNode, subsequence($start,2),subsequence($end,2))
              else 
              if ($startNode = $children and $endNode = $children)       (: both in same subtree :)
              then (fb:trim-node($startNode, subsequence($start,2),()),  (: first the trimmed start node :)
                                                                         (: then the siblings between start and end nodes :)                                                                     
                    $startNode/following-sibling::node() 
                           except $endNode/following-sibling::node() 
                           except $endNode,

                    fb:trim-node($endNode, (), subsequence($end,2))      (: then the trimmed end node :)      
                   )
              else if ($startNode = $children)                           (: start node is in the children :)
              then 
                 ( fb:trim-node($startNode, subsequence($start,2),()),  (: first the trimmed start node :)
                   $startNode/following-sibling::node()                 (: then  the following siblings :)
                 )
              else if ($endNode = $children)                            (: end node is in the children :)
              then 
                 (  $endNode/preceding-sibling::node(),                  (: the preceding siblings :)
                    fb:trim-node($endNode, (), subsequence($end,2))      (: then the trimmed end node :)              
                 )
              else ()      
            }
};
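
As a rough usage sketch, the function could be applied to each line of an in-memory page fragment as shown here. The module URI and file name in the import are hypothetical (the answer does not say where the `fb` functions live), and the sample `div` is a made-up miniature of the excerpt in the question.

```xquery
xquery version "3.1";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical module URI and location: assumes the two fb:trim-node
   functions above were saved as a library module. :)
import module namespace fb = "http://example.org/ns/fragment-between" at "trim-node.xqm";

(: In-memory stand-in for the parsed page fragment ($p-node in the question). :)
let $page :=
    <div xmlns="http://www.tei-c.org/ns/1.0">
        <p><lb n="1"/>quia tunc voluntas <name>augustinum</name> in libro<lb n="2"/>quisitis<lb n="3"/>omni alio secluso</p>
    </div>
for $line in $page//tei:lb
let $next := $line/following::tei:lb[1]
(: The last line has no following lb, so it is skipped here. :)
where exists($next)
return
    fb:trim-node($line, $next)
```

Each result should be a copy of the ancestor chain (`div`, `p`) holding only the content between the two `lb` milestones, with both milestones included. The function relies only on standard XQuery axes and element constructors, which fits the answer's remark that it works on in-memory nodes.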

A demo application, using joewiz's original, compares four algorithms including the Java one: http://kitwallace.co.uk/Book/set/fragment-between/page
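
An alternative that keeps `util:get-fragment-between` might be to take the `lb` boundaries from the stored document rather than from the parsed page fragment. The error message suggests the function was handed an in-memory node (`memtree.ElementImpl`, as produced by `util:parse`) where it expected a stored one. The following is a minimal, untested sketch under that assumption; the document path is hypothetical, each `pb` is assumed to be unique, and the standard `parse-xml` function stands in for `util:parse`.

```xquery
xquery version "3.1";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical path to the stored TEI document. :)
let $doc := doc("/db/apps/example/data/text.xml")
let $begp-node := $doc//tei:pb[@n = "15-v"]
let $endp-node := $doc//tei:pb[@n = "16-r"]
(: Select the line breaks of the page directly from the stored document,
   so that every node passed on below is a stored node. :)
let $lines := $doc//tei:lb[. >> $begp-node and . << $endp-node]
for $line in $lines
(: A line ends at the next lb or pb milestone, whichever comes first. :)
let $ending-node := $line/following::*[self::tei:lb or self::tei:pb][1]
let $fragment := util:get-fragment-between($line, $ending-node, true(), true())
return
    parse-xml($fragment)
```

This avoids the second pass over an in-memory document altogether, at the cost of selecting the page's lines with node-order comparisons against the two `pb` milestones.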

Error using util:get-fragment-between in eXist-db
