Error in a sequenced result set: [XPTY0004] single item expected, (element x {...}, element x {...}, ...) found

Date: 2012-10-01 05:43:49

Tags: xml-parsing xquery

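I found that the following code works on a small subset of my data, but I didn't realize that none of my samples had more than one comment. When I applied the code to the actual database, where entries have multiple comments, I got the error mentioned above.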

Current code:

for $doc in doc('test')
let $results :=
(
  let $pKeywords := ('best clients', 'Very', '20')
  return
    for $kw in $pKeywords
    return
    (
      $doc/set/entry[contains(comment, concat('!', $kw))],
      $doc/set/entry[contains(comment, $kw)]
    )
  [not(position() gt 2)]
)
for $i in (1 to count($results))
return
(
  subsequence($results/comment, $i, 1),
  subsequence($results/buyer, $i, 1)
)

Document:

<set>
  <entry>
    <comment>The client is only 20 years old.  Do not be surprised by his youth.</comment>
    <buyer></buyer>
    <id>1282</id>
    <industry>International Trade; Fish and Game</industry>
  </entry>
  <entry>
    <comment>!On leave in October.</comment>
    <comment>!Planning to make a large purchase before Christmas.</comment>
    <buyer></buyer>
    <id>709</id>
    <industry>Real Estate</industry>
  </entry>
  <entry>
    <comment>Is often !out between 1 and 3 p.m.</comment>
    <buyer></buyer>
    <id>127</id>
    <industry>Virus Software Marketting</industry>
  </entry>
  <entry>
    <comment>Very personable.  One of our best clients.</comment>
    <buyer></buyer>
    <id>14851</id>
    <industry>Administrative support.</industry>
  </entry>
  <entry>
    <comment>!Very difficult to reach, but one of our top buyers.</comment>
    <comment>His wife often answers the phone.  That means he is out of the office.</comment>
    <buyer></buyer>
    <id>1458</id>
    <industry>Construction</industry>
  </entry>
  <entry>
    <comment></comment>
    <buyer></buyer>
    <id>276470</id>
    <industry>Bulk Furniture Sales</industry>
  </entry>
  <entry>
    <comment>A bit of an eccentric.  One of our best clients.</comment>
    <buyer></buyer>
    <id>1506</id>
    <industry>Sports Analysis</industry>
  </entry>
  <entry>
    <comment>Very gullible, so please !be sure she needs what you sell her.  She's one of our best clients.</comment>
    <buyer></buyer>
    <id>1523</id>
    <industry>International Trade</industry>
  </entry>
  <entry>
    <comment>He wants to buy everything, but !he has a tight budget.</comment>
    <comment>!His company may be closing soon.</comment>
    <buyer></buyer>
    <id>1524</id>
    <industry>Public Relations</industry>
  </entry>
</set>

Result:

Stopped at line 9, column 22: [XPTY0004] document-node()(...): function(item()*) as item()* expected, document-node() found.

I have run into similar errors before and was able to fix them, but applying the same fix doesn't work here. For example:

  $doc('test')/set/entry[contains(., concat('!', $kw))],
  $doc('test')/set/entry[contains(., $kw)]

returns the same result.

What the code should accomplish:

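The first return should return each entry, with its children, if the entry's comment child contains any of the three keywords in $pKeywords.

concat('!', $kw) should make !-containing comments a priority.

The second return slices the comment and buyer nodes out of the results of the first return.

The code runs fine as long as each entry has only one comment-named node. When an entry has two or more comment nodes, the code fails and the compiler returns the error above: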

Stopped at line 9, column 22: [XPTY0004] document-node()(...): function(item()*) as item()* expected, document-node() found.

- Edit -

Desired result:

<comment>The client is only 20 years old.  Do not be surprised by his youth.</comment>
<buyer/>
<comment>Very personable.  One of our best clients.</comment>
<buyer/>
<comment>!Very difficult to reach, but one of our top buyers.</comment>
<buyer/>
<comment>A bit of an eccentric.  One of our best clients.</comment>
<buyer/>

Clarification of the desired result:

//contains ! and the first keyword, "best clients"; so, the first result should come from this entry.
  <entry>
    <comment>Very gullible, so please !be sure she needs what you sell her.  She's one of our best clients.</comment>
    <buyer></buyer>
    <id>1523</id>
    <industry>International Trade</industry>
  </entry>

//Only one entry contains ! and "best clients".  So, the first result containing "best clients" contains nodes for the second result.
  <entry>
    <comment>Very personable.  One of our best clients.</comment>
    <buyer></buyer>
    <id>14851</id>
    <industry>Administrative support.</industry>
  </entry>

//This contains ! and the second keyword, "Very", but it is a duplicate.  So, ideally its children should not be returned.
  <entry>
    <comment>!Very difficult to reach, but one of our top buyers.</comment>
    <comment>His wife often answers the phone.  That means he is out of the office.</comment>
    <buyer></buyer>
    <id>1458</id>
    <industry>Construction</industry>
  </entry>

//This contains ! and a string, "very" (part of everything).  Nodes from this entry should be returned as the third result.
  <entry>
    <comment>He wants to buy everything, but !he has a tight budget.</comment>
    <comment>!His company may be closing soon.</comment>
    <buyer></buyer>
    <id>1524</id>
    <industry>Public Relations</industry>
  </entry>

//The only entry whose comment child contains the keyword '20'.  There is no '!'-containing comment with 20, so this node is the top and only node whose children should be returned.
  <entry>
    <comment>The client is only 20 years old.  Do not be surprised by his youth.</comment>
    <buyer></buyer>
    <id>1282</id>
    <industry>International Trade; Fish and Game</industry>
  </entry>

- Edit 2 -

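The next attempt gives a better idea of what I'm trying to accomplish, but it has some obvious syntax errors (for example, I'm still figuring out how to use arrays, as on line 8). I'll update this as I fix the syntax errors: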

<set>
{
    let $kw := ('best clients', 'Very', '20')
    let $entry := doc('test')/set/entry
    let $priority := '!'

    for $i in (1, count($kw))
    let $priority_result[$i] :=
    (
        for $entries in $entry
        where $entry contains(., $priority) and where $entry contains $kw[$i]
        return subsequence($priority_result[$i], 1, 2)
    )

    if $priority_result[$i] < 2
    for $i in (1, count($kw))
    let $secondary_result[$i] :=
    (
        for $entries in $entry
        where $entry contains $kw[$i] and where $entry not($priority_result) and where $entry not($secondary_result[1..($i-1)])
        return $secondary_result[$i]
    )
    else let $secondary_result[$i] := ''

    for $i in (1, count($kw))
    return
    (
        $primary_result[$i],
        $secondary_result[$i]
    )
}
</set>
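The structure this pseudocode is reaching for (per keyword: up to two entries whose comment has a `!` and the keyword, topped up with plain keyword matches that were not already picked) can be modeled in Python before translating it back into XQuery. This is a sketch under stated assumptions: a `!` anywhere in the same comment counts as a priority match (as in the clarification annotations above), matching is case-sensitive like XPath `contains()`, and an entry is never returned twice across keywords. `select_entries`, `is_priority`, `is_plain` and `per_keyword` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# The question's sample document, trimmed to <comment> and <id> children.
DOC = ET.fromstring("""<set>
<entry><comment>The client is only 20 years old.  Do not be surprised by his youth.</comment><id>1282</id></entry>
<entry><comment>!On leave in October.</comment><comment>!Planning to make a large purchase before Christmas.</comment><id>709</id></entry>
<entry><comment>Is often !out between 1 and 3 p.m.</comment><id>127</id></entry>
<entry><comment>Very personable.  One of our best clients.</comment><id>14851</id></entry>
<entry><comment>!Very difficult to reach, but one of our top buyers.</comment><comment>His wife often answers the phone.  That means he is out of the office.</comment><id>1458</id></entry>
<entry><comment></comment><id>276470</id></entry>
<entry><comment>A bit of an eccentric.  One of our best clients.</comment><id>1506</id></entry>
<entry><comment>Very gullible, so please !be sure she needs what you sell her.  She's one of our best clients.</comment><id>1523</id></entry>
<entry><comment>He wants to buy everything, but !he has a tight budget.</comment><comment>!His company may be closing soon.</comment><id>1524</id></entry>
</set>""")


def comments(entry):
    """Text of every comment child; an empty <comment/> becomes ''."""
    return [c.text or "" for c in entry.findall("comment")]


def is_priority(entry, kw, marker="!"):
    # Assumption: the marker may appear anywhere in the same comment as kw.
    return any(marker in c and kw in c for c in comments(entry))


def is_plain(entry, kw):
    return any(kw in c for c in comments(entry))


def select_entries(doc, keywords, per_keyword=2):
    """Per keyword: priority matches first, then plain ones; skip entries already picked."""
    entries = doc.findall("entry")
    picked = []  # entries already returned, in result order
    for kw in keywords:
        primary = [e for e in entries if e not in picked and is_priority(e, kw)]
        secondary = [e for e in entries
                     if e not in picked and e not in primary and is_plain(e, kw)]
        picked.extend((primary + secondary)[:per_keyword])
    return picked


if __name__ == "__main__":
    for e in select_entries(DOC, ("best clients", "Very", "20")):
        print(e.findtext("id"), comments(e)[0])
```

On the sample this picks 1523 and 14851 for "best clients" (as in the clarification), 1458 for "Very" and 1282 for "20". It does not reproduce the case-insensitive match of "very" inside "everything" from the annotations; lower-casing both sides in `is_priority` and `is_plain` would be one way to experiment with that.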

Suggested change, which returns an empty result:

for $doc in doc('test')
let $results :=
(
  let $pKeywords := ('best clients', 'Very', '20')
  return
    for $kw in $pKeywords
    return
    (
      $doc/set/entry/comment[contains(., concat('!', $kw))],
      $doc/set/entry/comment[contains(., $kw)]
    )
  [not(position() gt 2)]
)
for $i in (1 to count($results))
return
(
  subsequence($results/comment, $i, 1),
  subsequence($results/buyer, $i, 1)
)
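A likely reason this version returns nothing: $results now holds comment elements, so $results/comment asks for comment children of those comment elements, and there are none (the same goes for $results/buyer, since buyer is a sibling of comment, not a child). A small Python/ElementTree illustration of that child-axis behavior, using a trimmed copy of entry 14851; the variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of entry 14851 from the question's sample document.
ENTRY = ET.fromstring(
    "<entry><comment>Very personable.  One of our best clients.</comment>"
    "<buyer></buyer><id>14851</id></entry>"
)

# Like $doc/set/entry/comment[contains(., $kw)]: the selected items are comment elements.
results = [c for c in ENTRY.findall("comment") if "best clients" in (c.text or "")]

# Like $results/comment and $results/buyer: child steps taken from a comment element.
comment_children = [child for r in results for child in r.findall("comment")]
buyer_children = [child for r in results for child in r.findall("buyer")]

# Stepping from the entry instead reaches the sibling buyer.
buyers_from_entry = ENTRY.findall("buyer")

if __name__ == "__main__":
    print(len(results), len(comment_children), len(buyer_children), len(buyers_from_entry))
```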

2 Answers:

Answer 0 (score: 1)

The error message seems to be complaining about an attempt to call a document-node() as a function.

$doc('test') vs $doc


Alternatively, contains(...) only works on a single node, not on a node set.

contains(comment, $kw) vs comment/contains(.,$kw)
comment[contains(.,$kw)]
comment[contains(text(),$kw)]
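The first argument of XPath contains() must be a single string (or empty), so contains(comment, $kw) fails as soon as an entry has two comment children, while comment[contains(., $kw)] tests each comment on its own. A minimal Python model of that difference (an analogy, not how an XQuery processor works internally); the helper names are hypothetical and a TypeError stands in for XPTY0004.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of entry 1458, which has two comment children.
ENTRY = ET.fromstring(
    "<entry>"
    "<comment>!Very difficult to reach, but one of our top buyers.</comment>"
    "<comment>His wife often answers the phone.  That means he is out of the office.</comment>"
    "<buyer></buyer><id>1458</id>"
    "</entry>"
)


def contains_on_sequence(items, needle):
    """Models contains(comment, $kw): the first argument may hold at most one item."""
    if len(items) > 1:
        raise TypeError(f"XPTY0004: single item expected, {len(items)} found")
    text = (items[0].text or "") if items else ""  # empty sequence acts like ""
    return needle in text


def matching_comments(entry, needle):
    """Models comment[contains(., $kw)]: each comment is tested separately."""
    return [c for c in entry.findall("comment") if needle in (c.text or "")]


def entry_matches(entry, needle):
    """Models exists(comment[contains(., $kw)]) inside an entry predicate."""
    return bool(matching_comments(entry, needle))


if __name__ == "__main__":
    print(entry_matches(ENTRY, "Very"))
    try:
        contains_on_sequence(ENTRY.findall("comment"), "Very")
    except TypeError as err:
        print(err)
```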


This works for me:

<set>{
    for $entry in doc('test')/set/entry
    let $kw := (
        for $prefix in ('!','')
        for $kw in ('best clients', 'Very', '20')
        where exists($entry/comment[contains(., concat($prefix,$kw))])
        return concat($prefix,$kw)
    )[1]
    where exists($kw)
    order by not(starts-with($kw,'!'))
    return <entry keyword="{$kw}">{
      ( $entry/comment,
        $entry/buyer )
    }</entry>
}</set>

Result (multiple comments per <entry>):

<set>
   <entry keyword="!Very">
      <comment>!Very difficult to reach, but one of our top buyers.</comment>
      <comment>His wife often answers the phone.  That means he is out of the office.</comment>
      <buyer/>
   </entry>
   <entry keyword="20">
      <comment>The client is only 20 years old.  Do not be surprised by his youth.</comment>
      <buyer/>
   </entry>
   <entry keyword="best clients">
      <comment>Very personable.  One of our best clients.</comment>
      <buyer/>
   </entry>
   <entry keyword="best clients">
      <comment>A bit of an eccentric.  One of our best clients.</comment>
      <buyer/>
   </entry>
   <entry keyword="best clients">
      <comment>Very gullible, so please !be sure she needs what you sell her.  She's one of our best clients.</comment>
      <buyer/>
   </entry>
</set>
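To make the logic of this answer easier to follow, here is a Python model of it (an illustration, not part of the answer): for each entry it tries the "!"-prefixed keywords first, then the bare ones, keeps the first candidate that some comment contains, drops entries with no match, and sorts "!" matches to the front. Python's sorted() is stable, so ties stay in document order, which matches the output shown; XQuery only guarantees that with "stable order by". The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The question's sample document, trimmed to <comment> and <id> children.
DOC = ET.fromstring("""<set>
<entry><comment>The client is only 20 years old.  Do not be surprised by his youth.</comment><id>1282</id></entry>
<entry><comment>!On leave in October.</comment><comment>!Planning to make a large purchase before Christmas.</comment><id>709</id></entry>
<entry><comment>Is often !out between 1 and 3 p.m.</comment><id>127</id></entry>
<entry><comment>Very personable.  One of our best clients.</comment><id>14851</id></entry>
<entry><comment>!Very difficult to reach, but one of our top buyers.</comment><comment>His wife often answers the phone.  That means he is out of the office.</comment><id>1458</id></entry>
<entry><comment></comment><id>276470</id></entry>
<entry><comment>A bit of an eccentric.  One of our best clients.</comment><id>1506</id></entry>
<entry><comment>Very gullible, so please !be sure she needs what you sell her.  She's one of our best clients.</comment><id>1523</id></entry>
<entry><comment>He wants to buy everything, but !he has a tight budget.</comment><comment>!His company may be closing soon.</comment><id>1524</id></entry>
</set>""")

KEYWORDS = ("best clients", "Very", "20")
PREFIXES = ("!", "")


def first_keyword(entry):
    """Models the let $kw := (...)[1] clause for one entry."""
    texts = [c.text or "" for c in entry.findall("comment")]
    for prefix in PREFIXES:          # outer loop: "!" first, then no prefix
        for kw in KEYWORDS:
            candidate = prefix + kw  # concat($prefix, $kw)
            if any(candidate in t for t in texts):
                return candidate
    return None


def tag_entries(doc):
    tagged = [(first_keyword(e), e) for e in doc.findall("entry")]
    tagged = [(kw, e) for kw, e in tagged if kw is not None]  # where exists($kw)
    # order by not(starts-with($kw, '!')): False sorts before True.
    return sorted(tagged, key=lambda pair: not pair[0].startswith("!"))


if __name__ == "__main__":
    for kw, e in tag_entries(DOC):
        print(kw, e.findtext("id"))
```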

This gives a separate entry for each comment:

<set>{
    for $entry in doc('test')/set/entry
    for $comment in $entry/comment
    let $kw := (
        for $prefix in ('!','')
        for $kw in ('best clients', 'Very', '20')
        where exists($comment[contains(., concat($prefix,$kw))])
        return concat($prefix,$kw)
    )[1]
    where exists($kw)
    order by not(starts-with($kw,'!'))
    return <entry keyword="{$kw}">{
      ( $comment,
        $entry/buyer )
    }</entry>
}</set>

Output:

<set>
   <entry keyword="!Very">
      <comment>!Very difficult to reach, but one of our top buyers.</comment>
      <buyer/>
   </entry>
   <entry keyword="20">
      <comment>The client is only 20 years old.  Do not be surprised by his youth.</comment>
      <buyer/>
   </entry>
   <entry keyword="best clients">
      <comment>Very personable.  One of our best clients.</comment>
      <buyer/>
   </entry>
   <entry keyword="best clients">
      <comment>A bit of an eccentric.  One of our best clients.</comment>
      <buyer/>
   </entry>
   <entry keyword="best clients">
      <comment>Very gullible, so please !be sure she needs what you sell her.  She's one of our best clients.</comment>
      <buyer/>
   </entry>
</set>

Answer 1 (score: 0)

For reference, here is the code we started with (it's a bit daunting, and I still don't fully understand it):

for $doc in doc('test')
let $results :=
(
  let $pKeywords := ('best clients', 'Very', '20')
  return
    for $kw in $pKeywords
    return
    (
      $doc/set/entry[contains(comment, concat('!', $kw))],  (: *1 :)
      $doc/set/entry[contains(comment, $kw)]                (: *1 :)
    )
  [not(position() gt 2)]
)
for $i in (1 to count($results))
return
(
  subsequence($results/comment, $i, 1), (: *2 :)
  subsequence($results/buyer, $i, 1)    (: *2 :)
)

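The version that doesn't throw the error fixes *1 the typical way. It took me some time to catch the second error, marked *2. Basically, because I went one level deeper in the search (*1), I needed to go back up one level in the results, with ..: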

for $doc in doc('test')
let $results :=
(
  let $pKeywords := ('best clients', 'Very', '20')
  return
    for $kw in $pKeywords
    return
    (
      $doc/set/entry/comment[contains(., concat('!', $kw))], (: *1, went deeper :)
      $doc/set/entry/comment[contains(., $kw)]               (: *1, went deeper :)
    )
  [not(position() gt 2)]
)
for $i in (1 to count($results))
return
(
  subsequence($results/../comment, $i, 1), (: *2, added .. :)
  subsequence($results/../buyer, $i, 1)    (: *2, added .. :)
)
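One thing worth checking in this version: in XPath, the result of a path step such as $results/.. contains no duplicates and is in document order, so the keyword priority order is lost, and $results/../comment then returns every comment of those entries. The Python model below (a sketch; `path_step` and `run_query` are hypothetical names) mimics that on the sample document, trimmed to the entries that match any keyword. Because entry 1458 has two comments, the second 1458 comment is included and the i-th comment and i-th buyer stop coming from the same entry.

```python
import xml.etree.ElementTree as ET

# Sample entries that match at least one keyword, trimmed to comment/buyer/id.
DOC = ET.fromstring("""<set>
<entry><comment>The client is only 20 years old.  Do not be surprised by his youth.</comment><buyer/><id>1282</id></entry>
<entry><comment>Very personable.  One of our best clients.</comment><buyer/><id>14851</id></entry>
<entry><comment>!Very difficult to reach, but one of our top buyers.</comment><comment>His wife often answers the phone.  That means he is out of the office.</comment><buyer/><id>1458</id></entry>
<entry><comment>A bit of an eccentric.  One of our best clients.</comment><buyer/><id>1506</id></entry>
<entry><comment>Very gullible, so please !be sure she needs what you sell her.  She's one of our best clients.</comment><buyer/><id>1523</id></entry>
</set>""")

PARENT = {child: parent for parent in DOC.iter() for child in parent}
ORDER = {node: i for i, node in enumerate(DOC.iter())}


def path_step(nodes, step):
    """Models E/step: apply step to each node, drop duplicates, sort in document order."""
    found = {n for node in nodes for n in step(node)}
    return sorted(found, key=ORDER.__getitem__)


def run_query(keywords):
    all_comments = DOC.findall("entry/comment")
    results = []
    for kw in keywords:
        a = [c for c in all_comments if ("!" + kw) in (c.text or "")]
        b = [c for c in all_comments if kw in (c.text or "")]
        results.extend((a + b)[:2])  # (A, B)[not(position() gt 2)]
    entries = path_step(results, lambda n: [PARENT[n]])            # $results/..
    comments = path_step(entries, lambda e: e.findall("comment"))  # $results/../comment
    buyers = path_step(entries, lambda e: e.findall("buyer"))      # $results/../buyer
    out = []
    for i in range(len(results)):  # for $i in (1 to count($results))
        out += [("comment", PARENT[n].findtext("id")) for n in comments[i:i + 1]]
        out += [("buyer", PARENT[n].findtext("id")) for n in buyers[i:i + 1]]
    return results, out


if __name__ == "__main__":
    _, pairs = run_query(("best clients", "Very", "20"))
    for tag, entry_id in pairs:
        print(tag, "from entry", entry_id)
```

Note also that $results itself holds comment 14851 twice (once for "best clients", once for "Very"), which is the duplicate problem raised in point 2 below.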

I'm still working on:

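1) Using concat(). My understanding is that it joins two things together, so for $kw[1] the result would be equivalent to "!best clients". But the results don't reflect that: in the results, the exclamation mark is not always directly in front of the priority search term.

2) Not returning duplicate results. I want each entry to be unique. I need to add a routine somewhere that either keeps duplicates out of my result set or removes them before [not(position() gt 2)], where the number of results is trimmed/sliced.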
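On point 1: concat('!', $kw) does build the single string "!best clients", so contains(., concat('!', $kw)) only matches when the "!" sits immediately before the keyword. In entry 1523 the "!" is in front of "be", so that comment never passes the test. If the intent is "a ! anywhere in the same comment", the two conditions have to be tested separately, for example contains(., '!') and contains(., $kw) in XQuery. A Python illustration of the two tests; the function names are hypothetical.

```python
# Comment texts copied from entries 1523 and 1458 of the sample document.
COMMENT_1523 = ("Very gullible, so please !be sure she needs what you sell her.  "
                "She's one of our best clients.")
COMMENT_1458 = "!Very difficult to reach, but one of our top buyers."


def adjacent_match(text, kw, marker="!"):
    """Models contains(., concat('!', $kw)): the marker must directly precede kw."""
    return (marker + kw) in text


def anywhere_match(text, kw, marker="!"):
    """Models contains(., '!') and contains(., $kw): both somewhere in the text."""
    return marker in text and kw in text


if __name__ == "__main__":
    print(adjacent_match(COMMENT_1523, "best clients"), anywhere_match(COMMENT_1523, "best clients"))
```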

Thanks to everyone for looking and for the hard work! Still hoping for a better answer!