sum-aggregate function with non-unique values

Time: 2016-07-11 20:20:28

Tags: marklogic marklogic-8

I have a set of documents with a structure like:

<DOCUMENT>
    <AMOUNTS>
        <ELEMENT>
            <AMOUNT>10.00</AMOUNT>
            <INFO>
                <CODE1>132</CODE1>
                <CODE2>022</CODE2>
            </INFO>
        </ELEMENT>
        <ELEMENT>
            <AMOUNT>10.00</AMOUNT>
            <INFO>
                <CODE1>132</CODE1>
                <CODE2>121</CODE2>
            </INFO>
        </ELEMENT>
        <ELEMENT>
            <AMOUNT>15.00</AMOUNT>
            <INFO>
                <CODE1>156</CODE1>
                <CODE2>121</CODE2>
            </INFO>
        </ELEMENT>      
    </AMOUNTS>
</DOCUMENT>

I want to compute various sums of the AMOUNT element, so I’ve put a Path Range Index on DOCUMENT/AMOUNTS/ELEMENT/AMOUNT, hoping to use cts:sum-aggregate. However, I’m seeing an issue with cts:sum-aggregate when the sum involves documents that contain more than one AMOUNT element with the same value. To illustrate, assume the XML above is stored at the URI '/DOCS/DOC1.XML'. I then run the following XQuery to get the sum of all the AMOUNTs in the document. I'm computing the sum in two different ways and getting two different results:

(
  fn:sum(doc('/DOCS/DOC1.XML')/DOCUMENT/AMOUNTS/ELEMENT/AMOUNT),
  cts:sum-aggregate(
      cts:path-reference("DOCUMENT/AMOUNTS/ELEMENT/AMOUNT"), 
      ("any"),
      cts:document-query('/DOCS/DOC1.XML')
  )
) 

fn:sum returns 35 and cts:sum-aggregate returns 25. cts:sum-aggregate is including only one of the two 10.00 values in the sum.

I think I’m doing something wrong, but I can’t figure out what. Can someone shed some light on this for me?

Thanks

David

2 answers:

Answer 0 (score: 2):

After reading the answer from wst, I confirmed that the type of my index was decimal, then experimented with the options and found that adding "item-frequency" as an option to cts:sum-aggregate solved my issue. I don't completely understand the nuances between "item-frequency" and "fragment-frequency" in relation to cts:sum-aggregate, but the following XQuery works as I expect: both sums return the same value.

(
  fn:sum(doc('/DOCS/DOC1.XML')/DOCUMENT/AMOUNTS/ELEMENT/AMOUNT),
  cts:sum-aggregate(
      cts:path-reference("DOCUMENT/AMOUNTS/ELEMENT/AMOUNT"), 
      ("item-frequency"),
      cts:document-query('/DOCS/DOC1.XML')
  )
) 
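
One plausible reading of the two options, consistent with the 25 and 35 results in the question, is that "fragment-frequency" weights each distinct value by the number of fragments containing it, while "item-frequency" weights it by the number of occurrences. The Python sketch below is only a toy model of that arithmetic, not MarkLogic's implementation; the `fragments` dict and the `weighted_sum` helper are hypothetical names made up for illustration.

```python
from collections import Counter
from decimal import Decimal

# Hypothetical stand-in for the range index: the AMOUNT values found in each fragment.
fragments = {
    "/DOCS/DOC1.XML": [Decimal("10.00"), Decimal("10.00"), Decimal("15.00")],
}

def weighted_sum(fragments, mode):
    """Sum each distinct value multiplied by its frequency under the given mode."""
    freq = Counter()
    for values in fragments.values():
        if mode == "item-frequency":
            freq.update(values)        # every occurrence counts
        else:
            freq.update(set(values))   # each distinct value counts once per fragment
    return sum(value * count for value, count in freq.items())

fragment_sum = weighted_sum(fragments, "fragment-frequency")
item_sum = weighted_sum(fragments, "item-frequency")
print(fragment_sum, item_sum)  # prints: 25.00 35.00
```

Under this model the duplicate 10.00 in a single document is collapsed in fragment mode, which would match the behavior seen without the "item-frequency" option.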

Answer 1 (score: 1):

Is your path index a string type or a numeric type (float, double, etc.)? I wouldn't expect this to work at all with strings, but maybe it does, and I don't see you passing an option to set the type to a number (("any", "type=double")).

String indexes combine identical (according to the collation) values into a single entry and increment the entry's cts:frequency. If sum-aggregate does work over string indexes (and I don't see anything in the documentation to suggest otherwise), that could explain why the duplicate value is only counted once.