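How do I build a similarity matrix in XQuery?

Asked: 2016-06-30 22:34:14

Tags: xquery marklogic

Suppose we have n records. I want to compute the similarity between each record and all the other records, and build a similarity matrix from the results. I'm new to XQuery, but I'm working on it. I've attached a screenshot showing the similarity between one pair of records. [Screenshot: similarity between 2 records]

This is a CSV string. I used the following for loop to generate this sample: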

for $item1 at $index in /rec:Record 
let $records:= /rec:Record 
for $item2 in $records[$index + 1]

(: here I call the similarity functions :)

return 
(: csv output :)

I need to edit the for loop so that it produces a similarity matrix covering every pair of records in the dataset. How can I do that?

Note: the similarity functions are already written; what I'm stuck on is running the computation for every pair of records.

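2 Answers:

Answer 0 (score: 2)

Edit: added CSV output as a text node at the end.

Consider the power of maps in MarkLogic.

Below is a sample that represents a matrix in MarkLogic. I also threw in two extras: a function that acts as a placeholder for your formula (it also receives the original sequence, in case you need all of it for your analysis) and a small function that shows how to access the map of maps.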

xquery version "1.0-ml";

declare function local:csv($matrix){
  (: line separator, written as a character reference :)
  let $nl := "&#10;"
  return text{
    for $x in map:keys($matrix)
      let $row := map:get($matrix, $x)
      order by xs:int($x)
      return fn:string-join(for $y in map:keys($row)
        order by xs:int($y)
        return xs:string(map:get($row, $y))
      , ",") || $nl
  }
};

declare function local:my-formula($x, $y, $seq){
  (: placeholder: replace the body with the real similarity computation :)
  let $foo := "do something"
  return "your-formula for " || xs:string($x) || " and " || xs:string($y)
};

declare function local:pretty($matrix){
  <matrix>
  {
    for $x in map:keys($matrix)
      order by xs:int($x)
    return <row>
    {
      let $row := map:get($matrix, $x)
      for $y in map:keys($row)
        order by xs:int($y)
      return <cell x="{$x}" y="{$y}">{map:get($row, $y)}</cell>
    }
    </row>
  }
  </matrix>
};

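let $matrix := map:map()
(: the duplicate 5 collapses into a single map key, so the result is 8 x 8 :)
let $numbers := "1,2,3,4,5,5,6,7,8"
let $seq := fn:tokenize($numbers, ",")

(: outer map: row key -> inner map; inner map: column key -> cell value :)
let $_ := for $x in $seq
    let $map := map:map()
    let $_ := for $y in $seq
       return  map:put($map, $y, local:my-formula($x, $y, $seq))
    return map:put($matrix, $x, $map)

return local:pretty($matrix)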

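You could simply dump the map of maps ($matrix) directly. However, the local:pretty function returns it in a format that makes it easy to see how the map is constructed: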

<matrix>
  <row>
    <cell x="1" y="1">your-formula for 1 and 1</cell>
    <cell x="1" y="2">your-formula for 1 and 2</cell>
    <cell x="1" y="3">your-formula for 1 and 3</cell>
    <cell x="1" y="4">your-formula for 1 and 4</cell>
    <cell x="1" y="5">your-formula for 1 and 5</cell>
    <cell x="1" y="6">your-formula for 1 and 6</cell>
    <cell x="1" y="7">your-formula for 1 and 7</cell>
    <cell x="1" y="8">your-formula for 1 and 8</cell>
  </row>
  <row>
    <cell x="2" y="1">your-formula for 2 and 1</cell>
    <cell x="2" y="2">your-formula for 2 and 2</cell>
    <cell x="2" y="3">your-formula for 2 and 3</cell>
    <cell x="2" y="4">your-formula for 2 and 4</cell>
    <cell x="2" y="5">your-formula for 2 and 5</cell>
    <cell x="2" y="6">your-formula for 2 and 6</cell>
    <cell x="2" y="7">your-formula for 2 and 7</cell>
    <cell x="2" y="8">your-formula for 2 and 8</cell>
  </row>
  <row>
    <cell x="3" y="1">your-formula for 3 and 1</cell>
    <cell x="3" y="2">your-formula for 3 and 2</cell>
    <cell x="3" y="3">your-formula for 3 and 3</cell>
    <cell x="3" y="4">your-formula for 3 and 4</cell>
    <cell x="3" y="5">your-formula for 3 and 5</cell>
    <cell x="3" y="6">your-formula for 3 and 6</cell>
    <cell x="3" y="7">your-formula for 3 and 7</cell>
    <cell x="3" y="8">your-formula for 3 and 8</cell>
  </row>
  <row>
    <cell x="4" y="1">your-formula for 4 and 1</cell>
    <cell x="4" y="2">your-formula for 4 and 2</cell>
    <cell x="4" y="3">your-formula for 4 and 3</cell>
    <cell x="4" y="4">your-formula for 4 and 4</cell>
    <cell x="4" y="5">your-formula for 4 and 5</cell>
    <cell x="4" y="6">your-formula for 4 and 6</cell>
    <cell x="4" y="7">your-formula for 4 and 7</cell>
    <cell x="4" y="8">your-formula for 4 and 8</cell>
  </row>
  <row>
    <cell x="5" y="1">your-formula for 5 and 1</cell>
    <cell x="5" y="2">your-formula for 5 and 2</cell>
    <cell x="5" y="3">your-formula for 5 and 3</cell>
    <cell x="5" y="4">your-formula for 5 and 4</cell>
    <cell x="5" y="5">your-formula for 5 and 5</cell>
    <cell x="5" y="6">your-formula for 5 and 6</cell>
    <cell x="5" y="7">your-formula for 5 and 7</cell>
    <cell x="5" y="8">your-formula for 5 and 8</cell>
  </row>
  <row>
    <cell x="6" y="1">your-formula for 6 and 1</cell>
    <cell x="6" y="2">your-formula for 6 and 2</cell>
    <cell x="6" y="3">your-formula for 6 and 3</cell>
    <cell x="6" y="4">your-formula for 6 and 4</cell>
    <cell x="6" y="5">your-formula for 6 and 5</cell>
    <cell x="6" y="6">your-formula for 6 and 6</cell>
    <cell x="6" y="7">your-formula for 6 and 7</cell>
    <cell x="6" y="8">your-formula for 6 and 8</cell>
  </row>
  <row>
    <cell x="7" y="1">your-formula for 7 and 1</cell>
    <cell x="7" y="2">your-formula for 7 and 2</cell>
    <cell x="7" y="3">your-formula for 7 and 3</cell>
    <cell x="7" y="4">your-formula for 7 and 4</cell>
    <cell x="7" y="5">your-formula for 7 and 5</cell>
    <cell x="7" y="6">your-formula for 7 and 6</cell>
    <cell x="7" y="7">your-formula for 7 and 7</cell>
    <cell x="7" y="8">your-formula for 7 and 8</cell>
  </row>
  <row>
    <cell x="8" y="1">your-formula for 8 and 1</cell>
    <cell x="8" y="2">your-formula for 8 and 2</cell>
    <cell x="8" y="3">your-formula for 8 and 3</cell>
    <cell x="8" y="4">your-formula for 8 and 4</cell>
    <cell x="8" y="5">your-formula for 8 and 5</cell>
    <cell x="8" y="6">your-formula for 8 and 6</cell>
    <cell x="8" y="7">your-formula for 8 and 7</cell>
    <cell x="8" y="8">your-formula for 8 and 8</cell>
  </row>
</matrix>
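
Reading a single cell back out of the map of maps takes two lookups: the row key first, then the column key. Here is a minimal sketch, assuming MarkLogic's `map:get` and `map:put` as used in the answer; `local:cell` and the 2 x 2 sample data are hypothetical, introduced only for illustration:

```xquery
xquery version "1.0-ml";

(: Hypothetical helper, not part of the answer: returns the cell for
   row $x and column $y, or the empty sequence if either key is missing. :)
declare function local:cell($matrix as map:map, $x as xs:string, $y as xs:string)
{
  let $row := map:get($matrix, $x)
  return if (fn:exists($row)) then map:get($row, $y) else ()
};

(: Build a tiny 2 x 2 map of maps so the sketch runs on its own. :)
let $matrix := map:map()
let $_ :=
  for $x in ("1", "2")
  let $row := map:map()
  let $_ := for $y in ("1", "2") return map:put($row, $y, $x || "-" || $y)
  return map:put($matrix, $x, $row)

(: Look up row "2", column "1". :)
return local:cell($matrix, "2", "1")
```

With the real matrix, the same helper could be called with any pair of keys from the tokenized sequence, for example `local:cell($matrix, "3", "7")`.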

For CSV, there is a sample function called local:csv that creates a text node, with the following result:

 your-formula for 1 and 1,your-formula for 1 and 2,your-formula for 1 and 3,your-formula for 1 and 4,your-formula for 1 and 5,your-formula for 1 and 6,your-formula for 1 and 7,your-formula for 1 and 8
 your-formula for 2 and 1,your-formula for 2 and 2,your-formula for 2 and 3,your-formula for 2 and 4,your-formula for 2 and 5,your-formula for 2 and 6,your-formula for 2 and 7,your-formula for 2 and 8
 your-formula for 3 and 1,your-formula for 3 and 2,your-formula for 3 and 3,your-formula for 3 and 4,your-formula for 3 and 5,your-formula for 3 and 6,your-formula for 3 and 7,your-formula for 3 and 8
 your-formula for 4 and 1,your-formula for 4 and 2,your-formula for 4 and 3,your-formula for 4 and 4,your-formula for 4 and 5,your-formula for 4 and 6,your-formula for 4 and 7,your-formula for 4 and 8
 your-formula for 5 and 1,your-formula for 5 and 2,your-formula for 5 and 3,your-formula for 5 and 4,your-formula for 5 and 5,your-formula for 5 and 6,your-formula for 5 and 7,your-formula for 5 and 8
 your-formula for 6 and 1,your-formula for 6 and 2,your-formula for 6 and 3,your-formula for 6 and 4,your-formula for 6 and 5,your-formula for 6 and 6,your-formula for 6 and 7,your-formula for 6 and 8
 your-formula for 7 and 1,your-formula for 7 and 2,your-formula for 7 and 3,your-formula for 7 and 4,your-formula for 7 and 5,your-formula for 7 and 6,your-formula for 7 and 7,your-formula for 7 and 8
 your-formula for 8 and 1,your-formula for 8 and 2,your-formula for 8 and 3,your-formula for 8 and 4,your-formula for 8 and 5,your-formula for 8 and 6,your-formula for 8 and 7,your-formula for 8 and 8

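Answer 1 (score: 1)

You could do something like this. I'm not sure what your CSV looks like or how your parser loads it. I also mocked up a function to stand in for the one you said you already have.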

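(: mock of the similarity function; the parameters are typed as row elements so the call below type-checks :)
declare function local:somefn ($rowA as element(row), $rowB as element(row)) as xs:string { "6,7,10,3" };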

let $data :=
    <csv>
        <row>1,1,1</row>
        <row>2,2,2</row>
        <row>3,3,3</row>
        <row>4,4,4</row>
    </csv>

for $row1 at $pos in $data/row
for $row2 in $data/row[ position() > $pos ]
    let $x := local:somefn($row1, $row2)
    return $x

In BaseX this produces:

6,7,10,3
6,7,10,3
6,7,10,3
6,7,10,3
6,7,10,3
6,7,10,3
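
The loop above visits each unordered pair once (the upper triangle of the matrix), which is why four rows give six results. If the full n x n matrix is wanted as CSV, one option is to iterate over all rows in both loops and join each row's scores. This is a sketch under assumptions: `local:sim` is a hypothetical stand-in for the real similarity function, and the three-row sample data is made up:

```xquery
xquery version "3.0";

(: Hypothetical stand-in for the asker's similarity function: 1 when the
   two rows have identical text, 0 otherwise. :)
declare function local:sim($a as element(row), $b as element(row)) as xs:integer {
  if (string($a) eq string($b)) then 1 else 0
};

let $data :=
  <csv>
    <row>1,1,1</row>
    <row>2,2,2</row>
    <row>1,1,1</row>
  </csv>
let $rows := $data/row
(: One CSV line per record; each line holds that record's score
   against every record, including itself. :)
for $r1 in $rows
return string-join(
  for $r2 in $rows
  return string(local:sim($r1, $r2)),
  ","
)
```

For this sample the three lines are 1,0,1 then 0,1,0 then 1,0,1. When the similarity function is symmetric and expensive, the pairwise loop above can still be used to compute each pair once, with the results mirrored into the lower triangle afterwards.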