SPARQL: randomly select one connection for each node

Asked: 2015-03-17 15:23:27

Tags: random rdf sparql semantic-web

I have the following data:

<node:1><urn:connectTo><node:2>
<node:1><urn:connectTo><node:3>
<node:1><urn:connectTo><node:4>
<node:2><urn:connectTo><node:10>
<node:2><urn:connectTo><node:11>
<node:2><urn:connectTo><node:12>
<node:3><urn:connectTo><node:21>
<node:3><urn:connectTo><node:13>
<node:3><urn:connectTo><node:41>
<node:3><urn:connectTo><node:100>
<node:4><urn:connectTo><node:119>
<node:4><urn:connectTo><node:120>

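As you can see, each node has several connections. I want to randomly select one connection for each node. How can I do that? I tried the following queries, but none of them solved the problem: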

  1. select ?currentNode ?nextNode where {
      ?currentNode ?p ?nextNode
      BIND(RAND() AS ?orderKey)
    }
    ORDER BY ?orderKey
    LIMIT 1
    
  2. select ?currentNode SAMPLE(?nextNode) as ?nextNode1
    where {
      ?currentNode ?p ?nextNode
    }
    GROUP BY ?currentNode
    

    Note: this returns the first connection for each node, not a random one.

  3. select ?currentNode ?nextNode (COUNT(?nextNode) AS ?noOfChoices)
    where {
      ?currentNode ?p ?nextNode
      BIND(RAND() AS ?orderKey)
    }
    GROUP BY ?currentNode
    ORDER BY ?orderKey
    OFFSET (RAND()*?noOfChoices)
    LIMIT 1
    

1 Answer:

Answer 0: (score: 2)

The sample aggregate returns an individual from a group:

  

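Sample is a set function which returns an arbitrary value from the multiset passed to it. ... For example, given Sample({"a", "b", "c"}), "a", "b", and "c" are all valid return values. Note that Sample() is not required to be deterministic for a given input, the only restriction is that the output value must be present in the input multiset.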

That would be a query like this:

prefix node: <node:>
prefix urn: <urn:>

select ?source (sample(?_target) as ?target) where {
  ?source urn:connectTo ?_target
}
group by ?source

---------------------
| source | target   |
=====================
| node:1 | node:2   |
| node:2 | node:10  |
| node:3 | node:13  |
| node:4 | node:119 |
---------------------

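Of course, as you noted, an implementation is only required to return an arbitrary individual, and that could easily be the same one every time. You can apply some ordering in a subquery and hope that randomizing the order of the targets gives you different results from sample, but nothing requires that the order of the subquery's results be preserved. That would look like this: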

prefix node: <node:>
prefix urn: <urn:>

select ?source (sample(?_target) as ?target) where {
  { select ?source ?_target {
      ?source urn:connectTo ?_target
    }
    order by rand() }
}
group by ?source

This appears to work with Apache Jena. Here are the results of repeated invocations:

---------------------
| source | target   |
=====================
| node:1 | node:2   |
| node:2 | node:11  |
| node:3 | node:100 |
| node:4 | node:120 |
---------------------

---------------------
| source | target   |
=====================
| node:1 | node:3   |
| node:2 | node:11  |
| node:3 | node:13  |
| node:4 | node:120 |
---------------------

---------------------
| source | target   |
=====================
| node:1 | node:3   |
| node:2 | node:10  |
| node:3 | node:21  |
| node:4 | node:119 |
---------------------

---------------------
| source | target   |
=====================
| node:1 | node:3   |
| node:2 | node:10  |
| node:3 | node:100 |
| node:4 | node:119 |
---------------------
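
Because nothing guarantees that sample sees the subquery's order, it can help to spell out the result the query is hoping for. The following is only a minimal sketch in Python rather than SPARQL, assuming the (source, target) pairs have already been fetched from the store with a plain select. The names EDGES and pick_random_connections are made up for this illustration and are not part of Jena or any other library.

```python
import random
from collections import defaultdict

# (source, target) pairs copied from the data in the question.
EDGES = [
    ("node:1", "node:2"), ("node:1", "node:3"), ("node:1", "node:4"),
    ("node:2", "node:10"), ("node:2", "node:11"), ("node:2", "node:12"),
    ("node:3", "node:21"), ("node:3", "node:13"), ("node:3", "node:41"),
    ("node:3", "node:100"),
    ("node:4", "node:119"), ("node:4", "node:120"),
]


def pick_random_connections(edges, rng=random):
    """Return a dict mapping each source to one uniformly chosen target.

    Hypothetical helper: it mirrors "group by ?source" followed by a
    random pick inside each group.
    """
    targets_by_source = defaultdict(list)
    for source, target in edges:
        targets_by_source[source].append(target)
    # rng may be the random module or a random.Random instance;
    # both provide choice().
    return {
        source: rng.choice(targets)
        for source, targets in targets_by_source.items()
    }


if __name__ == "__main__":
    # Output differs from run to run, like the tables above.
    for source, target in sorted(pick_random_connections(EDGES).items()):
        print(source, target)
```

Passing a seeded random.Random instance as rng makes a run repeatable. Doing the pick on the client side gives up the single-query form, but it does not depend on how a particular engine implements sample.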