I have an RDF dataset like this:
<subject1> <some_predicate> "Value 1" .
<subject1> <some_predicate> "Value 2" .
<subject1> <some_predicate> "Value 3" .
<subject1> <some_predicate> "Value 4" .
<subject1> <some_predicate> "Value 5" .
<subject2> <some_predicate> "Value 6" .
<subject2> <some_predicate> "Value 7" .
<subject2> <some_predicate> "Value 8" .
<subject2> <some_predicate> "Value 9" .
<subject2> <some_predicate> "Value 10" .
Now, for each subject, I want two random values of "some_predicate". The two values must be different. So the expected result would be something like:
--------------------------------------------
| subject | random_value_1 | random_value_2 |
============================================
| subject1 | "Value 2" | "Value 5" |
| subject2 | "Value 6" | "Value 7" |
--------------------------------------------
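I found the question "sparql: randomly select one connection for each node", but that question only retrieves one value, and I need two different values.
Answer 0 (score: 1)
You can do just about the same thing, but it is a bit more complicated. First, select a random value for each subject in an inner subquery. Then, in the outer query, select a second random value in the same way, but require it to be different from the first (remove the filter if you want to allow the same value twice):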
select ?subject (sample(?v1) as ?value1) (sample(?v2) as ?value2) {
  { select ?subject ?v1 ?v2 {
      { select ?subject ?v1 {
          ?subject <some_predicate> ?v1
        }
        order by rand() }
      ?subject <some_predicate> ?v2
      filter(!sameTerm(?v1,?v2))
    }
    order by rand()
  }
}
group by ?subject
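Note that the same caveat that applies to the linked question, "sparql: randomly select one connection for each node", applies here too: the implementation of sample is not specified, so it is conceivable that you would get non-random results. Here is some example output from three runs using Jena's ARQ: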
---------------------------------------
| subject    | value1     | value2    |
=======================================
| <subject1> | "Value 1"  | "Value 2" |
| <subject2> | "Value 10" | "Value 6" |
---------------------------------------

--------------------------------------
| subject    | value1    | value2    |
======================================
| <subject1> | "Value 4" | "Value 1" |
| <subject2> | "Value 8" | "Value 6" |
--------------------------------------

--------------------------------------
| subject    | value1    | value2    |
======================================
| <subject1> | "Value 4" | "Value 3" |
| <subject2> | "Value 6" | "Value 8" |
--------------------------------------
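
If the caveat about sample is a concern on your endpoint, one possible fallback is to fetch every (subject, value) pair with a plain query and pick the two values on the client. The following is only a minimal sketch in Python using the standard library; it is not part of the SPARQL approach above. The rows list is hard-coded to mirror the data in the question (in practice it would come from your SPARQL client), and pick_two is a hypothetical helper name.

```python
import random
from collections import defaultdict

# Assumption: these pairs stand in for the results of a plain query such as
#   select ?subject ?v { ?subject <some_predicate> ?v }
# They are hard-coded here to mirror the dataset in the question.
rows = [("subject1", f"Value {i}") for i in range(1, 6)] + \
       [("subject2", f"Value {i}") for i in range(6, 11)]


def pick_two(rows, rng=random):
    """Return {subject: (value1, value2)} with two distinct random values.

    Hypothetical helper: subjects with fewer than two distinct values are
    skipped, which matches what the filtered join in the query would do.
    """
    by_subject = defaultdict(set)
    for subject, value in rows:
        by_subject[subject].add(value)

    result = {}
    for subject, values in by_subject.items():
        if len(values) >= 2:
            # random.sample draws without replacement, so the two picks differ.
            result[subject] = tuple(rng.sample(sorted(values), 2))
    return result


picks = pick_two(rows)
for subject, (value1, value2) in sorted(picks.items()):
    print(subject, value1, value2)
```

The trade-off is that all values are transferred to the client, which may be acceptable for small datasets but not for large ones.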