Question

I'm fairly new to SPARQL and have run into some interesting behavior that I don't understand.

So I have four genes:

a-gene
b-gene
c-gene
d-gene

and two strains:

strain1
strain2

and the following triples:

strain1 hasGene a-gene
strain1 hasGene b-gene
strain1 hasGene d-gene

strain2 hasGene b-gene
strain2 hasGene d-gene

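My goal is to write a SPARQL query that adds a hasBinary value to every strain, where hasBinary is a binary string with one digit per gene, marking which genes the strain has (1) and does not have (0). For example: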

strain1 hasBinary 1101
strain2 hasBinary 0101

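That is, strain1 has a-gene, b-gene, and d-gene, but not c-gene. Given this query (note that the strains and genes are instances of the classes Strain and Gene, respectively):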

PREFIX : <http://example.org/>
select ?s (group_concat(?result ; separator="") as ?binary)
where { ?g a :Gene.
        ?s a :Strain.
        optional{?s ?hasGene ?g.}.
        bind((if(bound(?hasGene), "1","0")) as ?result). }
group by ?s
order by ?s

The output is:

strain1 1101
strain2 0101

which is correct. But when I run this query:

PREFIX : <http://example.org/>
construct {?s :hasBinary ?binary}
where{
select ?s (group_concat(?result ; separator="") as ?binary)
where { ?g a :Gene.
        ?s a :Strain.
        optional{?s ?hasGene ?g.}.
        bind((if(bound(?hasGene), "1","0")) as ?result).
      }
group by ?s
order by ?s
}

the output is:

strain1 hasBinary 0111
strain2 hasBinary 0011

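which is completely wrong. It looks as if group_concat is sorting the results. I don't understand why it would do that, and if it sorts them, the binary strings are useless. Any help with this would be greatly appreciated.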
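For reference, here is a minimal Python sketch (not SPARQL; the names `GENES`, `STRAINS`, `HAS_GENE`, and `binary_string` are hypothetical) that computes the intended strings directly from the triples listed above. It makes explicit that each string only means something relative to a fixed gene order, which is assumed here to be alphabetical.

```python
# The gene order must be fixed for the digits to mean anything (alphabetical here).
GENES = ["a-gene", "b-gene", "c-gene", "d-gene"]
STRAINS = ["strain1", "strain2"]

# (strain, gene) pairs taken from the hasGene triples in the question.
HAS_GENE = {
    ("strain1", "a-gene"), ("strain1", "b-gene"), ("strain1", "d-gene"),
    ("strain2", "b-gene"), ("strain2", "d-gene"),
}


def binary_string(strain, genes, has_gene):
    """Return one digit per gene, in the order of `genes`: "1" if present, else "0"."""
    return "".join("1" if (strain, g) in has_gene else "0" for g in genes)


for s in STRAINS:
    print(s, "hasBinary", binary_string(s, GENES, HAS_GENE))
```

Reversing `GENES` would reverse every string, which is why the order in which values reach GROUP_CONCAT matters.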

Answer 1

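As far as I can tell, your query is correct, and the problem you're seeing is a bug in whatever SPARQL engine you're using. Or at least: when I tried your case on a Sesame store (version 2.8.8), it gave me the expected result.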

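Edit: The reason I got the correct result is that Sesame happens to return the results in the expected order, but as @TallTed rightly commented, the query doesn't actually enforce that order, so it's not something you can rely on. My earlier claim that this is a bug in the endpoint was therefore wrong.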

Let's explore this a bit.

The data I used:

@prefix : <http://example.org/> .

:a-gene a :Gene .
:b-gene a :Gene .
:c-gene a :Gene .
:d-gene a :Gene .

:strain1 a :Strain .
:strain2 a :Strain .

:strain1 :hasGene :a-gene .
:strain1 :hasGene :b-gene .
:strain1 :hasGene :d-gene .


:strain2 :hasGene :b-gene .
:strain2 :hasGene :d-gene .

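If we look at the simplest form of the query, we want all ?s and all ?g, with an optional :hasGene relation between them, and we want them in order. Your initial query essentially boils down to this: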

PREFIX : <http://example.org/>
select ?s ?g
where { ?g a :Gene.
        ?s a :Strain.
        optional { ?s ?hasGene ?g } .
}
order by ?s

Now, this query returns the following on my Sesame store (and on your endpoint):

?s                              ?g
<http://example.org/strain1>    <http://example.org/a-gene>
<http://example.org/strain1>    <http://example.org/b-gene>
<http://example.org/strain1>    <http://example.org/c-gene>
<http://example.org/strain1>    <http://example.org/d-gene>
<http://example.org/strain2>    <http://example.org/a-gene>
<http://example.org/strain2>    <http://example.org/b-gene>
<http://example.org/strain2>    <http://example.org/c-gene>
<http://example.org/strain2>    <http://example.org/d-gene>

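Looks good, right? Everything is in alphanumeric order. But it is important to realize that the ordering of the ?g column is a coincidence. If the engine had returned this instead: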

?s                              ?g
<http://example.org/strain1>    <http://example.org/c-gene>
<http://example.org/strain1>    <http://example.org/b-gene>
<http://example.org/strain1>    <http://example.org/a-gene>
<http://example.org/strain1>    <http://example.org/d-gene>
<http://example.org/strain2>    <http://example.org/b-gene>
<http://example.org/strain2>    <http://example.org/a-gene>
<http://example.org/strain2>    <http://example.org/c-gene>
<http://example.org/strain2>    <http://example.org/d-gene>

...that would also be a valid result. After all, our query says nothing about how ?g should be ordered.
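The consequence for GROUP_CONCAT can be illustrated with a small Python model (hypothetical; `group_concat` here is a plain helper, not an engine API): the two orderings above contain exactly the same solutions for strain1, yet joining the digits in arrival order yields different strings.

```python
from collections import Counter


def group_concat(values, separator=""):
    """Join values in the order they arrive, as an aggregate over a stream would."""
    return separator.join(values)


# strain1's (?g, ?result) pairs in the first ordering shown above...
first = [("a-gene", "1"), ("b-gene", "1"), ("c-gene", "0"), ("d-gene", "1")]
# ...and in the second, equally valid ordering.
second = [("c-gene", "0"), ("b-gene", "1"), ("a-gene", "1"), ("d-gene", "1")]

# Same multiset of solutions...
assert Counter(first) == Counter(second)

# ...but different concatenated strings.
first_binary = group_concat(r for _, r in first)
second_binary = group_concat(r for _, r in second)
print(first_binary)   # 1101
print(second_binary)  # 0111
```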

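But the fix is simple: order by both ?s and ?g. Since our particular SPARQL endpoint already returns the correct order "by coincidence" even without this, we can use a small trick to verify that the ordering really comes from the query: reverse it with the DESC operator.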

Query:

PREFIX : <http://example.org/>
SELECT ?s ?g
WHERE { ?g a :Gene.
        ?s a :Strain.
        OPTIONAL { ?s ?hasGene ?g } .
}
ORDER BY ?s DESC(?g)

Result:

?s                              ?g
<http://example.org/strain1>    <http://example.org/d-gene>
<http://example.org/strain1>    <http://example.org/c-gene>
<http://example.org/strain1>    <http://example.org/b-gene>
<http://example.org/strain1>    <http://example.org/a-gene>
<http://example.org/strain2>    <http://example.org/d-gene>
<http://example.org/strain2>    <http://example.org/c-gene>
<http://example.org/strain2>    <http://example.org/b-gene>
<http://example.org/strain2>    <http://example.org/a-gene>

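You can see that the ?g column is now actually sorted in reverse alphabetical order. That is of course the opposite of what you want, but it is easily corrected later by simply dropping the DESC part of the query. The point is that this way we have verified that it is our query doing the ordering, and not whatever endpoint we happen to be using.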
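The effect of `ORDER BY ?s DESC(?g)` can be mimicked in Python with two stable sorts, secondary key first. This is a sketch under the assumption that plain string comparison of the local names matches the endpoint's ordering of these IRIs (they all share the `http://example.org/` prefix); the input rows are deliberately shuffled to show that the result no longer depends on arrival order. The function name is hypothetical.

```python
# Solutions in an arbitrary, deliberately shuffled order.
rows = [
    ("strain2", "b-gene"), ("strain1", "c-gene"), ("strain2", "a-gene"),
    ("strain1", "a-gene"), ("strain2", "d-gene"), ("strain1", "d-gene"),
    ("strain2", "c-gene"), ("strain1", "b-gene"),
]


def order_by_s_desc_g(rows):
    """Mimic ORDER BY ?s DESC(?g): sort by the secondary key, then the primary key.

    Python's sort is stable (also with reverse=True), so the second pass keeps
    the DESC(?g) order among rows with the same ?s.
    """
    by_g_desc = sorted(rows, key=lambda r: r[1], reverse=True)  # DESC(?g)
    return sorted(by_g_desc, key=lambda r: r[0])                 # ?s ascending


for s, g in order_by_s_desc_g(rows):
    print(s, g)
```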

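But this still doesn't fully solve the ordering problem in the binary string. In the original query the BIND happens before any ordering: the BIND is part of the graph pattern, which is fully evaluated (and then grouped and aggregated) before the results are sorted, so the ORDER BY clause has no influence on the concatenation. That is, if we simply run this query: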

PREFIX : <http://example.org/>
SELECT ?s (GROUP_CONCAT(?result ; SEPARATOR="") as ?binary)
WHERE { ?g a :Gene.
        ?s a :Strain.
        OPTIONAL { ?s ?hasGene ?g } .
        BIND((IF(BOUND(?hasGene), "1","0")) AS ?result).
}
GROUP BY ?s
ORDER BY ?s DESC(?g)

we still get this result:

?s                              ?binary
<http://example.org/strain1>    "1101"
<http://example.org/strain2>    "0101"

In other words, our binary strings are still not reversed, even though they should be.
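Why the outer ORDER BY comes too late can be seen in a simplified Python model of the evaluation steps (an assumption-laden sketch, not how any particular engine is implemented): the pattern and BIND produce solutions, GROUP BY with GROUP_CONCAT collapses them into one string per ?s, and only then does ORDER BY sort the already aggregated rows.

```python
from collections import defaultdict

HAS_GENE = {
    ("strain1", "a-gene"), ("strain1", "b-gene"), ("strain1", "d-gene"),
    ("strain2", "b-gene"), ("strain2", "d-gene"),
}

# Step 1: WHERE clause including the BIND, in whatever order the engine produces.
solutions = [
    (s, g, "1" if (s, g) in HAS_GENE else "0")
    for g in ["a-gene", "b-gene", "c-gene", "d-gene"]
    for s in ["strain1", "strain2"]
]

# Step 2: GROUP BY ?s with GROUP_CONCAT, consuming solutions in arrival order.
groups = defaultdict(list)
for s, _g, result in solutions:
    groups[s].append(result)
aggregated = {s: "".join(digits) for s, digits in groups.items()}

# Step 3: ORDER BY runs on the aggregated rows. In this model they no longer carry
# ?g, so no sort key at this stage can reorder the digits inside a string.
ordered = sorted(aggregated.items())
print(ordered)  # [('strain1', '1101'), ('strain2', '0101')]
```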

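The solution is to introduce another subquery that delivers the results to its outer query in the required order, and then concatenate those ordered results to build the binary string, like this: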

PREFIX : <http://example.org/>
SELECT ?s (GROUP_CONCAT(?result ; SEPARATOR="") as ?binary)
WHERE {
  { SELECT ?s ?hasGene
    WHERE { ?g a :Gene.
            ?s a :Strain.
            OPTIONAL {?s ?hasGene ?g.}.
    }
    ORDER BY ?s DESC(?g)
  }
  BIND((IF(BOUND(?hasGene), "1","0")) AS ?result).
}
GROUP BY ?s

The result is:

?s                              ?binary
<http://example.org/strain1>    "1011"
<http://example.org/strain2>    "1010"

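As you can see, the ordering in the subquery now determines the (reversed) binary string. Next, we need to feed the whole beast into the CONSTRUCT query you want, and finally undo the reversal of the binary string (that is, sort ?g ascending instead of with DESC).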
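The subquery approach in the same kind of Python model: the inner step sorts the rows (`ORDER BY ?s DESC(?g)`) and the outer step binds the digits and concatenates them per ?s in the order they arrive. That last part is an assumption: it models what the answer observed on Sesame, where the subquery's order carried through to GROUP_CONCAT. The function names are hypothetical.

```python
from collections import defaultdict

GENES = ["a-gene", "b-gene", "c-gene", "d-gene"]
STRAINS = ["strain1", "strain2"]
HAS_GENE = {
    ("strain1", "a-gene"), ("strain1", "b-gene"), ("strain1", "d-gene"),
    ("strain2", "b-gene"), ("strain2", "d-gene"),
}


def inner_subquery():
    """SELECT ?s ?hasGene ... ORDER BY ?s DESC(?g); True means ?hasGene is bound."""
    rows = [(s, g, (s, g) in HAS_GENE) for g in GENES for s in STRAINS]
    rows.sort(key=lambda r: r[1], reverse=True)  # DESC(?g)
    rows.sort(key=lambda r: r[0])                # ?s, stable so DESC(?g) survives
    return rows


def outer_query(rows):
    """BIND the digit and GROUP_CONCAT per ?s, preserving the incoming row order."""
    groups = defaultdict(list)
    for s, _g, bound in rows:
        groups[s].append("1" if bound else "0")
    return {s: "".join(digits) for s, digits in groups.items()}


result = outer_query(inner_subquery())
print(result)  # {'strain1': '1011', 'strain2': '1010'}
```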

The full query then becomes:

Query 2:

PREFIX : <http://example.org/>
CONSTRUCT { ?s :hasBinary ?binary }
WHERE {
  SELECT ?s (GROUP_CONCAT(?result ; SEPARATOR="") as ?binary)
  WHERE {
    { SELECT ?s ?hasGene
      WHERE { ?g a :Gene.
              ?s a :Strain.
              OPTIONAL {?s ?hasGene ?g.}.
      }
      ORDER BY ?s ?g
    }
    BIND((IF(BOUND(?hasGene), "1","0")) AS ?result).
  }
  GROUP BY ?s
}

Result:

<http://example.org/strain1> <http://example.org/hasBinary> "1101" .
<http://example.org/strain2> <http://example.org/hasBinary> "0101" .
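Finally, the whole pipeline with the reversal undone (ascending ?g), rendered as N-Triples like those the CONSTRUCT returns. This is again a hypothetical Python sketch under the same assumption that the aggregate sees rows in the subquery's order; `BASE` is just the `http://example.org/` prefix from the data above, and `construct_has_binary` is an illustrative name.

```python
from collections import defaultdict

BASE = "http://example.org/"
GENES = ["a-gene", "b-gene", "c-gene", "d-gene"]
STRAINS = ["strain1", "strain2"]
HAS_GENE = {
    ("strain1", "a-gene"), ("strain1", "b-gene"), ("strain1", "d-gene"),
    ("strain2", "b-gene"), ("strain2", "d-gene"),
}


def construct_has_binary():
    """Sort by ?s then ?g, concatenate digits per ?s, and emit one triple per strain."""
    rows = sorted((s, g) for g in GENES for s in STRAINS)  # ORDER BY ?s ?g
    groups = defaultdict(list)
    for s, g in rows:
        groups[s].append("1" if (s, g) in HAS_GENE else "0")
    triples = []
    for s, digits in groups.items():
        binary = "".join(digits)
        triples.append(f'<{BASE}{s}> <{BASE}hasBinary> "{binary}" .')
    return triples


for line in construct_has_binary():
    print(line)
```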
