Question

I wrote this query, which returns a list of married couples who match a particular condition (run against http://live.dbpedia.org/sparql):

SELECT DISTINCT ?actor ?person2 ?cnt
WHERE
{
  {
    select DISTINCT ?actor ?person2 (count (?film) as ?cnt)
    where {
        ?film  dbo:starring ?actor .
        ?actor dbo:spouse ?person2 .
        ?film  dbo:starring ?person2 .
    }
    group by ?actor ?person2
    order by ?actor
  }
  FILTER (?cnt > 9)
}

The problem is that some rows are duplicated. For example:

http://dbpedia.org/resource/George_Burns http://dbpedia.org/resource/Gracie_Allen 12

http://dbpedia.org/resource/Gracie_Allen http://dbpedia.org/resource/George_Burns 12

How can I remove these duplicates? I tried adding gender to ?actor, but that breaks the current results.

Answer 1

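Natan Cox's answer shows the typical way to exclude these pseudo-duplicates. The results aren't actually duplicates, because in one row George Burns is the ?actor and in the other he is the ?person2. In many cases you can add a filter that requires the two values to be in a particular order, which removes the mirrored rows. For example, when you have data like this: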

:a :likes :b .
:a :likes :c .

and then query it with

select ?x ?y where { 
  :a :likes ?x, ?y .
}

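you can add filter(?x < ?y) to enforce an ordering between ?x and ?y, which removes these pseudo-duplicates. In this case, however, it's a bit trickier, because ?actor and ?person2 aren't found using the same criteria: ?actor is the subject of the dbo:spouse triple and ?person2 is its object. If DBpedia contains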

:PersonB dbo:spouse :PersonA

but not

:PersonA dbo:spouse :PersonB

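then the simple filter doesn't work. The only matching triple has PersonB as its subject, so the filter rejects it, and you will never find a triple whose subject PersonA is less than its object PersonB. So in this case you also need to modify the query slightly to make the condition symmetric: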

select distinct ?actor ?spouse (count(?film) as ?count) {
  ?film dbo:starring ?actor, ?spouse .
  ?actor dbo:spouse|^dbo:spouse ?spouse .
  filter(?actor < ?spouse)
}
group by ?actor ?spouse
having (count(?film) > 9)
order by ?actor

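(This query also shows that you don't need the subquery here; you can use having to "filter" on aggregate values.) The important part, though, is the property path dbo:spouse|^dbo:spouse, which finds a value for ?spouse such that either ?actor dbo:spouse ?spouse or ?spouse dbo:spouse ?actor holds. This makes the relationship symmetric, so you are guaranteed to get every pair, even when the relationship is only declared in one direction.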
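To make the difference concrete, here is a small in-memory sketch in Python. It is not SPARQL and not how an endpoint evaluates the query; it only mimics the two versions on a made-up dataset in which dbo:spouse is stated only as PersonB -> PersonA and both people star in two films. The names starring and spouse and the helper functions are hypothetical, the having (count(?film) > 9) threshold is left out because the sample is tiny, and IRIs are compared as plain strings. (SPARQL 1.1 itself does not define < between IRIs, so an endpoint other than DBpedia's Virtuoso may need filter(str(?actor) < str(?spouse)); check how your endpoint behaves.)

```python
# Hypothetical data: both people star in two films, but dbo:spouse is
# stated only in one direction (PersonB -> PersonA).
starring = {
    "film1": {"PersonA", "PersonB"},
    "film2": {"PersonA", "PersonB"},
}
spouse = {("PersonB", "PersonA")}  # (subject, object) pairs for dbo:spouse


def one_way_pairs():
    """Like '?actor dbo:spouse ?spouse' plus filter(?actor < ?spouse)."""
    return {(s, o) for (s, o) in spouse if s < o}


def symmetric_pairs():
    """Like '?actor dbo:spouse|^dbo:spouse ?spouse' plus the same filter."""
    both_directions = spouse | {(o, s) for (s, o) in spouse}
    return {(a, b) for (a, b) in both_directions if a < b}


def count_shared_films(pairs):
    """Like count(?film) grouped by pair: films in which both people star."""
    return {
        (a, b): sum(1 for cast in starring.values() if a in cast and b in cast)
        for (a, b) in pairs
    }


print(one_way_pairs())                        # set()
print(count_shared_films(symmetric_pairs()))  # {('PersonA', 'PersonB'): 2}
```

The one-way version finds no couple at all, while the symmetric version finds PersonA/PersonB with two shared films, which is what the property path is meant to guarantee.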

Answer 2

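Of course these aren't actual duplicates, since you can look at the relationship from either side. If you want to, the way to fix it is to add a filter. It's a dirty hack, but it keeps only one of the two rows that are "the same".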

SELECT DISTINCT ?actor ?person2 ?cnt
WHERE
{
  {
    select DISTINCT ?actor ?person2 (count (?film) as ?cnt)
    where {
        ?film  dbo:starring ?actor .
        ?actor dbo:spouse ?person2 .
        ?film  dbo:starring ?person2 .
        FILTER (?actor < ?person2)
    }
    group by ?actor ?person2
    order by ?actor
  }
  FILTER (?cnt > 9)
}
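If you would rather leave the query as it is and clean up the rows afterwards, the same ordering idea can be applied on the client side. The sketch below is a hypothetical Python helper (dedupe_pairs is not part of any library) that keys each row on the sorted pair of IRIs, so a row and its mirror image collapse into one. It assumes both mirrored rows carry the same count, as they do in the question.

```python
def dedupe_pairs(rows):
    """Collapse mirrored (a, b, n) / (b, a, n) rows into one row per unordered pair."""
    seen = {}
    for actor, person2, cnt in rows:
        # Order the two IRIs so both mirror images map to the same key.
        key = (min(actor, person2), max(actor, person2))
        # Keep the count from the first row seen for this pair.
        seen.setdefault(key, cnt)
    return [(a, b, cnt) for (a, b), cnt in seen.items()]


# The two rows from the question, as (actor, person2, cnt) tuples.
rows = [
    ("http://dbpedia.org/resource/George_Burns", "http://dbpedia.org/resource/Gracie_Allen", 12),
    ("http://dbpedia.org/resource/Gracie_Allen", "http://dbpedia.org/resource/George_Burns", 12),
]

# Prints a single row for the George Burns / Gracie Allen couple.
for row in dedupe_pairs(rows):
    print(row)
```

Unlike FILTER (?actor < ?person2) inside the query, this keeps a couple even when the only row returned has the later-sorting IRI as ?actor, which is the one-directional dbo:spouse case described in the other answer.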

How to remove duplicates in a SPARQL query