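In his answer to the question How to rank values in SPARQL?, Joshua Taylor proposed a very elegant way to compute ranks in SPARQL.
However, I need to combine aggregation with ranking as a first step toward computing cumulative percentages in a query, for example "the countries that account for the top 50% of sales."
Suppose we have sales at the state level: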
California 30
Arizona 25
...
Alberta 25
Quebec 20
...
I first need to aggregate by country and get each country's rank, like this:
United States 250 1
Canada 200 2
Mexico 150 3
...
How can I write aggregation plus ranking in SPARQL, generalizing Joshua's answer above?
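As a rough illustration of the aggregate-then-rank step (outside SPARQL), the Python sketch below sums the four state rows shown above per country and ranks each country by counting how many country totals are greater than or equal to its own. This is the same counting idea the first answer below expresses with `count(?type2)`. The state-to-country mapping, the function names, and the restriction to the visible rows are assumptions for illustration, so the totals come out as 55 and 45 rather than the question's 250 and 200.

```python
from collections import defaultdict

# State-level sales: only the rows shown in the question (the "..." rows are omitted).
state_sales = {"California": 30, "Arizona": 25, "Alberta": 25, "Quebec": 20}

# Hypothetical lookup table: the country each state or province belongs to.
state_country = {
    "California": "United States",
    "Arizona": "United States",
    "Alberta": "Canada",
    "Quebec": "Canada",
}

def country_totals(sales, country_of):
    """Equivalent of GROUP BY country with SUM(sales)."""
    totals = defaultdict(int)
    for state, amount in sales.items():
        totals[country_of[state]] += amount
    return dict(totals)

def rank_by_count(totals):
    """Rank = number of groups whose total is >= this group's total."""
    return {
        name: sum(1 for other in totals.values() if other >= value)
        for name, value in totals.items()
    }

totals = country_totals(state_sales, state_country)
ranks = rank_by_count(totals)
for name in sorted(totals, key=ranks.get):
    print(name, totals[name], ranks[name])
```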
For the next step, I need to use the rank to compute a cumulative sum, like this:
United States 250 1 250
Canada 200 2 450
Mexico 150 3 600
...
To write this in SPARQL, I think I could use the SQL trick reported here: how to get cumulative sum
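One common way to express a running total in SQL is a self-join that, for each row, sums every row ranked at or above it. Whether or not that is exactly the linked trick, the idea is sketched in Python below over the three country rows shown above; the function name is hypothetical and the elided rows are left out.

```python
# Country totals and ranks from the question's table (the "..." rows are omitted).
ranked = [("United States", 250, 1), ("Canada", 200, 2), ("Mexico", 150, 3)]

def cumulative_by_rank(rows):
    """Self-join style running total: for each row, add up the amounts of
    every row whose rank is less than or equal to its own rank."""
    return [
        (name, amount, rank, sum(a for _, a, r in rows if r <= rank))
        for name, amount, rank in rows
    ]

for row in cumulative_by_rank(ranked):
    print(*row)
```

Rows that share a rank get the same running total, which is the tie behaviour the first answer below describes for its query.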
If that works, we can then try the cumulative percentage. Suppose the total sales across all countries is 1000. Then the result would be
United States 250 250
Canada 200 450
To write this in SPARQL, we could adapt something like the following SQL:
SELECT CountryName, SalesAmount
FROM CumSalesCountry
WHERE CumSalesAmount <=
    (SELECT MIN(CumSalesAmount) FROM CumSalesCountry
     WHERE CumSalesAmount >=
         (SELECT 0.5 * SUM(SalesAmount) FROM SalesCountry))
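The SQL above reads as: find the smallest running total that reaches half of all sales, then keep every country whose running total is at most that value. A minimal Python sketch of that reading follows, using the running totals from the example and the stated overall total of 1000; `top_share` and its `share` parameter are hypothetical names. With these numbers the `<=` comparison also keeps Mexico, because its running total of 600 is the first one to reach 500.

```python
# (country, sales, running total) from the question's example (elided rows omitted).
cum_sales = [("United States", 250, 250), ("Canada", 200, 450), ("Mexico", 150, 600)]
TOTAL_SALES = 1000  # total across all countries, as stated in the question

def top_share(rows, total, share=0.5):
    """Keep rows whose running total is <= the smallest running total
    that reaches share * total (mirrors the nested SQL subqueries)."""
    threshold = share * total
    reaching = [cum for _, _, cum in rows if cum >= threshold]
    if not reaching:
        # In SQL, MIN over no rows is NULL, so the outer <= matches nothing.
        return []
    cutoff = min(reaching)
    return [(name, amount) for name, amount, cum in rows if cum <= cutoff]

print(top_share(cum_sales, TOTAL_SALES))
```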
Any help with this would be greatly appreciated.
Esteban (struggling to write BI-like queries in SPARQL...)
Answer 0 (score: 2)
Suppose you have data like this:
@prefix : <urn:ex:> .
:a1 a :A ; :v 05 .
:a2 a :A ; :v 10 .
:b1 a :B ; :v 10 .
:b2 a :B ; :v 10 .
:b3 a :B ; :v 05 .
:c1 a :C ; :v 10 .
Then you can use a query like this:
prefix : <urn:ex:>
select
?type
(?value*100/?total as ?percent)
(count(?type2) as ?rank)
(sum(?value2)*100/?total as ?cumulativePercent)
where {
#-- total value across all types
{ select (sum(?value) as ?total)
where { ?x :v ?value } }
#-- each type and its sum value as ?type and ?value
{ select ?type (sum(?v) as ?value)
where { ?x a ?type ; :v ?v }
group by ?type }
#-- each type and its sum value as ?type2 and ?value2
{ select ?type2 (sum(?v) as ?value2)
where { ?x a ?type2 ; :v ?v }
group by ?type2 }
filter ( ?value2 >= ?value )
}
group by ?type ?value ?total
order by desc(?percent)
to get results like these:
---------------------------------------------
| type | percent | rank | cumulativePercent |
=============================================
| :B | 50.0 | 1 | 50.0 |
| :A | 30.0 | 2 | 80.0 |
| :C | 20.0 | 3 | 100.0 |
---------------------------------------------
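To make the query's join-and-count logic concrete, here is a Python sketch (not SPARQL) that computes the same columns from the example data: a grand total, per-type sums, and, for each type, how many types have a sum at least as large as its own and what those sums add up to. The triples are the data above with the `:` prefix dropped; the function name is hypothetical.

```python
from collections import defaultdict

# The answer's example data as (subject, type, value) triples.
data = [
    ("a1", "A", 5), ("a2", "A", 10),
    ("b1", "B", 10), ("b2", "B", 10), ("b3", "B", 5),
    ("c1", "C", 10),
]

def percent_rank_cumulative(rows):
    total = sum(v for _, _, v in rows)   # first subquery: overall total
    by_type = defaultdict(int)           # second/third subqueries: per-type sums
    for _, t, v in rows:
        by_type[t] += v
    result = []
    for t, value in by_type.items():
        # FILTER(?value2 >= ?value): pair this type with every type summing at least as much.
        peers = [v2 for v2 in by_type.values() if v2 >= value]
        result.append((t, value * 100 / total, len(peers), sum(peers) * 100 / total))
    # ORDER BY DESC(?percent)
    return sorted(result, key=lambda r: r[1], reverse=True)

for row in percent_rank_cumulative(data):
    print(*row)
```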
Now, there is a problem if two types have the same percentage. For example, if you add the data
:d1 a :D ; :v 07 .
:d2 a :D ; :v 08 .
then you get these results:
---------------------------------------------------------------------------
| type | percent | rank | cumulativePercent |
===========================================================================
| :B | 38.461538461538461538461538 | 1 | 38.461538461538461538461538 |
| :A | 23.076923076923076923076923 | 3 | 84.615384615384615384615384 |
| :D | 23.076923076923076923076923 | 3 | 84.615384615384615384615384 |
| :C | 15.384615384615384615384615 | 4 | 99.999999999999999999999999 |
---------------------------------------------------------------------------
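This shows that A and D are tied, so they get the same cumulative percentage and rank. If that's not what you want, you can add something to the filter condition to break ties, for example,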
filter ( ?value2 > ?value
|| ( ?value2 = ?value &&
str(?type2) >= str(?type) ))
which resolves the tie and gives results like these:
---------------------------------------------------------------------------
| type | percent | rank | cumulativePercent |
===========================================================================
| :B | 38.461538461538461538461538 | 1 | 38.461538461538461538461538 |
| :D | 23.076923076923076923076923 | 2 | 61.538461538461538461538461 |
| :A | 23.076923076923076923076923 | 3 | 84.615384615384615384615384 |
| :C | 15.384615384615384615384615 | 4 | 99.999999999999999999999999 |
---------------------------------------------------------------------------
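The effect of the tie-breaking filter can be checked with a small Python sketch over the per-type sums after adding `:d1` and `:d2` (A = 15, B = 25, C = 10, D = 15). It uses bare type names instead of full IRIs; because all four IRIs share the `urn:ex:` prefix, comparing the bare names orders them the same way `str(?type)` would. The function name is hypothetical, and the cumulative percentages come out as Python floats rather than the long `xsd:decimal` values shown above.

```python
# Per-type sums from the answer's data, including the added :d1 (7) and :d2 (8).
sums = {"A": 15, "B": 25, "C": 10, "D": 15}

def rank_types(sums, break_ties):
    """Return (type, rank, cumulative percent) tuples ordered by rank."""
    total = sum(sums.values())
    rows = []
    for t, value in sums.items():
        if break_ties:
            # FILTER(?value2 > ?value || (?value2 = ?value && str(?type2) >= str(?type)))
            peers = [v2 for t2, v2 in sums.items()
                     if v2 > value or (v2 == value and t2 >= t)]
        else:
            # FILTER(?value2 >= ?value): tied types share rank and cumulative percent
            peers = [v2 for v2 in sums.values() if v2 >= value]
        rows.append((t, len(peers), sum(peers) * 100 / total))
    return sorted(rows, key=lambda row: row[1])

for flag in (False, True):
    print("break_ties =", flag)
    for row in rank_types(sums, flag):
        print(*row)
```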
Answer 1 (score: 1)
SPARQL supports subqueries, which can be used to compute aggregates as part of the overall query:
SELECT * {
{ SELECT (MIN(?CumSalesAmount) AS ?minSalesAmount)
{ ... get ?CumSalesAmount ... } }
{ SELECT (0.5*sum(?SalesAmount) AS ?aggSalesAmount)
{ ... get ?SalesAmount ... } }
... get ?CountryName ?SalesAmount ...
}
The effect is not identical: this is one way to approach the problem, not a literal translation of your example.