Aggregation + ranking in SPARQL -> cumulative percentage?

Date: 2013-12-05 04:17:25

Tags: sparql aggregation rank

In his answer to the following question, Joshua Taylor proposed a very elegant way to compute ranks in SPARQL: How to rank values in SPARQL?

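However, I need to use aggregation + ranking as a first step towards getting cumulative percentages in a query, e.g. "the countries that account for the top 50% of sales."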

Suppose we have sales at the state level

California 30
Arizona 25
...
Alberta 25
Quebec 20
...

I first need to aggregate by country and get each country's rank, as follows

United States 250 1
Canada 200 2
Mexico 150 3
...

How can aggregation + ranking be written in SPARQL, generalizing Joshua's answer above?

For the next step, I need to use the rank to compute a cumulative sum, as follows

United States 250 1 250
Canada 200 2 450
Mexico 150 3 600
...
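
As a point of reference for what these two steps should produce, here is a minimal Python sketch (not SPARQL). It uses only the four state rows shown in the question, so the totals differ from the 250/200/150 figures in the example. The state-to-country mapping and the names `state_sales`, `state_country`, and `rank_with_cumsum` are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import accumulate

# Only the four state rows shown in the question; elided rows are not guessed.
state_sales = {"California": 30, "Arizona": 25, "Alberta": 25, "Quebec": 20}
# Assumed mapping from state/province to country (not given in the question).
state_country = {
    "California": "United States",
    "Arizona": "United States",
    "Alberta": "Canada",
    "Quebec": "Canada",
}

def rank_with_cumsum(sales, mapping):
    """Aggregate sales by country, then attach rank and running total."""
    totals = defaultdict(int)
    for state, amount in sales.items():
        totals[mapping[state]] += amount
    # Sort by descending total; the name is only a deterministic tie-breaker.
    ordered = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    running = accumulate(amount for _, amount in ordered)
    return [
        (country, amount, rank, cum)
        for rank, ((country, amount), cum) in enumerate(zip(ordered, running), start=1)
    ]

rows = rank_with_cumsum(state_sales, state_country)
for row in rows:
    print(*row)  # country, sales, rank, cumulative sales
```

The sort-then-accumulate shape is what the SPARQL version has to emulate with a self-join, since SPARQL 1.1 has no window functions.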

To write this in SPARQL, I think I can use the SQL trick reported here: how to get cumulative sum

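If the above succeeds, we can then attempt the cumulative percentage. Now suppose the total sales over all countries is 1000. Then the result would be

United States 250 250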
Canada 200 450

To write this in SPARQL, we could start from something like the following SQL

SELECT CountryName, SalesAmount
FROM CumSalesCountry
WHERE CumSalesAmount <=
  (SELECT MIN(CumSalesAmount) FROM CumSalesCountry
   WHERE CumSalesAmount >=
     (SELECT 0.5 * SUM(SalesAmount) FROM SalesCountry))
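
To make the selection logic of that SQL concrete, here is a small Python sketch of the same condition, run on the three country rows shown in the question with the stated total of 1000. The function name `top_share` and its `share` parameter are hypothetical.

```python
# Rows shown in the question: (country, sales, cumulative sales).
cum_sales_country = [
    ("United States", 250, 250),
    ("Canada", 200, 450),
    ("Mexico", 150, 600),
]
# Total over all countries as stated in the question (covers the elided rows).
total_sales = 1000

def top_share(rows, total, share=0.5):
    """Keep rows up to and including the first one that reaches the share."""
    threshold = share * total
    reached = [cum for _, _, cum in rows if cum >= threshold]
    if not reached:
        # In SQL, MIN over no rows is NULL, so the outer comparison matches nothing.
        return []
    cutoff = min(reached)
    return [(name, amount) for name, amount, cum in rows if cum <= cutoff]

selected = top_share(cum_sales_country, total_sales)
print(selected)
```

With these numbers the condition also selects Mexico, the row whose cumulative sum crosses the 50% mark, whereas the expected result listed in the question stops at Canada. The question does not say which of the two behaviours is intended.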

Any help with this would be greatly appreciated

Esteban (struggling to write BI-like queries in SPARQL ...)

2 Answers:

Answer 0: (score: 2)

Suppose you have data like this:

@prefix : <urn:ex:> .

:a1 a :A ; :v 05 .
:a2 a :A ; :v 10 .

:b1 a :B ; :v 10 .
:b2 a :B ; :v 10 .
:b3 a :B ; :v 05 .

:c1 a :C ; :v 10 .

Then you can use a query like this:

prefix : <urn:ex:>

select
   ?type
   (?value*100/?total as ?percent)
   (count(?type2) as ?rank)
   (sum(?value2)*100/?total as ?cumulativePercent)
where { 
  #-- total value across all types 
  { select (sum(?value) as ?total) 
    where { ?x :v ?value } }

  #-- each type and its sum value as ?type and ?value
  { select ?type (sum(?v) as ?value)
    where { ?x a ?type ; :v ?v }
    group by ?type }

  #-- each type and its sum value as ?type2 and ?value2
  { select ?type2 (sum(?v) as ?value2)
    where { ?x a ?type2 ; :v ?v }
    group by ?type2 }

  filter ( ?value2 >= ?value )
}
group by ?type ?value ?total
order by desc(?percent)

to get results like this:

---------------------------------------------
| type | percent | rank | cumulativePercent |
=============================================
| :B   | 50.0    | 1    | 50.0              |
| :A   | 30.0    | 2    | 80.0              |
| :C   | 20.0    | 3    | 100.0             |
---------------------------------------------
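
The query works by joining the per-type sums with themselves and keeping, for each type, every type whose sum is at least as large. The Python sketch below mirrors that logic step by step on the same data, as a way to check the numbers. It is an illustration rather than a replacement for the query, and the names `triples` and `rank_and_cumulate` are made up for it.

```python
from collections import defaultdict

# Same data as the Turtle example: (resource, type, value).
triples = [
    ("a1", "A", 5), ("a2", "A", 10),
    ("b1", "B", 10), ("b2", "B", 10), ("b3", "B", 5),
    ("c1", "C", 10),
]

def rank_and_cumulate(rows):
    # Second and third subqueries: sum of values per type.
    per_type = defaultdict(int)
    for _, type_, value in rows:
        per_type[type_] += value
    # First subquery: total value across all types.
    total = sum(per_type.values())
    result = []
    for type_, value in per_type.items():
        # filter(?value2 >= ?value): sums that are at least this type's sum.
        at_least = [v2 for v2 in per_type.values() if v2 >= value]
        result.append((
            type_,
            value * 100 / total,          # ?percent
            len(at_least),                # ?rank
            sum(at_least) * 100 / total,  # ?cumulativePercent
        ))
    # order by desc(?percent)
    return sorted(result, key=lambda r: r[1], reverse=True)

for row in rank_and_cumulate(triples):
    print(*row)
```

Note that the join compares every type with every other type, so the work grows quadratically with the number of types.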

Now, there is a catch if two types have the same percentage. For example, if you add the data

:d1 a :D ; :v 07 .
:d2 a :D ; :v 08 .

then you get the results:

---------------------------------------------------------------------------
| type | percent                     | rank | cumulativePercent           |
===========================================================================
| :B   | 38.461538461538461538461538 | 1    | 38.461538461538461538461538 |
| :A   | 23.076923076923076923076923 | 3    | 84.615384615384615384615384 |
| :D   | 23.076923076923076923076923 | 3    | 84.615384615384615384615384 |
| :C   | 15.384615384615384615384615 | 4    | 99.999999999999999999999999 |
---------------------------------------------------------------------------

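This shows that A and D are tied, and gives them the same cumulative percentage and rank. If that is not what you want, you can add something to the filter condition, e.g.,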

filter (  ?value2 > ?value
       || ( ?value2 = ?value &&
            str(?type2) >= str(?type) ))

to break the ties and get results like this:

---------------------------------------------------------------------------
| type | percent                     | rank | cumulativePercent           |
===========================================================================
| :B   | 38.461538461538461538461538 | 1    | 38.461538461538461538461538 |
| :D   | 23.076923076923076923076923 | 2    | 61.538461538461538461538461 |
| :A   | 23.076923076923076923076923 | 3    | 84.615384615384615384615384 |
| :C   | 15.384615384615384615384615 | 4    | 99.999999999999999999999999 |
---------------------------------------------------------------------------
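
The effect of the extended filter can be checked with a short Python sketch over the per-type sums (A=15, B=25, C=10, D=15 once `:d1` and `:d2` are added). The name `rank_with_tiebreak` is hypothetical, and plain type labels stand in for the IRI strings that `str()` would return, which sort the same way here.

```python
# Per-type sums after adding :d1 and :d2.
per_type = {"A": 15, "B": 25, "C": 10, "D": 15}

def rank_with_tiebreak(sums):
    total = sum(sums.values())
    result = []
    for type_, value in sums.items():
        # ?value2 > ?value || (?value2 = ?value && str(?type2) >= str(?type))
        ahead = [
            v2 for t2, v2 in sums.items()
            if v2 > value or (v2 == value and t2 >= type_)
        ]
        # (type, rank, cumulative percent)
        result.append((type_, len(ahead), sum(ahead) * 100 / total))
    return sorted(result, key=lambda r: r[1])

ranked = rank_with_tiebreak(per_type)
for row in ranked:
    print(*row)
```

Because the string comparison puts D ahead of A, D gets rank 2 and A rank 3, matching the table. The sketch uses floating point, so the last row comes out as 100.0 rather than the 99.99... decimal shown above.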

Answer 1: (score: 1)

SPARQL supports subqueries, which can be used to compute aggregates as part of an overall query:

SELECT * {
  { SELECT (MIN(?CumSalesAmount) AS ?minSalesAmount) 
    {  ... get ?CumSalesAmount ... } }
  { SELECT (0.5*sum(?SalesAmount) AS ?aggSalesAmount) 
     {  ... get ?SalesAmount ... } }
  ... get ?CountryName ?SalesAmount ...
}

The effect is not identical: this is one way of approaching the problem, not a translated clone of your example.