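I am running an aggregate query against Wikidata. The query tries to calculate the average duration of films, grouped by their genre and year of publication.
The multiple groupings and subqueries in the query are meant to preserve the n-1 relationship between a film and the grouping criteria (year and genre), and the 1-1 relationship between a film and its duration. This is needed for the aggregation to be correct (OLAP and data warehouse practitioners will be familiar with n-1 relationships).
More explanation is embedded in the query as comments. For this reason I cannot drop the grouping done in the subqueries, the IF expressions, or the group concatenation. The query times out at the Wikidata SPARQL endpoint.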
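Question
I need some suggestions to improve performance. Any optimization hints? If that is not possible, does anybody know of an authenticated way to query Wikidata (so they know I am not playing around) that allows a longer timeout, or a way to increase the timeout in general?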
# Average duration of films, grouped by their genre and the year of publication
SELECT
?genre1 # film genre
?year1 # film year of publication
(AVG(?duration1) AS ?avg) # film average duration
WHERE
{
# Calculating the average duration for each single film.
# As there are films with multiple durations, these durations are
# averaged by grouping and aggregating durations by film.
# Hence, a single duration for each film is projected out from the subquery.
{
select ?film (avg(?duration) as ?duration1)
where{
?film <http://www.wikidata.org/prop/direct/P2047> ?duration .
}group by ?film
}
# Here the grouping criteria (genre and year) are calculated.
# The criteria are grouped by film, so that in case multiple
# genres/multiple years exist for a single film, all of them are
# concatenated (group_concat) into a single value.
# Also in case of a lack of a value of year or genre for some
# specific film, a dummy value "OtherYear"/"OtherGenre" is generated.
{
select ?film (
IF
(
group_concat(distinct ?year ; separator="-- ") != "",
# In case multiple year exist for a single film, all of them are group concated into a single value.
group_concat(distinct ?year ; separator="-- "),
# In case of a lack of a value of year for some specific film, a dummy value "OtherYear" is generated.
"OtherYear"
)
as ?year1
)
(
IF
(
group_concat(distinct ?genre ; separator="-- ") != "",
# In case multiple genre exist for a single film, all of them are group concated into a single value.
group_concat(distinct ?genre ; separator="-- "),
# In case of a lack of a value of genre for some specific film, a dummy value "OtherGenre" is generated.
"OtherGenre"
)
as ?genre1
)
where
{
?film <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q11424> .
optional {
?film <http://www.wikidata.org/prop/direct/P577> ?date .
BIND(year(?date) AS ?year)
}
optional {
?film <http://www.wikidata.org/prop/direct/P136> ?genre .
}
} group by ?film
}
} GROUP BY ?year1 ?genre1
Answer 0 (score: 1)
After replacing the two IF expressions with a simple sample (which selects an arbitrary value from the group), the query seems to work:
(sample(?year) as ?year1)
(sample(?genre) as ?genre1)
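So it seems that the cost of group_concat is the main problem. I find that not very intuitive and have no explanation for it.
Maybe the version with sample is good enough for you, or at least it can give you a baseline for further improvements.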
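For reference, here is a sketch of how the whole query might look with the two sample expressions in place. The COALESCE/STR wrapping is my own assumption, not part of the change described above: it is one way to keep the "OtherYear"/"OtherGenre" dummy values when a film has no year or genre, because SAMPLE over a group with no bound value leaves the result unbound. The prefixes are declared only to shorten the IRIs. I have not run this exact form against the endpoint, so treat it as a starting point rather than a verified fix.

```sparql
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# Average duration of films, grouped by their genre and the year of publication
SELECT ?genre1 ?year1 (AVG(?duration1) AS ?avg)
WHERE {
  # One duration per film (1-1): multiple durations are averaged per film.
  {
    SELECT ?film (AVG(?duration) AS ?duration1)
    WHERE { ?film wdt:P2047 ?duration . }
    GROUP BY ?film
  }
  # One year and one genre per film (n-1): SAMPLE picks an arbitrary value.
  # Assumption: COALESCE falls back to the dummy value when SAMPLE is unbound;
  # STR makes both branches plain strings.
  {
    SELECT ?film
           (COALESCE(STR(SAMPLE(?year)),  "OtherYear")  AS ?year1)
           (COALESCE(STR(SAMPLE(?genre)), "OtherGenre") AS ?genre1)
    WHERE {
      ?film wdt:P31 wd:Q11424 .
      OPTIONAL {
        ?film wdt:P577 ?date .
        BIND(YEAR(?date) AS ?year)
      }
      OPTIONAL { ?film wdt:P136 ?genre . }
    }
    GROUP BY ?film
  }
}
GROUP BY ?year1 ?genre1
```

Note the trade-off: with sample, a film that has several genres or years is assigned to only one arbitrarily chosen genre/year, instead of to a combined "a-- b" group as in the group_concat version.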