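I have some very large tables (between 70 and 90 million rows each) taking up 40-60 GB of storage. The tables are also wide (up to 100 columns). One of the columns is the creation date, which is also the clustering key column. There is also a primary key constraint, plus indexes and a columnstore index on all columns. The dates range from 2015 to the present. I need the first and last date for each year.
The following query, for a single year: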
select min(date_created), max(date_created), count(*) from my_table
where datepart(year, date_created) = <4 digit year>;
produces results in 3 seconds, whereas the following query:
`with years as (
select datepart(year, date_created) as yearz, count(*) as cnt
from my_table group by datepart(year, date_created)
)
select
years.yearz, min(a.date_created), max(a.date_created), years.cnt
from my_table a
inner join years on datepart(year, a.date_created) = years.yearz
group by years.yearz, years.cnt
order by years.yearz asc`
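takes 4 seconds to produce results!
Is there a better way to rewrite the query above, using analytic functions or similar, so that it produces results faster?
Thanks.
I tried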
select
datepart(year, date_created) as year,
first_value(date_created) over (partition by datepart(year, date_created)
order by date_created ROWS UNBOUNDED PRECEDING) as first_date,
last_value(date_created) over (partition by datepart(year, date_created)
order by date_created ROWS UNBOUNDED PRECEDING) as last_date
from my_table
This ran for 5 minutes and was eventually killed. The plan showed row-mode execution against the columnstore index.
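For comparison, here is a minimal sketch of a single-pass form that computes all three aggregates in one grouped scan, with no self-join and no window functions. It assumes SQL Server's `DATEPART(year, ...)` syntax and reuses the `my_table` and `date_created` names from above. It has not been timed against these tables, so treat it as a starting point rather than a verified fix. Note also that the window-function attempt returns one row per table row, and that with a frame of `ROWS UNBOUNDED PRECEDING` (which ends at the current row) `last_value` returns the current row's `date_created` rather than the last date of the year.

```sql
-- Sketch only: one scan of my_table, one row per year.
-- Assumes T-SQL; table and column names are taken from the question.
SELECT
    DATEPART(year, date_created) AS yearz,
    MIN(date_created)            AS first_date,  -- earliest date in the year
    MAX(date_created)            AS last_date,   -- latest date in the year
    COUNT(*)                     AS cnt          -- rows in the year
FROM my_table
GROUP BY DATEPART(year, date_created)
ORDER BY yearz ASC;
```

Plain grouped aggregates like these may be eligible for batch-mode execution on the columnstore index, depending on the SQL Server version and the plan chosen, so it is worth checking the actual execution plan.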