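I have a table with more than 20 columns and 45 million rows. I want to summarize the information for each Id using a partition, so that the number of rows stays the same and every row keeps its own information: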
select min(Distance) over(partition by Id) as min_distance
, max(Distance) over(partition by Id) as max_distance
, avg(Distance) over(partition by Id) as mean_distance
, stdev(Distance) over(partition by Id) as sd_distance
, sum(Distance) over(partition by Id) as sum_distance
, min(Speed) over(partition by Id) as min_speed
, max(Speed) over(partition by Id) as max_speed
, avg(Speed) over(partition by Id) as mean_speed
, stdev(Speed) over(partition by Id) as sd_speed
from mytable
A test on just 10,000 rows has already been running for 2 hours. Is there anything we can do to improve performance?
Answer (score: 1)
Why not just:
select Id, min(Distance) as min_distance
, max(Distance) as max_distance
, avg(Distance) as mean_distance
, stdev(Distance) as sd_distance
, sum(Distance) as sum_distance
, min(Speed) as min_speed
, max(Speed) as max_speed
, avg(Speed) as mean_speed
, stdev(Speed) as sd_speed
FROM mytable
GROUP BY Id
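The GROUP BY above returns one row per Id, while the question asks to keep every original row. A common way to get both is to aggregate once per Id and then join the per-Id statistics back to the table. Below is a minimal sketch in Python with an in-memory SQLite database, assuming a hypothetical table `mytable(RowNo, Id, Distance, Speed)` filled with made-up sample rows. SQLite has no STDEV, so a small sample-standard-deviation aggregate (`SampleStdev`, also hypothetical) is registered as a stand-in. This is not T-SQL and says nothing about performance on the real table.

```python
import math
import sqlite3


class SampleStdev:
    """User-defined aggregate: sample standard deviation (n - 1 denominator).

    Hypothetical stand-in for T-SQL STDEV, since SQLite has no built-in one.
    """

    def __init__(self):
        self.values = []

    def step(self, value):
        # Skip NULLs, as SQL aggregates do.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        n = len(self.values)
        if n < 2:
            return None  # sample stdev is undefined for fewer than two values
        mean = sum(self.values) / n
        return math.sqrt(sum((v - mean) ** 2 for v in self.values) / (n - 1))


# Aggregate once per Id (the answer's GROUP BY), then join the per-Id
# statistics back to every original row so the row count is unchanged.
QUERY = """
WITH stats AS (
    SELECT Id,
           MIN(Distance)   AS min_distance,
           MAX(Distance)   AS max_distance,
           AVG(Distance)   AS mean_distance,
           stdev(Distance) AS sd_distance,
           SUM(Distance)   AS sum_distance,
           MIN(Speed)      AS min_speed,
           MAX(Speed)      AS max_speed,
           AVG(Speed)      AS mean_speed,
           stdev(Speed)    AS sd_speed
    FROM mytable
    GROUP BY Id
)
SELECT t.RowNo, t.Id, t.Distance, t.Speed,
       s.min_distance, s.max_distance, s.mean_distance, s.sd_distance,
       s.sum_distance, s.min_speed, s.max_speed, s.mean_speed, s.sd_speed
FROM mytable AS t
JOIN stats AS s ON s.Id = t.Id
ORDER BY t.RowNo
"""


def build_demo_db():
    """In-memory table with hypothetical rows: (RowNo, Id, Distance, Speed)."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("stdev", 1, SampleStdev)
    conn.execute(
        "CREATE TABLE mytable "
        "(RowNo INTEGER PRIMARY KEY, Id INTEGER, Distance REAL, Speed REAL)"
    )
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?, ?, ?)",
        [(1, 1, 10.0, 5.0), (2, 1, 20.0, 7.0), (3, 1, 30.0, 9.0),
         (4, 2, 4.0, 2.0), (5, 2, 6.0, 4.0)],
    )
    # Index on the grouping/join key; whether it helps on real data must be measured.
    conn.execute("CREATE INDEX ix_mytable_id ON mytable (Id)")
    return conn


def fetch_with_stats(conn):
    """Return every original row with its Id's statistics appended."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in fetch_with_stats(build_demo_db()):
        print(row)
```

With the sample data, all five input rows come back, each carrying its Id's statistics. The same shape (aggregate in a CTE or derived table, then join on Id) can be written in T-SQL using STDEV; whether it, or an index on Id that covers Distance and Speed, beats the window-function query on the real 45-million-row table is something to measure rather than assume.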