Goal:
I want to speed up a SQL query over about a million rows of transaction data (order data). I've been able to reduce the run time from 50 minutes (using temp tables) to 9 minutes using CROSS APPLY() (see query below). Is there a way to eliminate ROW_NUMBER() for finding the highest dollar amount spent by a customer per year (grouped by customer and year)? ROW_NUMBER() can be computationally expensive. Additionally, there are no indexes on these tables.
Code:
select z.string_customer_name, z.string_customer_region, z.string_industry_group,
z.string_city, z.string_state, z.string_country, z.string_booking_type,
z.string_sales_branch, z.string_sales_region, z.string_sales_area,
z.int_booking_year, z.float_sum_total, z.string_tpis_concat, z.string_groupby
from (
select #temp_00.*, ca_01.float_sum_total, ca_00.string_tpis_concat,
ROW_NUMBER() over (partition by #temp_00.string_groupby order by #temp_00.string_groupby,
ca_01.float_sum_total) as row_num
from #temp_00
cross apply(
select string_groupby, int_booking_year, sum(float_total) as float_sum_total
from #temp_00
group by string_groupby, int_booking_year
) as ca_01
cross apply(
select string_groupby, STRING_AGG(cast(string_customer_tpi
as varchar(max)), '|') as string_tpis_concat
from #temp_00
group by string_groupby
) as ca_00
where ca_00.string_groupby = #temp_00.string_groupby and
ca_01.string_groupby = #temp_00.string_groupby and
ca_01.int_booking_year = #temp_00.int_booking_year
) as z
where z.row_num = 1
Temp table columns:
string_customer_name -> 'customer name'
string_customer_tpi -> 'customer id'
string_customer_region -> 'customer region'
string_industry_group -> 'customer industry group'
string_city -> 'customer city'
string_state -> 'customer state'
string_country -> 'customer country'
string_booking_type -> 'order type'
string_sales_branch -> 'sales branch'
string_sales_region -> 'sales region'
string_sales_area -> 'sales area of the world'
int_booking_year -> 'order year'
float_total -> 'order total in dollars'
string_groupby -> 'concatenation of customer name, customer city, customer state,
customer country, customer industry group'
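(string_groupby is just a concatenation of the five customer columns above. The exact expression used to build it isn't shown here, but it is something along the lines of the sketch below, which assumes CONCAT_WS from SQL Server 2017+, the same version STRING_AGG needs; the '|' delimiter is only an assumption.)
-- Hypothetical construction of string_groupby during the #temp_00 load; the real
-- expression and delimiter are not shown above, so treat this as an assumption.
select CONCAT_WS('|', string_customer_name, string_city, string_state,
                 string_country, string_industry_group) as string_groupby
from #temp_00;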
Execution Plan for posted query
The XML for the query is too large to post. Although the picture of the execution plan is small, the second post shows where I think most of the time goes: the Sort(). About 60% of the cost of both the initial data pull and the posted query is in the Sort() (the posted query is 79% of the total cost, while the data pull is 21%).
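For reference, since there are no indexes at all on the temp table and the Sort() dominates the plan, one obvious thing to test is a clustered index on the grouping columns so the aggregations can read pre-sorted data. A minimal sketch (assuming SQL Server; the index name is arbitrary and I have not measured it):
-- Hypothetical index: clustering #temp_00 on the grouping/partitioning keys may let
-- the GROUP BY and window aggregates read pre-sorted data instead of sorting ~1M rows.
-- Building the index on a large temp table has its own cost, so measure both steps.
create clustered index ix_temp_00_groupby_year
    on #temp_00 (string_groupby, int_booking_year);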
Answer 0 (score: 0):
I'm not sure, but if I understand what you're doing, the CROSS APPLYs can be avoided. That should help performance, but since I don't have access to the data, you will have to test it and see.
So, after the data has been loaded into the temp table, try the following:
;with TempWithSum as (
--get the sum partition by string_groupby, int_booking_year
select *,sum(float_total) over(partition by string_groupby, int_booking_year) as float_sum_total
from #temp_00
),NamesCat as(
--get all customer names grouped by string_groupby
select string_groupby, STRING_AGG(cast(string_customer_tpi as varchar(max)), '|') as string_tpis_concat
from #temp_00
group by string_groupby
),AllData as(
--get the row number partition string_groupby and ordered by string_groupby, float_sum_total
select z.string_customer_name, z.string_customer_region, z.string_industry_group, z.string_city, z.string_state,
z.string_country, z.string_booking_type, z.string_sales_branch, z.string_sales_region, z.string_sales_area,
z.int_booking_year, z.float_sum_total, NamesCat.string_tpis_concat, z.string_groupby
,ROW_NUMBER() over (partition by z.string_groupby order by z.string_groupby, z.float_sum_total) as row_num
from TempWithSum z
inner join NamesCat on NamesCat.string_groupby=z.string_groupby
)
select * from AllData where row_num=1
Hopefully this works and returns the desired result within the expected time frame.
Note: I know you wanted to eliminate ROW_NUMBER(), and here I am suggesting eliminating the CROSS APPLYs instead, but your underlying goal is performance.
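If you do still want to drop ROW_NUMBER() itself, an untested variation on the same CTE idea is to compute each customer's maximum yearly sum with MAX() OVER and filter on it. This is only a sketch against the column names above: it uses MAX to match the stated goal of the highest yearly total (the posted ORDER BY is ascending, so use MIN if the lowest was actually intended), and unlike row_num = 1 it returns every detail row of the winning year plus any ties, so you may still need a DISTINCT or a de-duplication step afterwards:
;with TempWithSum as (
    -- yearly total per customer group, attached to every detail row
    select *,
        sum(float_total) over (partition by string_groupby, int_booking_year) as float_sum_total
    from #temp_00
), TempWithMax as (
    -- highest yearly total per customer group
    select *,
        max(float_sum_total) over (partition by string_groupby) as float_max_sum
    from TempWithSum
), NamesCat as (
    -- all customer ids concatenated per customer group
    select string_groupby, STRING_AGG(cast(string_customer_tpi as varchar(max)), '|') as string_tpis_concat
    from #temp_00
    group by string_groupby
)
select t.string_customer_name, t.string_customer_region, t.string_industry_group,
    t.string_city, t.string_state, t.string_country, t.string_booking_type,
    t.string_sales_branch, t.string_sales_region, t.string_sales_area,
    t.int_booking_year, t.float_sum_total, NamesCat.string_tpis_concat, t.string_groupby
from TempWithMax t
inner join NamesCat on NamesCat.string_groupby = t.string_groupby
where t.float_sum_total = t.float_max_sum  -- keep only rows from the top-spend year (and ties)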