I have defined the following tables in my application to produce a per-district report based on training dates.
wi_individual_g(ind_id, ind_district_id, ...)
wi_individual_p(ind_id,prg_id, ind_dalit (yes/no), ind_madhesi (yes/no), ...)
wi_training(trn_id, trn_start_date, trn_ben_type, ...)
wi_indv_training(trn_id, ind_id)
wi_district(dst_id,dst_name)
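My problem: I have to generate a report that counts, per district, the individuals linked to trainings whose trn_start_date falls within a given range. The application has predefined date ranges, with quarters defined as follows: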
$quarter = array(
    'y1q3' => array('2013-02-01', '2013-03-31'), 'y1q4' => array('2013-04-01', '2013-06-30'),
    'y2q1' => array('2013-07-01', '2013-09-30'), 'y2q2' => array('2013-10-01', '2013-12-31'),
    'y2q3' => array('2014-01-01', '2014-03-31'), 'y2q4' => array('2014-04-01', '2014-06-30'),
    'y3q1' => array('2014-07-01', '2014-09-30'), 'y3q2' => array('2014-10-01', '2014-12-31'),
    'y3q3' => array('2015-01-01', '2015-03-31'), 'y3q4' => array('2015-04-01', '2015-06-30'),
    'y4q1' => array('2015-07-01', '2015-09-30'), 'y4q2' => array('2015-10-01', '2015-12-31'),
    'y4q3' => array('2016-01-01', '2016-03-31'), 'y4q4' => array('2016-04-01', '2016-06-30'),
    'y5q1' => array('2016-07-01', '2016-09-30'), 'y5q2' => array('2016-10-01', '2016-12-31'),
    'y5q3' => array('2017-01-01', '2017-03-31'), 'y5q4' => array('2017-04-01', '2017-06-30'),
    'y6q1' => array('2017-07-01', '2017-09-30'), 'y6q2' => array('2017-10-01', '2017-12-31'),
    'y6q3' => array('2018-01-01', '2018-03-31'), 'y6q4' => array('2018-04-01', '2018-06-30'),
);
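If Y4Q4 is selected for trn_start_date, the query must count the individuals per district separately for each date range, Y1(Q1-Q4), Y2(Q2-Q4), Y3(Q1-Q4), Y4(Q1-Q4), giving a row such as:
Y1   Y2     Y3     Y4   Y5   Y6
8    3948   3511   0    0    0
As a solution, I applied the following query: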
SELECT wi_district.dst_name,
COUNT(DISTINCT(CASE WHEN wi_training.trn_start_date BETWEEN '2017-07-01' AND '2018-06-30' AND ind_dalit='yes' THEN wi_individual_g.ind_id END)) AS y6 ,
COUNT(DISTINCT(CASE WHEN wi_training.trn_start_date BETWEEN '2016-07-01' AND '2017-06-30' AND ind_dalit='yes' THEN wi_individual_g.ind_id END)) AS y5 ,
COUNT(DISTINCT(CASE WHEN wi_training.trn_start_date BETWEEN '2015-07-01' AND '2016-06-30' AND ind_dalit='yes' THEN wi_individual_g.ind_id END)) AS y4 ,
COUNT(DISTINCT(CASE WHEN wi_training.trn_start_date BETWEEN '2014-07-01' AND '2015-06-30' AND ind_dalit='yes' THEN wi_individual_g.ind_id END)) AS y3 ,
COUNT(DISTINCT(CASE WHEN wi_training.trn_start_date BETWEEN '2013-07-01' AND '2014-06-30' AND ind_dalit='yes' THEN wi_individual_g.ind_id END)) AS y2 ,
COUNT(DISTINCT(CASE WHEN wi_training.trn_start_date BETWEEN '2013-02-01' AND '2013-06-30' AND ind_dalit='yes' THEN wi_individual_g.ind_id END)) AS y1
FROM wi_individual_g
INNER JOIN wi_individual_p ON wi_individual_p.ind_id=wi_individual_g.ind_id AND wi_individual_g.ind_is_recepient='yes'
INNER JOIN wi_district ON wi_district.dst_id=wi_individual_g.ind_district_id AND wi_individual_g.ind_deleted=0
INNER JOIN wi_indv_training ON wi_indv_training.ind_id=wi_individual_g.ind_id AND wi_indv_training.is_deleted=0
INNER JOIN wi_training ON wi_training.trn_id=wi_indv_training.trn_id AND wi_training.trn_deleted=0 AND wi_training.trn_beneficiary_type=2 AND wi_training.trn_start_date <='2018-06-30'
GROUP BY wi_district.dst_name
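But this query takes more than 5 minutes to execute, which is far too slow. I also added indexes on the fields, but the result was the same. I would be grateful if anyone could suggest a better solution.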
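Answer 0 (score: 0)
I changed the query slightly (it follows the index list), moving each criterion into its respective JOIN or WHERE clause where applicable. I also moved the ind_dalit = 'yes' test out of every CASE expression and into the JOIN to the wi_individual_p table.
With the criteria laid out this way, it is easier to see which indexes to suggest: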
table              index
wi_individual_g    ( ind_is_recepient, ind_deleted, ind_id, ind_district_id )
wi_individual_p    ( ind_id, ind_dalit )
wi_district        ( dst_id, dst_name )
wi_indv_training   ( ind_id, is_deleted )
wi_training        ( trn_beneficiary_type, trn_deleted, trn_start_date, trn_id )
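As a rough sketch, the suggested indexes could be created like this. The index names (idx_...) are hypothetical, the column names are taken from the queries in the question (including the ind_is_recepient spelling), and any index that duplicates an existing primary key or index on your tables can be skipped.

```sql
-- Hypothetical index names; column lists follow the suggestions above.
-- Assumes the tables already exist with the columns used in the question.
CREATE INDEX idx_g_report
    ON wi_individual_g (ind_is_recepient, ind_deleted, ind_id, ind_district_id);

CREATE INDEX idx_p_dalit
    ON wi_individual_p (ind_id, ind_dalit);

-- Likely redundant if dst_id is already the primary key.
CREATE INDEX idx_d_name
    ON wi_district (dst_id, dst_name);

CREATE INDEX idx_wit_ind
    ON wi_indv_training (ind_id, is_deleted);

CREATE INDEX idx_t_report
    ON wi_training (trn_beneficiary_type, trn_deleted, trn_start_date, trn_id);
```

Prefixing the report query with EXPLAIN afterwards shows whether MySQL actually chooses these indexes.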
SELECT
d.dst_name,
COUNT( DISTINCT( CASE WHEN t.trn_start_date
BETWEEN '2017-07-01' AND '2018-06-30'
THEN g.ind_id END)) AS y6,
COUNT( DISTINCT( CASE WHEN t.trn_start_date
BETWEEN '2016-07-01' AND '2017-06-30'
THEN g.ind_id END)) AS y5,
COUNT( DISTINCT( CASE WHEN t.trn_start_date
BETWEEN '2015-07-01' AND '2016-06-30'
THEN g.ind_id END)) AS y4,
COUNT( DISTINCT( CASE WHEN t.trn_start_date
BETWEEN '2014-07-01' AND '2015-06-30'
THEN g.ind_id END)) AS y3,
COUNT( DISTINCT( CASE WHEN t.trn_start_date
BETWEEN '2013-07-01' AND '2014-06-30'
THEN g.ind_id END)) AS y2,
COUNT( DISTINCT( CASE WHEN t.trn_start_date
BETWEEN '2013-02-01' AND '2013-06-30'
THEN g.ind_id END)) AS y1
FROM
wi_individual_g g
INNER JOIN wi_individual_p p
ON g.ind_id = p.ind_id
AND p.ind_dalit='yes'
INNER JOIN wi_district d
ON g.ind_district_id = d.dst_id
INNER JOIN wi_indv_training wit
ON g.ind_id = wit.ind_id
AND wit.is_deleted = 0
INNER JOIN wi_training t
ON wit.trn_id = t.trn_id
AND t.trn_beneficiary_type = 2
AND t.trn_deleted = 0
AND t.trn_start_date >= '2013-02-01'
AND t.trn_start_date <= '2018-06-30'
WHERE
g.ind_is_recepient = 'yes'
AND g.ind_deleted = 0
GROUP BY
d.dst_name
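Here is another option you might try. The inner pre-query (alias PQ) returns the distinct district and ind_id from "g" for each date group 1-6, instead of one row per dated training record. The outer query is then a simple sum per district.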
SELECT
d.dst_name,
SUM( PQ.DateGrp = 6 ) AS y6,
SUM( PQ.DateGrp = 5 ) AS y5,
SUM( PQ.DateGrp = 4 ) AS y4,
SUM( PQ.DateGrp = 3 ) AS y3,
SUM( PQ.DateGrp = 2 ) AS y2,
SUM( PQ.DateGrp = 1 ) AS y1
FROM
( select distinct
g.ind_district_id,
g.ind_id,
CASE WHEN t.trn_start_date BETWEEN '2017-07-01' AND '2018-06-30' THEN 6
WHEN t.trn_start_date BETWEEN '2016-07-01' AND '2017-06-30' THEN 5
WHEN t.trn_start_date BETWEEN '2015-07-01' AND '2016-06-30' THEN 4
WHEN t.trn_start_date BETWEEN '2014-07-01' AND '2015-06-30' THEN 3
WHEN t.trn_start_date BETWEEN '2013-07-01' AND '2014-06-30' THEN 2
WHEN t.trn_start_date BETWEEN '2013-02-01' AND '2013-06-30' THEN 1
ELSE 0 END DateGrp
from
wi_training t
JOIN wi_indv_training wit
ON t.trn_id = wit.trn_id
AND wit.is_deleted = 0
JOIN wi_individual_g g
ON wit.ind_id = g.ind_id
AND g.ind_is_recepient = 'yes'
AND g.ind_deleted = 0
INNER JOIN wi_individual_p p
ON g.ind_id = p.ind_id
AND p.ind_dalit='yes'
where
t.trn_beneficiary_type = 2
AND t.trn_deleted = 0
AND t.trn_start_date >= '2013-02-01'
AND t.trn_start_date <= '2018-06-30' ) PQ
INNER JOIN wi_district d
ON PQ.ind_district_id = d.dst_id
GROUP BY
d.dst_name
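A note on the SUM( PQ.DateGrp = 6 ) shorthand: it relies on MySQL evaluating a comparison to 1 or 0, so summing it counts the matching rows. If you prefer the form that also works on other databases, a CASE expression does the same job. The small sketch below uses three made-up literal rows as a stand-in for PQ (they are not real data) to show both spellings side by side.

```sql
-- Hypothetical stand-in for PQ: three literal rows, two of them in date group 6.
SELECT
    SUM( PQ.DateGrp = 6 )                            AS y6_mysql_shorthand,
    SUM( CASE WHEN PQ.DateGrp = 6 THEN 1 ELSE 0 END ) AS y6_portable
FROM
    ( SELECT 6 AS DateGrp
      UNION ALL SELECT 6
      UNION ALL SELECT 3 ) PQ;
-- Both columns return 2 in MySQL.
```

Because PQ is built with SELECT DISTINCT on district, ind_id and date group, each individual contributes at most one row per date group, so these sums correspond to the COUNT(DISTINCT ...) columns of the first query.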
Answer 1 (score: 0)
I found a way to make the query roughly 3 times faster than the suggested version:
At first: the query took around 128 secs
After the suggestion: the query took around 78 secs
After further modification: the query took around 23 secs
---------------------------------------------------------------------------------
SELECT d.dst_name,
COUNT(DISTINCT(CASE WHEN a.trn_start_date BETWEEN '2014-07-01' AND '2015-06-30' THEN a.ind_id END)) AS y3 ,
COUNT(DISTINCT(CASE WHEN a.trn_start_date BETWEEN '2013-07-01' AND '2014-06-30' THEN a.ind_id END)) AS y2 ,
COUNT(DISTINCT(CASE WHEN a.trn_start_date BETWEEN '2013-02-01' AND '2013-06-30' THEN a.ind_id END)) AS y1
FROM
(
SELECT g.ind_district_id,g.ind_id,t.trn_start_date,t.trn_beneficiary_type
FROM wi_individual_g g
INNER JOIN wi_indv_training wit ON g.ind_id = wit.ind_id AND wit.is_deleted = 0 AND g.ind_deleted=0 AND g.ind_is_recepient='yes'
INNER JOIN wi_training t ON wit.trn_id = t.trn_id AND t.trn_beneficiary_type=2 AND t.trn_deleted = 0
) a
INNER JOIN wi_individual_p p ON p.ind_id=a.ind_id
INNER JOIN wi_district d ON d.dst_id=a.ind_district_id
WHERE p.ind_dalit='yes'
GROUP BY d.dst_name;
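Overall, this is roughly 6 times faster than my original query. Thanks for the suggestions, @DRapp.
If anyone has a better way to improve performance further, I would be grateful.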