How can I optimize this SELECT?

Date: 2016-04-20 09:49:25

Tags: mysql sql ruby-on-rails activerecord

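I have a one-to-many relationship between the tables Payment and PaymentFlows (payments and payment_flows) to track the workflow of each payment.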

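Different managers are only interested in certain workflow steps. So whenever a payment reaches one of those steps, it should show up in a list for them.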

For example:

 Payment 1 - A) Apply
             B) Checked
             C) Approved by Manager
             D) Approved by CFO
             E) Cheque issued

 Payment 2 - A) Apply
             B) Checked
             C) Approved by Manager

 Payment 3 - A) Apply
             B) Checked
             C) Approved by Manager

 Payment 4 - A) Apply
             B) Checked

To list all payments that are currently at workflow step C, this is what I do:

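class Payment < ActiveRecord::Base

  # Returns up to 100 payments whose latest flow (highest payment_flow_id)
  # has the given code. flow_code is passed as a bind parameter instead of
  # appearing literally inside the SQL string.
  # NOTE: "ORDER BY ... DESC, then GROUP BY payment_id" relies on MySQL taking
  # the first row of each group, which MySQL does not guarantee.
  def self.search_by_workflow(flow_code)
    find_by_sql([<<-SQL, flow_code])
      SELECT P.* FROM payments P INNER JOIN (
        SELECT payment_id FROM (
          SELECT * FROM (
            SELECT * FROM payment_flows F
            ORDER BY F.payment_flow_id DESC
          ) latest GROUP BY payment_id
        ) flows WHERE flows.code = ?
      ) IDs ON IDs.payment_id = P.payment_id
      ORDER BY P.payment_id DESC LIMIT 100
    SQL
  end

end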

So that calling:

@payments = Payment.search_by_workflow('Approved by Manager')

returns Payments 2 and 3.
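To pin down what "a payment is at step C" means, here is a minimal in-memory sketch in plain Ruby, with no database involved. It assumes, as both answers below do, that a payment's current step is its flow row with the highest payment_flow_id. The Flow struct, EXAMPLE, FLOWS and payments_at_step are hypothetical names used only for illustration.

```ruby
# Hypothetical in-memory stand-in for the payment_flows table.
Flow = Struct.new(:payment_flow_id, :payment_id, :code)

# The example workflows from the question, one entry per payment.
EXAMPLE = {
  1 => ["Apply", "Checked", "Approved by Manager", "Approved by CFO", "Cheque issued"],
  2 => ["Apply", "Checked", "Approved by Manager"],
  3 => ["Apply", "Checked", "Approved by Manager"],
  4 => ["Apply", "Checked"]
}

# Give every row an increasing payment_flow_id, like an auto-increment key would.
FLOWS = EXAMPLE.flat_map { |pid, codes| codes.map { |code| [pid, code] } }
               .map.with_index(1) { |(pid, code), id| Flow.new(id, pid, code) }

# A payment is "at" a step when its latest flow (max payment_flow_id) has that code.
def payments_at_step(flows, code)
  flows.group_by(&:payment_id)
       .transform_values { |rows| rows.max_by(&:payment_flow_id) }
       .select { |_pid, latest| latest.code == code }
       .keys
       .sort
       .reverse # mimic ORDER BY payment_id DESC
end

p payments_at_step(FLOWS, "Approved by Manager") # => [3, 2]
```

Payment 1 has passed step C but its latest flow is "Cheque issued", so it is excluded; this is exactly the "latest row per group" problem the SQL has to solve.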

However, the performance is not great: 5 to 7 seconds with 15,000 payments and 55,000 workflow rows.

How can I improve the performance?

Update (table structure):

CREATE TABLE `payments` (
  `payment_id` int(11) NOT NULL,
  `payment_type_code` varchar(50) default 'PETTY_CASH',
  `status` varchar(16) NOT NULL default '?',
  PRIMARY KEY  (`payment_id`),
  KEY `status` (`status`),
  KEY `payment_type_code` (`payment_type_code`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `payment_flows` (
  `payment_flow_id` int(11) NOT NULL,
  `payment_id` int(11) default NULL,
  `code` varchar(64) default NULL,
  `status` varchar(255) NOT NULL default 'new',
  PRIMARY KEY  (`payment_flow_id`),
  KEY `payment_id` (`payment_id`),
  KEY `code` (`code`),
  KEY `status` (`status`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Update (using named_scope):

# Payments whose latest flow (highest payment_flow_id) has one of the given codes.
named_scope :by_workflows, lambda { |workflows| { :conditions =>  [ "EXISTS (
          SELECT 'FLOW'
          FROM payment_flows pf
          WHERE pf.payment_id = payments.payment_id
          AND pf.code IN (:flows)
          AND NOT EXISTS (
              SELECT 'OTHER'
              FROM payment_flows pfother
              WHERE pfother.payment_id = pf.payment_id
              AND pfother.payment_flow_id > pf.payment_flow_id
          )
      )", { :flows => workflows } ]}
    }

This makes it convenient to query several steps at once, for example:

Payment.by_workflows(['Approved by Manager', 'Approved by CFO']).count
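The IN (:flows) list in the scope means "the latest flow has any one of these codes". Here is a plain-Ruby sketch of that set-based filter, under the same assumption that the highest payment_flow_id marks the latest step. The rows and the by_workflows method below are hypothetical; the rows are deliberately interleaved across payments, as they would be in a real table.

```ruby
require "set"

Flow = Struct.new(:payment_flow_id, :payment_id, :code)

# Hypothetical rows; ids increase over time but are shared across payments.
FLOWS = [
  Flow.new(1, 1, "Apply"), Flow.new(2, 2, "Apply"),
  Flow.new(3, 1, "Approved by Manager"), Flow.new(4, 3, "Apply"),
  Flow.new(5, 1, "Approved by CFO"), Flow.new(6, 2, "Approved by Manager"),
  Flow.new(7, 3, "Checked")
]

# Same idea as the by_workflows scope: keep a payment when its latest
# flow's code is one of the requested codes.
def by_workflows(flows, codes)
  wanted = codes.to_set
  flows.group_by(&:payment_id)
       .select { |_pid, rows| wanted.include?(rows.max_by(&:payment_flow_id).code) }
       .keys
end

p by_workflows(FLOWS, ["Approved by Manager", "Approved by CFO"]).size # => 2
```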

2 Answers:

Answer 0 (score: 1)

Try this:

SELECT * FROM payments p
WHERE EXISTS(
    -- p has a flow with the requested code
    SELECT 'FLOW'
    FROM payment_flows pf
    WHERE pf.payment_id = p.payment_id
    AND pf.code = flow_code 
    AND NOT EXISTS(
        -- and that flow is not followed by a later flow of the same payment
        SELECT 'OTHER'
        FROM payment_flows pf2
        WHERE pf2.payment_id = pf.payment_id
        AND pf2.payment_flow_id > pf.payment_flow_id
    )
)

Note: in the query, flow_code is a variable holding the code you are searching for.

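I've added a main EXISTS condition, which checks that the payment has a flow with flow_code, and a nested NOT EXISTS condition, which checks that no other flow of the same payment comes after it (that is, has a higher payment_flow_id).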
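The two correlated subqueries map directly onto Ruby's any? and none?. A minimal in-memory sketch, assuming hypothetical Flow rows and a hypothetical payments_with_latest_code helper:

```ruby
Flow = Struct.new(:payment_flow_id, :payment_id, :code)

# Hypothetical rows for two payments.
FLOWS = [
  Flow.new(1, 1, "Apply"), Flow.new(2, 1, "Checked"),
  Flow.new(3, 2, "Apply"), Flow.new(4, 2, "Checked"),
  Flow.new(5, 1, "Approved by Manager"), Flow.new(6, 2, "Approved by Manager"),
  Flow.new(7, 1, "Approved by CFO")
]

# Literal translation of the query:
#   EXISTS (a flow pf of this payment with code = flow_code
#           AND NOT EXISTS (a flow pf2 of the same payment with a larger id))
def payments_with_latest_code(payment_ids, flows, flow_code)
  payment_ids.select do |pid|
    flows.any? do |pf|
      pf.payment_id == pid && pf.code == flow_code &&
        flows.none? do |pf2|
          pf2.payment_id == pf.payment_id && pf2.payment_flow_id > pf.payment_flow_id
        end
    end
  end
end

p payments_with_latest_code([1, 2], FLOWS, "Approved by Manager") # => [2]
```

Scanning the whole array for every candidate is quadratic here. In the database, an index whose leading column is payment_id is what should keep each NOT EXISTS probe cheap.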

Let me know whether this improves performance.

Answer 1 (score: 0)

It looks like you are defining the "latest" payment_flows row for a given payment as the row with the largest payment_flow_id value.

For better performance, if you can, replace a couple of the indexes on payment_flows.

Add these indexes:

  ... ON payment_flows(code,payment_id,payment_flow_id)
  ... ON payment_flows(payment_id,payment_flow_id) 

and drop these (now redundant) indexes:

  ... ON payment_flows(code) 
  ... ON payment_flows(payment_id)
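Why the composite (code, payment_id, payment_flow_id) index helps can be pictured with a rough analogy, not a model of how InnoDB actually stores B-trees: the index is a list of key tuples kept in sorted order. All entries for one code then form a contiguous run that binary search can find, and inside that run each payment's entries are adjacent with the largest payment_flow_id last, so MAX per payment can be read from the index alone. INDEX and max_flow_per_payment are hypothetical names.

```ruby
# Keys of a hypothetical (code, payment_id, payment_flow_id) index,
# kept sorted the way a B-tree keeps its entries ordered.
INDEX = [
  ["Apply", 1, 1], ["Apply", 2, 3],
  ["Approved by Manager", 1, 5], ["Approved by Manager", 2, 6],
  ["Approved by Manager", 2, 9],
  ["Checked", 1, 2], ["Checked", 2, 4]
].sort

# Binary-search for the first entry with this code, then walk the contiguous
# run: a rough analogue of an index range scan on the code prefix.
def max_flow_per_payment(index, code)
  start = index.bsearch_index { |key| key[0] >= code }
  return {} if start.nil?

  result = {}
  index[start..-1].each do |c, payment_id, flow_id|
    break if c != code
    result[payment_id] = flow_id # entries ascend, so the last one seen is the max
  end
  result
end

p max_flow_per_payment(INDEX, "Approved by Manager").to_a # => [[1, 5], [2, 9]]
```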

I suggest this query:

  SELECT p.*
    FROM payments p
    JOIN ( SELECT c.payment_id
                , MAX(c.payment_flow_id) AS flow_id
             FROM payment_flows c
            WHERE c.code =   :flow_code       /* <-- query parameter */
            GROUP BY c.payment_id
            ORDER BY c.code DESC, c.payment_id DESC 
         ) d
      ON d.payment_id = p.payment_id
    LEFT
    JOIN payment_flows n
      ON n.payment_id      = d.payment_id
     AND n.payment_flow_id > d.flow_id      /* d exposes MAX(payment_flow_id) as flow_id */
   WHERE n.payment_id IS NULL
   ORDER BY d.payment_id DESC
   LIMIT 100

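The inline view query "d" gets the highest payment_flow_id (if any) for the specified code (:flow_code), so it only returns payments that have gotten at least that far in the process.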
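The inline view "d" is a filter followed by a grouped MAX. A minimal in-memory sketch with hypothetical rows, including a hypothetical second approval of payment 2 to show why MAX is needed:

```ruby
Flow = Struct.new(:payment_flow_id, :payment_id, :code)

# Hypothetical rows; payment 4 never reaches "Approved by Manager",
# and payment 2 is (hypothetically) approved twice.
FLOWS = [
  Flow.new(1, 1, "Apply"), Flow.new(2, 2, "Apply"), Flow.new(3, 4, "Apply"),
  Flow.new(5, 1, "Approved by Manager"), Flow.new(6, 2, "Approved by Manager"),
  Flow.new(7, 1, "Approved by CFO"), Flow.new(8, 2, "Approved by Manager")
]

# Equivalent of:
#   SELECT payment_id, MAX(payment_flow_id) AS flow_id
#   FROM payment_flows WHERE code = :flow_code GROUP BY payment_id
def inline_view_d(flows, flow_code)
  flows.select { |f| f.code == flow_code }
       .group_by(&:payment_id)
       .map { |pid, rows| [pid, rows.map(&:payment_flow_id).max] }
       .to_h
end

p inline_view_d(FLOWS, "Approved by Manager").to_a # => [[1, 5], [2, 8]]
```

Payment 1 is still in d even though it has moved on to "Approved by CFO"; removing it is the job of the anti-join.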

The query uses an anti-join pattern to exclude rows that have a "later" payment_flow_id than the one for the specified code.

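An anti-join is an outer join, which returns all rows from the left side along with any matching rows from the right side, combined with a condition in the WHERE clause that excludes every row that found a match. (Note the inequality comparison: only rows with a "later" payment_flow_id value will match.)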
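The LEFT JOIN ... WHERE n.payment_id IS NULL step can be emulated in Ruby by producing the outer-join rows (NULL-padded when nothing matches) and then keeping only the padded ones. A sketch with hypothetical d rows and flows; anti_join is an illustrative name:

```ruby
Flow = Struct.new(:payment_flow_id, :payment_id, :code)

# Hypothetical payment_flows rows.
FLOWS = [
  Flow.new(5, 1, "Approved by Manager"), Flow.new(6, 2, "Approved by Manager"),
  Flow.new(7, 1, "Approved by CFO")
]

# Hypothetical rows of the inline view d, as [payment_id, flow_id] pairs.
D = [[1, 5], [2, 6]]

# LEFT JOIN payment_flows n ON n.payment_id = d.payment_id
#                          AND n.payment_flow_id > d.flow_id
# WHERE n.payment_id IS NULL
def anti_join(d_rows, flows)
  joined = d_rows.flat_map do |pid, flow_id|
    matches = flows.select { |n| n.payment_id == pid && n.payment_flow_id > flow_id }
    # An outer join emits one row per match, or one NULL-padded row when none match.
    matches.empty? ? [[pid, flow_id, nil]] : matches.map { |n| [pid, flow_id, n] }
  end
  joined.select { |_pid, _flow_id, n| n.nil? }.map(&:first)
end

p anti_join(D, FLOWS) # => [2]
```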

There's no guarantee this will be faster.

But with the suggested index changes, it should give you good-looking EXPLAIN output. (Use EXPLAIN to get a good handle on the access plan the query will use.)