Retrieving the last row of a one-to-one relation backed by millions of rows

Date: 2015-02-19 06:25:57

Tags: mysql sql database performance yii

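Something that rarely comes up is a relation that is used as one-to-one, but where the second table can hold millions of rows for each row of the first. For example, I have a `radcliente` table with millions of `radacct` rows, but I need to filter using only the last acct. Here is an example to explain it better: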

This is the criteria:

$criteria = new CDbCriteria();
$criteria->with = [
    'acct', // slow: it loads millions of rows only to keep the last one
];
$criteria->together = true;
$clientes = Cliente::model()->findAll($criteria);

This is the query generated by Yii (very slow, more than 40 seconds; it returns millions of rows only for the AR to use one of them):

SELECT
  `t`.`id` AS `t0_c0`,
  -- ...
  `t`.`spc_serasa` AS `t0_c56`,
  `acct`.`radacctid` AS `t1_c0`,
  -- ...
  `acct`.`cliente_id` AS `t1_c27`
FROM
  `radcliente` `t`
  LEFT OUTER JOIN `radacct` `acct` ON (`acct`.`cliente_id`=`t`.`id`)
ORDER BY
  radacctid DESC
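To see why this query is slow, here is a small sketch using Python's built-in sqlite3 module rather than MySQL, on a hypothetical, heavily reduced schema (only `radcliente.id`, `radacct.radacctid` and `radacct.cliente_id`). It shows that a plain LEFT OUTER JOIN yields one row per matching `radacct` row, not one row per client, so every acct row has to be sorted and handed to the AR layer.

```python
import sqlite3


def build_demo_db(clients: int = 3, accts_per_client: int = 5) -> sqlite3.Connection:
    """Create an in-memory DB with a reduced, hypothetical version of the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE radcliente (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE radacct (radacctid INTEGER PRIMARY KEY, cliente_id INTEGER)"
    )
    for cid in range(1, clients + 1):
        conn.execute("INSERT INTO radcliente (id) VALUES (?)", (cid,))
        for _ in range(accts_per_client):
            # radacctid is assigned automatically (1, 2, 3, ...)
            conn.execute("INSERT INTO radacct (cliente_id) VALUES (?)", (cid,))
    return conn


conn = build_demo_db()

# Same shape as the Yii-generated query: one output row per matching radacct row.
rows = conn.execute(
    """
    SELECT t.id, acct.radacctid
    FROM radcliente t
    LEFT OUTER JOIN radacct acct ON acct.cliente_id = t.id
    ORDER BY acct.radacctid DESC
    """
).fetchall()

print(len(rows))  # 15: 3 clients x 5 accts each, although only 3 rows are wanted
```

With millions of acct rows the same multiplication happens, which matches the 40-second behaviour described above.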

Applying my solution, which limits the join to one row (this is fast! about 200 ms):

SELECT
  `t`.`id` AS `t0_c0`,
  -- ...
  `t`.`spc_serasa` AS `t0_c56`,
  `acct`.`radacctid` AS `t1_c0`,
  -- ...
  `acct`.`cliente_id` AS `t1_c27`
FROM
  `radcliente` `t`
  LEFT OUTER JOIN `radacct` `acct` ON (
    acct.radacctid = (
      SELECT   radacctid
      FROM     `radacct` `acct`
      WHERE    (acct.cliente_id = t.id)
      ORDER BY radacctid DESC
      LIMIT 1
    )
  )
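The LIMIT 1 join above can be checked on a toy dataset. This is a minimal sketch in SQLite (not MySQL) with a hypothetical reduced schema; the inner alias is renamed to `a2` to make it explicit that the subquery reads its own copy of `radacct` while `t.id` comes from the outer query.

```python
import sqlite3

# Hypothetical, reduced schema: only the columns the join needs.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE radcliente (id INTEGER PRIMARY KEY);
    CREATE TABLE radacct (radacctid INTEGER PRIMARY KEY, cliente_id INTEGER);
    INSERT INTO radcliente (id) VALUES (1), (2), (3);
    -- client 1 has three accts, client 2 has one, client 3 has none
    INSERT INTO radacct (radacctid, cliente_id) VALUES (10, 1), (11, 1), (12, 1), (20, 2);
    """
)

# Each client is joined to at most one radacct row: the one with the highest radacctid.
rows = conn.execute(
    """
    SELECT t.id, acct.radacctid
    FROM radcliente t
    LEFT OUTER JOIN radacct acct ON (
        acct.radacctid = (
            SELECT a2.radacctid
            FROM radacct a2
            WHERE a2.cliente_id = t.id
            ORDER BY a2.radacctid DESC
            LIMIT 1
        )
    )
    ORDER BY t.id
    """
).fetchall()

print(rows)  # [(1, 12), (2, 20), (3, None)]
```

Clients without any acct are kept with NULLs, because the subquery returns NULL and the LEFT OUTER JOIN then finds no match.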

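This is the query CActiveDataProvider generates to count the total number of items, with my limit-to-1 join solution applied (slow, 10 seconds for the count):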

SELECT
  COUNT(*)
FROM (
  SELECT
    `t`.`id` AS `t0_c0`,
    -- ...
    `t`.`spc_serasa` AS `t0_c56`,
    `endereco_instalacao`.`id` AS `t1_c0`,
    `telefones`.`id` AS `t2_c0`,
    `telefones`.`telefone` AS `t2_c3`,
    `emails`.`id` AS `t3_c0`,
    `emails`.`email` AS `t3_c3`,
    `metodo_cobranca`.`id` AS `t4_c0`,
    `acct`.`radacctid` AS `t5_c0`,
    `acct`.`framedipaddress` AS `t5_c22`
  FROM
    `radcliente` `t`
    LEFT OUTER JOIN `radcliente_endereco_instalacao` `endereco_instalacao` ON (
      endereco_instalacao.id = (
        SELECT id
        FROM `radcliente_endereco_instalacao` `endereco_instalacao`
        WHERE (
          endereco_instalacao.cliente_id = t.id
        )
        LIMIT 1
      )
    )
    LEFT OUTER JOIN `radcliente_telefone` `telefones` ON (`telefones`.`cliente_id`=`t`.`id`)
    LEFT OUTER JOIN `radcliente_email` `emails` ON (`emails`.`cliente_id`=`t`.`id`)
    LEFT OUTER JOIN `radmetodo_cobranca` `metodo_cobranca` ON (
      metodo_cobranca.id = (
        SELECT id
        FROM   `radmetodo_cobranca` `metodo_cobranca`
        WHERE  (metodo_cobranca.cliente_id = t.id)
               AND (metodo_cobranca.arquivo = 'nao')
        ORDER BY metodo_cobranca.id DESC
        LIMIT 1
      )
    )
    LEFT OUTER JOIN `radacct` `acct` ON (
      acct.radacctid = (
        SELECT   radacctid
        FROM     `radacct` `acct`
        WHERE    (acct.cliente_id = t.id)
        ORDER BY radacctid DESC
        LIMIT 1
      )
    )
  GROUP BY t.id
) sq

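The problem, though, is the count generated by CActiveDataProvider (it takes about 10 seconds to return). Is there a way to optimize it without losing the relations, since I will need to filter by them in the future?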
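One observation about the count query, shown as a sketch under assumptions: in the query above every join is a LEFT OUTER JOIN and there is no WHERE clause, so no `radcliente` row can be dropped, and `GROUP BY t.id` removes the duplicates that `telefones` and `emails` introduce. The result therefore equals the number of rows in `radcliente`. The toy SQLite schema below is hypothetical and reduced; when a filter on a relation is needed, an EXISTS subquery on just that relation is one way to count without the other joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE radcliente (id INTEGER PRIMARY KEY);
    CREATE TABLE radcliente_telefone (id INTEGER PRIMARY KEY, cliente_id INTEGER);
    CREATE TABLE radacct (radacctid INTEGER PRIMARY KEY, cliente_id INTEGER);
    INSERT INTO radcliente (id) VALUES (1), (2), (3);
    -- client 1 has two phones, so the phone join duplicates its row
    INSERT INTO radcliente_telefone (cliente_id) VALUES (1), (1), (2);
    INSERT INTO radacct (radacctid, cliente_id) VALUES (10, 1), (11, 1), (20, 2);
    """
)

# Reduced version of the generated count: LEFT JOINs + GROUP BY t.id, wrapped in COUNT(*).
full_count = conn.execute(
    """
    SELECT COUNT(*) FROM (
        SELECT t.id
        FROM radcliente t
        LEFT OUTER JOIN radcliente_telefone telefones ON telefones.cliente_id = t.id
        LEFT OUTER JOIN radacct acct ON acct.radacctid = (
            SELECT a2.radacctid FROM radacct a2
            WHERE a2.cliente_id = t.id
            ORDER BY a2.radacctid DESC LIMIT 1
        )
        GROUP BY t.id
    ) sq
    """
).fetchone()[0]

# LEFT OUTER JOINs never drop rows of t, and GROUP BY t.id removes duplicates,
# so without a WHERE clause the count is simply the number of clients.
cheap_count = conn.execute("SELECT COUNT(*) FROM radcliente").fetchone()[0]

# With a filter on one relation (here: "has at least one acct"), EXISTS touches only that table.
with_acct = conn.execute(
    """
    SELECT COUNT(*) FROM radcliente t
    WHERE EXISTS (SELECT 1 FROM radacct a WHERE a.cliente_id = t.id)
    """
).fetchone()[0]

print(full_count, cheap_count, with_acct)  # 3 3 2
```

In Yii terms this suggests counting with a lighter criteria than the one used to fetch the page, adding only the relations a filter really needs; how to pass such a criteria to the data provider depends on the Yii version in use.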

Update

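Thanks for the replies. I have been running some tests and noticed that it is slow in every case; the `radacct` table makes the problem worse because of its size, so the LIMIT 1 in the subquery should not be what causes it. If you want to check for yourself, here are the models and a link to access the system: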
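If the slowness grows with the size of `radacct` in every variant, one likely cause, stated here as an assumption because the real indexes are only in the linked schema, is that the per-client lookup `WHERE cliente_id = ... ORDER BY radacctid DESC LIMIT 1` has no index to use and scans a large part of the table each time. A composite index on `(cliente_id, radacctid)` turns each lookup into a short index probe. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` to confirm that the lookup uses such an index; the index name `idx_radacct_cliente` is hypothetical, and on MySQL you would inspect the plan with `EXPLAIN` instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE radacct (radacctid INTEGER PRIMARY KEY, cliente_id INTEGER);
    -- Hypothetical index name; a MySQL equivalent would be:
    -- ALTER TABLE radacct ADD INDEX idx_radacct_cliente (cliente_id, radacctid);
    CREATE INDEX idx_radacct_cliente ON radacct (cliente_id, radacctid);
    """
)

# The per-client subquery from the LIMIT 1 join, with the client id as a parameter.
latest_acct_sql = """
    SELECT radacctid FROM radacct
    WHERE cliente_id = ?
    ORDER BY radacctid DESC
    LIMIT 1
"""

# The last column of each plan row is the human-readable detail, which names the index used.
plan = conn.execute("EXPLAIN QUERY PLAN " + latest_acct_sql, (1,)).fetchall()
details = " ".join(row[-1] for row in plan)
print(details)
```

If `radacct` is an InnoDB table with `radacctid` as its primary key, an existing secondary index on `cliente_id` alone already stores `radacctid` implicitly and can serve this lookup in much the same way.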

Access:

http://177.86.111.30/dev2/teste

Username: help

Password: 1

Download the models and schema for radcliente and radacct: http://177.86.111.30/files.zip

1 answer:

Answer 0 (score: 0)

Instead of ON id = ( SELECT ... LIMIT 1 ), try adding another JOIN (not a LEFT JOIN):

JOIN ( SELECT ... LIMIT 1 ) x ON ...

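My concern with your code is that it re-evaluates that subquery every time the ON clause has to be checked. My rewrite makes the subquery run only once.

Your query uses a "correlated" subquery, so if possible you should recast it as a non-correlated one.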
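Taken literally, a non-correlated `( SELECT ... LIMIT 1 )` would return a single row for the whole table, so the sketch below swaps in a common non-correlated pattern instead: a "groupwise maximum" derived table that computes `MAX(radacctid)` per `cliente_id` once, then joins it back to `radacct`. It uses LEFT JOIN rather than the answer's plain JOIN so that clients without any acct are kept. The schema is a hypothetical reduced one and runs on SQLite, not MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE radcliente (id INTEGER PRIMARY KEY);
    CREATE TABLE radacct (radacctid INTEGER PRIMARY KEY, cliente_id INTEGER, framedipaddress TEXT);
    INSERT INTO radcliente (id) VALUES (1), (2), (3);
    INSERT INTO radacct VALUES
        (10, 1, '10.0.0.1'), (12, 1, '10.0.0.2'), (11, 1, '10.0.0.3'), (20, 2, '10.0.0.4');
    """
)

# Non-correlated groupwise max: the derived table `latest` is computed once for all
# clients, instead of once per client row as with the correlated LIMIT 1 subquery.
rows = conn.execute(
    """
    SELECT t.id, acct.radacctid, acct.framedipaddress
    FROM radcliente t
    LEFT JOIN (
        SELECT cliente_id, MAX(radacctid) AS radacctid
        FROM radacct
        GROUP BY cliente_id
    ) latest ON latest.cliente_id = t.id
    LEFT JOIN radacct acct ON acct.radacctid = latest.radacctid
    ORDER BY t.id
    """
).fetchall()

print(rows)  # [(1, 12, '10.0.0.2'), (2, 20, '10.0.0.4'), (3, None, None)]
```

The derived table aggregates all of `radacct`, so with millions of rows it is not automatically faster than the correlated form. With an index on `(cliente_id, radacctid)`, MySQL can often resolve the `GROUP BY ... MAX()` from the index alone, but both versions are worth comparing with EXPLAIN on the real data.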