Question

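I have two tables: t_productspecificprice, which holds discounts, and t_productcategory, which maps productids to categoryids.

t_productspecificprice can hold several discounts for one productid, but only the most recently added discount is relevant.

t_productspecificprice has about 450,000 records and t_productcategory has 350,000+ records.

For each categoryid, I need the latest discount for each of its productids.

The following query does not work; phpMyAdmin returns a 504 error.

Query: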

select 
    a_categoryid as 'Category Id',
    t_productcategory.a_productid as 'Product Ids', 
    t_productspecificprice.a_reduction, 
    t_productspecificprice.a_reductiontype, 
    t_productspecificprice.a_to  

from t_productcategory

left join t_productspecificprice on t_productspecificprice.a_productid = t_productcategory.a_productid

left join 

(SELECT max(a_productspecificpriceid) as a_productspecificpriceid FROM t_productspecificprice 
    GROUP by a_productid
    ) 
as discounts on discounts.a_productspecificpriceid = t_productspecificprice.a_productspecificpriceid

where a_categoryid = 4

Schema:

EXPLAIN output:

Can someone optimize this?

Answer 1

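Your query uses the following outer joins:

Outer join all product-specific prices.
Outer join all latest product-specific prices.

In other words:

If the product has product-specific prices, join them. (Otherwise join an empty record.)
If a product-specific price happens to be the latest one, join its ID. (Otherwise join a NULL ID.)

So you keep all records, whether or not they are the latest price. (Well, that is what outer joins are supposed to do.)

You could rewrite the query so that only the latest product-specific price is outer joined, for example: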

select 
  a_categoryid as 'Category Id',
  t_productcategory.a_productid as 'Product Ids', 
  t_productspecificprice.a_reduction, 
  t_productspecificprice.a_reductiontype, 
  t_productspecificprice.a_to  
from t_productcategory
left join t_productspecificprice 
  on t_productspecificprice.a_productid = t_productcategory.a_productid
  and a_productspecificpriceid in
  (
    select max(a_productspecificpriceid) 
    from t_productspecificprice 
    group by a_productid
  );

An alternative using NOT EXISTS is also worth a try:

select 
  a_categoryid as 'Category Id',
  t_productcategory.a_productid as 'Product Ids', 
  t_productspecificprice.a_reduction, 
  t_productspecificprice.a_reductiontype, 
  t_productspecificprice.a_to  
from t_productcategory
left join t_productspecificprice 
  on t_productspecificprice.a_productid = t_productcategory.a_productid
  and not exists
  (
    select *
    from t_productspecificprice newer
    where newer.a_productid = t_productspecificprice.a_productid
    and newer.a_productspecificpriceid > t_productspecificprice.a_productspecificpriceid
  );
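
To see the difference on a small scale, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows, the reduced column list and the names ORIGINAL, LATEST_IN, LATEST_NOT_EXISTS and run are assumptions made for illustration only; the category filter from the question is kept in all three queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_productcategory (a_categoryid INTEGER, a_productid INTEGER);
CREATE TABLE t_productspecificprice (
    a_productspecificpriceid INTEGER PRIMARY KEY,
    a_productid INTEGER,
    a_reduction INTEGER
);
-- Made-up data: category 4 holds products 10, 11 and 12.
INSERT INTO t_productcategory VALUES (4, 10), (4, 11), (4, 12), (5, 10);
-- Product 10 has three discounts, product 11 has one, product 12 has none.
INSERT INTO t_productspecificprice VALUES
    (1, 10, 10), (2, 10, 20), (3, 11, 5), (4, 10, 30);
""")

# Shape of the question's query: the second outer join filters nothing.
ORIGINAL = """
SELECT pc.a_productid, psp.a_productspecificpriceid, psp.a_reduction
FROM t_productcategory pc
LEFT JOIN t_productspecificprice psp ON psp.a_productid = pc.a_productid
LEFT JOIN (
    SELECT MAX(a_productspecificpriceid) AS a_productspecificpriceid
    FROM t_productspecificprice GROUP BY a_productid
) discounts ON discounts.a_productspecificpriceid = psp.a_productspecificpriceid
WHERE pc.a_categoryid = 4
ORDER BY pc.a_productid, psp.a_productspecificpriceid
"""

# Rewrite 1: only the latest price per product qualifies for the join.
LATEST_IN = """
SELECT pc.a_productid, psp.a_productspecificpriceid, psp.a_reduction
FROM t_productcategory pc
LEFT JOIN t_productspecificprice psp
    ON psp.a_productid = pc.a_productid
    AND psp.a_productspecificpriceid IN (
        SELECT MAX(a_productspecificpriceid)
        FROM t_productspecificprice GROUP BY a_productid)
WHERE pc.a_categoryid = 4
ORDER BY pc.a_productid
"""

# Rewrite 2: a price qualifies only if no newer price exists for the product.
LATEST_NOT_EXISTS = """
SELECT pc.a_productid, psp.a_productspecificpriceid, psp.a_reduction
FROM t_productcategory pc
LEFT JOIN t_productspecificprice psp
    ON psp.a_productid = pc.a_productid
    AND NOT EXISTS (
        SELECT 1 FROM t_productspecificprice newer
        WHERE newer.a_productid = psp.a_productid
        AND newer.a_productspecificpriceid > psp.a_productspecificpriceid)
WHERE pc.a_categoryid = 4
ORDER BY pc.a_productid
"""

def run(sql):
    """Return all result rows of a query as a list of tuples."""
    return conn.execute(sql).fetchall()

for name in ("ORIGINAL", "LATEST_IN", "LATEST_NOT_EXISTS"):
    print(name, run(globals()[name]))
```

On this data the original shape returns every discount of product 10, while both rewrites return one row per product and still keep the product that has no discount.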

Answer 2

I believe you should try this:

SELECT 
    pc.a_categoryid AS 'Category Id',
    pc.a_productid AS 'Product Ids', 
    psp.a_reduction, 
    psp.a_reductiontype, 
    psp.a_to,
    discounts.max_price_id  
FROM t_productcategory AS pc
LEFT JOIN t_productspecificprice AS psp 
    ON (psp.a_productid = pc.a_productid)
LEFT JOIN (
        SELECT a_productid, MAX(a_productspecificpriceid) AS max_price_id 
        FROM t_productspecificprice 
        GROUP BY a_productid
    ) AS discounts 
    ON discounts.max_price_id = psp.a_productspecificpriceid
WHERE pc.a_categoryid = 4

and add a composite key on (a_productid, a_productspecificpriceid) to the t_productspecificprice table.
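
As a rough illustration of what such a key changes, the sketch below again uses Python's sqlite3 module rather than MySQL, so the plan text is SQLite's and not what MySQL's EXPLAIN prints. The index name idx_psp_product_price, the generated rows and the helper query_plan are assumptions; the CREATE INDEX statement itself uses syntax that MySQL also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE t_productspecificprice (
    a_productspecificpriceid INTEGER PRIMARY KEY,
    a_productid INTEGER,
    a_reduction INTEGER
)""")
# Made-up data: 1000 discounts spread over 100 products.
conn.executemany(
    "INSERT INTO t_productspecificprice VALUES (?, ?, ?)",
    [(i, i % 100, 10) for i in range(1, 1001)],
)

def query_plan(sql):
    """Return SQLite's plan for a query as one string (last column = detail)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Finding the latest discount id of one product.
LATEST_FOR_PRODUCT = """
SELECT MAX(a_productspecificpriceid)
FROM t_productspecificprice
WHERE a_productid = 42
"""

plan_before = query_plan(LATEST_FOR_PRODUCT)

# The composite key suggested in the answer (hypothetical index name).
conn.execute("""
CREATE INDEX idx_psp_product_price
ON t_productspecificprice (a_productid, a_productspecificpriceid)
""")

plan_after = query_plan(LATEST_FOR_PRODUCT)

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists the lookup has to scan the table; afterwards the plan names the new index, which is the effect the composite key is meant to have on the per-product MAX lookup.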

Answer 3

My colleague suggested this query, and it runs fine:

SELECT 
  s1.a_categoryid,
  p1.a_productspecificpriceid, 
  s1.a_productid,
  p1.a_reduction,
  p1.a_from,
  p1.a_to 
FROM t_productcategory s1 

LEFT JOIN t_productspecificprice p1 ON (s1.a_productid = p1.a_productid) 
LEFT JOIN t_productspecificprice p2 ON (p1.a_productid = p2.a_productid AND p1.a_productspecificpriceid < p2.a_productspecificpriceid) 

WHERE p2.a_productid IS NULL AND s1.a_categoryid = 4
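
The query above is a self-join anti-join: p2 looks for a newer price of the same product, and only rows where no such p2 exists survive the WHERE clause. A small sketch with Python's sqlite3 module (standing in for MySQL, with made-up rows and a reduced column list; ANTI_JOIN and latest are illustrative names) shows the behaviour:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_productcategory (a_categoryid INTEGER, a_productid INTEGER);
CREATE TABLE t_productspecificprice (
    a_productspecificpriceid INTEGER PRIMARY KEY,
    a_productid INTEGER,
    a_reduction INTEGER
);
-- Made-up data: product 12 has no discount at all.
INSERT INTO t_productcategory VALUES (4, 10), (4, 11), (4, 12), (5, 10);
INSERT INTO t_productspecificprice VALUES
    (1, 10, 10), (2, 10, 20), (3, 11, 5), (4, 10, 30);
""")

ANTI_JOIN = """
SELECT s1.a_productid, p1.a_productspecificpriceid, p1.a_reduction
FROM t_productcategory s1
LEFT JOIN t_productspecificprice p1
    ON s1.a_productid = p1.a_productid
-- p2 matches any NEWER price of the same product ...
LEFT JOIN t_productspecificprice p2
    ON p1.a_productid = p2.a_productid
    AND p1.a_productspecificpriceid < p2.a_productspecificpriceid
-- ... so p2 being NULL means p1 is the newest (or there is no price).
WHERE p2.a_productid IS NULL AND s1.a_categoryid = 4
ORDER BY s1.a_productid
"""

latest = conn.execute(ANTI_JOIN).fetchall()
print(latest)
```

Product 12 is kept with NULL price columns because both outer joins come back empty for it, so the IS NULL test still passes.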

Answer 4

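This is a very challenging problem. I have run into it before with only 200k rows of data: my system stalled on a simple recap query that joined two tables. The conditions and the data were almost the same as yours.

If you put the explain command in front of your query, the MySQL engine returns the execution plan for it. There you will see that the amount of data the engine has to analyze is staggering; it is not simply 400k + 350k. Please try the following commands. Just add explain in front of your query: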

explain your_query;

and

explain extended your_query;

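Then monitor the disk I/O, CPU and memory used by the mysql process while the query runs. That is how you find your SQL bottleneck. For example, a SATA drive commonly delivers 20-40 MB/s. Try to see what your system can do.

This domain is now called big data analysis. I am afraid there is no simple solution for analyzing the result of such a large join properly.

The main problem with this kind of big data processing is that the MySQL engine simply runs out of memory when it caches all the keys used in the query. When that happens, MySQL swaps memory out to the hard disk, which adds even more processing.

The solution will involve restructuring your tables, modifying your hardware, or adding some helper tables.

Use a helper recap table. Processing a large number of rows takes time. You may want to break the query up into several temporary tables and fill them with the results of the GROUP BY, then use a final query that joins those tables. For example, you can use tmp_recap_discount to hold the result of the max discount query.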
```
insert into tmp_recap_discount 
SELECT 
  a_productid, 
  max(a_productspecificpriceid) as a_productspecificpriceid 
FROM    
  t_productspecificprice 
GROUP by a_productid
```
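Use a scheduler/job to run this query, because the grouping takes a lot of processing and time, then run a simple join against this table. If that query also runs against 400k rows, it is advisable to create a temporary table to hold its results too. So queue up a list of SQL jobs that fill the temporary recap tables. Create a mutex or flag that marks whether the job has finished, so that the PHP application only has to look at the final table. There is no simple way to minimize execution time on big data tables. Even a simple select with a where clause takes a long time. It is therefore advisable to run slow queries from a native/desktop application or directly with the mysql command. Running such slow queries from PHP is not recommended, even if you raise the PHP execution time limit to several days. Nasty things can happen.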
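
The recap-table-plus-flag idea can be sketched as follows. This uses Python's sqlite3 module as a stand-in for MySQL and for the scheduled job; the flag table tmp_recap_status, the functions rebuild_recap and latest_discounts, and the sample rows are all assumptions made for illustration, not part of the original setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_productcategory (a_categoryid INTEGER, a_productid INTEGER);
CREATE TABLE t_productspecificprice (
    a_productspecificpriceid INTEGER PRIMARY KEY,
    a_productid INTEGER,
    a_reduction INTEGER
);
CREATE TABLE tmp_recap_discount (
    a_productid INTEGER PRIMARY KEY,
    a_productspecificpriceid INTEGER
);
-- Hypothetical one-row flag table: 1 means the recap is usable.
CREATE TABLE tmp_recap_status (is_ready INTEGER NOT NULL);
INSERT INTO tmp_recap_status VALUES (0);
-- Made-up data.
INSERT INTO t_productcategory VALUES (4, 10), (4, 11), (4, 12), (5, 10);
INSERT INTO t_productspecificprice VALUES
    (1, 10, 10), (2, 10, 20), (3, 11, 5), (4, 10, 30);
""")

def rebuild_recap(conn):
    """What the scheduled job would do: refill the recap table."""
    with conn:  # commit the "not ready" marker first
        conn.execute("UPDATE tmp_recap_status SET is_ready = 0")
    with conn:  # then rebuild and mark the recap as ready in one commit
        conn.execute("DELETE FROM tmp_recap_discount")
        conn.execute("""
            INSERT INTO tmp_recap_discount
            SELECT a_productid, MAX(a_productspecificpriceid)
            FROM t_productspecificprice
            GROUP BY a_productid""")
        conn.execute("UPDATE tmp_recap_status SET is_ready = 1")

def latest_discounts(conn, category_id):
    """What the application would do: a simple join, only if the recap is ready."""
    ready = conn.execute("SELECT is_ready FROM tmp_recap_status").fetchone()[0]
    if not ready:
        return None  # the job has not finished yet
    return conn.execute("""
        SELECT pc.a_productid, psp.a_productspecificpriceid, psp.a_reduction
        FROM t_productcategory pc
        LEFT JOIN tmp_recap_discount r ON r.a_productid = pc.a_productid
        LEFT JOIN t_productspecificprice psp
            ON psp.a_productspecificpriceid = r.a_productspecificpriceid
        WHERE pc.a_categoryid = ?
        ORDER BY pc.a_productid""", (category_id,)).fetchall()

before = latest_discounts(conn, 4)   # recap not built yet
rebuild_recap(conn)
after = latest_discounts(conn, 4)
print(before, after)
```

The expensive GROUP BY runs only inside the job; the application-side query joins on primary keys and never groups.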
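
Install MariaDB. It is a drop-in replacement for MySQL. Just uninstall MySQL but keep the data folder, then install MariaDB over the MySQL installation. If you want to play it safe, dump the database and restore it into a clean MariaDB install. In my case the difference in performance was very significant: execution was more than 300% faster, with no changes to the queries. I upgraded all of my system databases from MySQL to MariaDB and the performance gain was remarkable. Be careful, though: some programmers make heavy use of nasty subqueries, and MariaDB handles subqueries slightly differently from MySQL. So the output of every application that was using MySQL has to be tested thoroughly.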

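Play with your hardware and optimize the settings. I recommend upgrading to MariaDB first and only then working on hardware and settings, because that is where the improvement is.

a. Optimize the MySQL settings. Look for these settings in my.ini or my.cnf. These are the basic optimization settings.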

# The default is 128M, but you can safely raise it to around a quarter of system memory.
# With 8 GB of RAM it is safe to assume 2048M for the InnoDB buffer pool.
# The value can go higher, as long as the system really has that much free memory.
# If it does not, the server will start swapping again and performance will bottleneck.
innodb_buffer_pool_size = 2G

# Forces the MySQL engine to store each table in its own file instead of one giant shared file.
# If this setting was previously 0, you need a fresh MySQL / MariaDB install and a database restore for it to take effect.
innodb_file_per_table = 1
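
The quarter-of-memory rule of thumb from the comments above is simple arithmetic. As a tiny sketch (the function name suggest_buffer_pool_mb and its fraction parameter are made up for illustration, and the 25% figure is this answer's rule, not an official recommendation):

```python
def suggest_buffer_pool_mb(total_memory_mb, fraction=0.25):
    """Apply the answer's rule of thumb: a fixed fraction of total RAM, in MB."""
    return int(total_memory_mb * fraction)

# 8 GB of RAM gives the 2048M mentioned in the comments above.
size_mb = suggest_buffer_pool_mb(8 * 1024)
print(f"innodb_buffer_pool_size = {size_mb}M")
```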

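b. Maximize disk I/O. To do that, simply use a faster drive configuration. That could mean upgrading to 15k RPM SAS drives, SSDs, or a RAID 0 array of SATA, SAS or SSD drives.

c. Use table partitioning. This needs in-depth analysis to get the most performance out of it. https://dev.mysql.com/doc/refman/5.1/en/partitioning.html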

MySQL GROUP BY query optimization for 400,000+ records
