Question

I have a MySQL table with hundreds of millions of rows. See the CREATE statement below:

 CREATE TABLE `transaction_history` (
  `transaction_history_id` int(11) NOT NULL AUTO_INCREMENT,
  `type_id` int(11) NOT NULL,
  `sub_type_id` int(11) DEFAULT NULL,
  `transaction_id` int(11) DEFAULT NULL,
  `settlement_date_time` datetime DEFAULT NULL,
  PRIMARY KEY (`transaction_history_id`),
  UNIQUE KEY `transaction_history_id_UNIQUE` (`transaction_history_id`),
  KEY `type_id_idx` (`type_id`),
  KEY `sub_type_id_idx` (`sub_type_id`),
  KEY `transaction_id_idx` (`transaction_id`),
  KEY `settlement_date` (`settlement_date_time`),
  KEY `type_sub_type` (`type_id`,`sub_type_id`)
) ENGINE=InnoDB AUTO_INCREMENT=36832823 DEFAULT CHARSET=latin1;

About the table: each transaction_id has multiple settlement_date_time values. type_id and sub_type_id are unique together.

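The query I need to build: for each transaction_id, get the latest settlement_date_time, and then count how many of those latest rows there are for each (type_id, sub_type_id) combination.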

So the result looks like this:

(type_id,sub_type_id) -> count 
(3,4) -> 23500
(2,2) -> 569323
(2,3) -> 45028
(3,2) -> 1038943
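To pin down what the requested result means, here is a small plain-Python reference sketch on hypothetical sample rows: keep only the latest row of each transaction_id, then count the (type_id, sub_type_id) pairs of those rows. It is only meant for checking a SQL query's output on toy data, not for the real table. How ties on settlement_date_time are handled (the first row seen wins) is an assumption, since the question does not say.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows: (transaction_id, settlement_date_time, type_id, sub_type_id)
rows = [
    (1, datetime(2019, 1, 1), 2, 2),
    (1, datetime(2019, 1, 5), 3, 4),  # latest row for transaction 1
    (2, datetime(2019, 1, 2), 2, 3),  # only (so latest) row for transaction 2
    (3, datetime(2019, 1, 3), 2, 2),
    (3, datetime(2019, 1, 7), 2, 2),  # latest row for transaction 3
]

def count_latest_types(rows):
    """Keep the latest row per transaction_id, then count (type_id, sub_type_id) pairs."""
    latest = {}
    for txn_id, settled_at, type_id, sub_type_id in rows:
        # Strict ">" means that on a tie the first row seen is kept (an assumption).
        if txn_id not in latest or settled_at > latest[txn_id][0]:
            latest[txn_id] = (settled_at, type_id, sub_type_id)
    return Counter((t, s) for _, t, s in latest.values())

print(count_latest_types(rows))
```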

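No matter what I do, I can't write a query that runs reasonably fast. Everything I've tried times out after 20 minutes. Is there a way to get this query to run in minutes, or even seconds?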

One query I tried:

select count(a1.transaction_id), a1.type_id, a1.sub_type_id
from  transaction_history a1, transaction_history a2 
where a1.transaction_id= a2.transaction_id
and  not exists (Select a1.settlement_date_time < a2.settlement_date_time) 
group by a1.type_id, a1.sub_type_id

Thanks

Answer 1

Try this.

select count(a1.transaction_id), a1.type_id, a1.sub_type_id  
from  transaction_history a1 join transaction_history a2 using(transaction_id)
where  a1.settlement_date_time > a2.settlement_date_time 
group by a1.type_id, a1.sub_type_id

Hope this helps.

Answer 2

I think you need a subquery to first find the latest datetime for each pair:

select count(hist.transaction_id), hist.type_id, hist.sub_type_id
from transaction_history hist
join (select type_id, sub_type_id, max(settlement_date_time) as max_dt
    from transaction_history group by type_id, sub_type_id) latest_date
on hist.type_id= latest_date.type_id AND 
 hist.sub_type_id=latest_date.sub_type_id AND 
 hist.settlement_date_time = latest_date.max_dt
group by hist.type_id, hist.sub_type_id

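The subquery finds the latest datetime for each pair, and the join then picks the rows in the main table that have that same datetime. After that, we can count the transactions.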
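A minimal sketch running Answer 2's query, with an explicit JOIN between the table and the derived table, against an in-memory SQLite database, which stands in for MySQL here as an assumption. The rows are hypothetical, and datetimes are stored as ISO-8601 text so that MAX() and = compare them chronologically. As the explanation says, the subquery computes one latest datetime per (type_id, sub_type_id) pair rather than per transaction_id, so each count is the number of rows sharing their pair's latest timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE transaction_history (
    transaction_history_id INTEGER PRIMARY KEY,
    type_id INTEGER NOT NULL,
    sub_type_id INTEGER,
    transaction_id INTEGER,
    settlement_date_time TEXT)""")
# Hypothetical rows: (type_id, sub_type_id, transaction_id, settlement_date_time)
conn.executemany(
    "INSERT INTO transaction_history "
    "(type_id, sub_type_id, transaction_id, settlement_date_time) VALUES (?, ?, ?, ?)",
    [
        (2, 2, 1, "2019-01-01 00:00:00"),
        (2, 2, 2, "2019-01-03 00:00:00"),
        (2, 2, 3, "2019-01-03 00:00:00"),
        (3, 4, 1, "2019-01-05 00:00:00"),
    ],
)

ANSWER_2_SQL = """
SELECT count(hist.transaction_id), hist.type_id, hist.sub_type_id
FROM transaction_history hist
JOIN (SELECT type_id, sub_type_id, max(settlement_date_time) AS max_dt
      FROM transaction_history GROUP BY type_id, sub_type_id) latest_date
  ON hist.type_id = latest_date.type_id
 AND hist.sub_type_id = latest_date.sub_type_id
 AND hist.settlement_date_time = latest_date.max_dt
GROUP BY hist.type_id, hist.sub_type_id
"""

# Each row is (count, type_id, sub_type_id); sorted for a stable order.
result = sorted(conn.execute(ANSWER_2_SQL).fetchall())
print(result)
```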

Answer 3

You haven't given any details of the execution plan the DBMS is using or of the data distribution.

"For each transaction_id, I need to get the latest settlement_date_time, and then count the (type_id, sub_type_id)."

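Do you mean you need to see every type_id and every sub_type_id, or only the unique combinations? Is this for every row in the table, or only for the rows with the latest settlement date for each transaction_id? If it's the former, your indexes are slowing the query down: a full table scan would be faster. But if you want reasonable response times, you will need to denormalize the data.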
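One possible form of the denormalization suggested above, sketched in Python with SQLite standing in for MySQL (an assumption): a separate summary table, here called latest_transaction (a hypothetical name), holds only the latest history row of each transaction_id, so the report groups one row per transaction instead of scanning the full history. For simplicity the sketch rebuilds the summary in one step; on the real table it would more likely be maintained incrementally as new history rows arrive, which is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transaction_history (
    transaction_history_id INTEGER PRIMARY KEY,
    type_id INTEGER NOT NULL,
    sub_type_id INTEGER,
    transaction_id INTEGER,
    settlement_date_time TEXT);
INSERT INTO transaction_history (type_id, sub_type_id, transaction_id, settlement_date_time) VALUES
    (2, 2, 1, '2019-01-01 00:00:00'),
    (3, 4, 1, '2019-01-05 00:00:00'),
    (2, 3, 2, '2019-01-02 00:00:00'),
    (2, 2, 3, '2019-01-03 00:00:00'),
    (2, 2, 3, '2019-01-07 00:00:00');

-- Hypothetical denormalized table: only the latest history row of each transaction.
CREATE TABLE latest_transaction (
    transaction_id INTEGER,
    settlement_date_time TEXT,
    type_id INTEGER NOT NULL,
    sub_type_id INTEGER);
""")

def refresh_latest(conn):
    """Rebuild the summary table inside one transaction (full rebuild, not incremental)."""
    with conn:
        conn.execute("DELETE FROM latest_transaction")
        conn.execute("""
            INSERT INTO latest_transaction (transaction_id, settlement_date_time, type_id, sub_type_id)
            SELECT h.transaction_id, h.settlement_date_time, h.type_id, h.sub_type_id
            FROM transaction_history h
            JOIN (SELECT transaction_id, MAX(settlement_date_time) AS max_dt
                  FROM transaction_history GROUP BY transaction_id) m
              ON h.transaction_id = m.transaction_id
             AND h.settlement_date_time = m.max_dt""")

refresh_latest(conn)
# The report now only groups the summary table: one row per transaction.
report = sorted(conn.execute(
    "SELECT type_id, sub_type_id, COUNT(*) FROM latest_transaction GROUP BY type_id, sub_type_id"
).fetchall())
print(report)
```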

Your table design is poor: the index type_id_idx is pure overhead and adds no value, given that type_sub_type exists.
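The reason type_id_idx adds nothing is the leftmost-prefix rule: a composite index on (type_id, sub_type_id) can also serve lookups on type_id alone, and MySQL's optimizer applies this rule to multiple-column indexes. Below is a small sketch using SQLite's EXPLAIN QUERY PLAN (not MySQL's EXPLAIN, whose output looks different), with only the composite index defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transaction_history (
    transaction_history_id INTEGER PRIMARY KEY,
    type_id INTEGER NOT NULL,
    sub_type_id INTEGER,
    transaction_id INTEGER,
    settlement_date_time TEXT);
-- Only the composite index; no separate single-column index on type_id.
CREATE INDEX type_sub_type ON transaction_history (type_id, sub_type_id);
""")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT transaction_history_id FROM transaction_history WHERE type_id = ?",
    (3,),
).fetchall()
# The last column of each plan row is the human-readable detail, which names the index used.
details = [row[-1] for row in plan]
print(details)
```

In MySQL, the corresponding cleanup would be ALTER TABLE transaction_history DROP INDEX type_id_idx; ideally after confirming with EXPLAIN that the affected queries switch to type_sub_type.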

Answer 4

In addition to all the queries suggested above, I'd like to make two more suggestions to help you find the most efficient solution:

1) Look at the query's execution plan.

In MySQL we use the EXPLAIN command, which makes this much easier. For details, see here (https://dev.mysql.com/doc/refman/8.0/en/explain.html).

In MS SQL Server, we can press something like CTRL + SHIFT + ALT + L or CTRL + L to display a query's execution plan (note that the shortcuts may differ between versions). For details, see (https://www.red-gate.com/simple-talk/sql/performance/execution-plan-basics/).

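2) If we still can't settle on a single answer, the simplest approach is to test every suggested or alternative version of the query with a profiling tool enabled.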

In MySQL we can use the SHOW PROFILE command, which works like this:

SET profiling = 1;
-- run the query (or queries) to compare here
SHOW PROFILE;

For details, see here (https://dev.mysql.com/doc/refman/5.6/en/show-profile.html).

Alternatively, in MS SQL we can turn on the STATISTICS IO and STATISTICS TIME options, which works like this:

SET STATISTICS IO ON
SET STATISTICS TIME ON
-- run the query (or queries) to compare here
SET STATISTICS IO OFF
SET STATISTICS TIME OFF

This shows the query's run time and resource usage in the Messages window.

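By doing this, we can narrow things down to the query with the shortest run time. I hope this helps you find the most efficient query for your task.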
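The compare-the-candidates approach can also be scripted. The sketch below is a hypothetical Python harness on SQLite (standing in for MySQL, an assumption) with synthetic data: it first checks that every candidate returns the same rows, then reports the best wall-clock time of a few runs. The two candidates are the derived-table join from Answer 5 and a correlated NOT EXISTS rewrite, which is not one of the answers here; the txn_settle index is likewise a hypothetical addition. Timings on a 10,000-row SQLite database say nothing definitive about MySQL on hundreds of millions of rows; the point is the method.

```python
import sqlite3
import time

def build_db(n_transactions=2000, rows_per_txn=5):
    """Synthetic history: each transaction gets rows on days 1..rows_per_txn."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE transaction_history (
        transaction_history_id INTEGER PRIMARY KEY,
        type_id INTEGER NOT NULL,
        sub_type_id INTEGER,
        transaction_id INTEGER,
        settlement_date_time TEXT)""")
    rows = [
        (txn % 3 + 1, k % 4 + 1, txn, f"2019-01-{k + 1:02d} 00:00:00")
        for txn in range(n_transactions)
        for k in range(rows_per_txn)
    ]
    conn.executemany(
        "INSERT INTO transaction_history "
        "(type_id, sub_type_id, transaction_id, settlement_date_time) VALUES (?, ?, ?, ?)",
        rows,
    )
    # Hypothetical supporting index used by both candidates.
    conn.execute("CREATE INDEX txn_settle ON transaction_history (transaction_id, settlement_date_time)")
    return conn

CANDIDATES = {
    "derived_table_join": """
        SELECT a1.type_id, a1.sub_type_id, COUNT(a1.transaction_id)
        FROM (SELECT transaction_id, MAX(settlement_date_time) AS max_dt
              FROM transaction_history GROUP BY transaction_id) maxdts
        JOIN transaction_history a1
          ON a1.transaction_id = maxdts.transaction_id
         AND a1.settlement_date_time = maxdts.max_dt
        GROUP BY a1.type_id, a1.sub_type_id""",
    "not_exists_anti_join": """
        SELECT a1.type_id, a1.sub_type_id, COUNT(a1.transaction_id)
        FROM transaction_history a1
        WHERE NOT EXISTS (SELECT 1 FROM transaction_history a2
                          WHERE a2.transaction_id = a1.transaction_id
                            AND a2.settlement_date_time > a1.settlement_date_time)
        GROUP BY a1.type_id, a1.sub_type_id""",
}

def time_queries(conn, queries, repeats=3):
    """Run each query several times; keep its sorted rows and its best wall-clock time."""
    results, timings = {}, {}
    for name, sql in queries.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            rows = conn.execute(sql).fetchall()
            best = min(best, time.perf_counter() - start)
        results[name] = sorted(rows)
        timings[name] = best
    return results, timings

conn = build_db()
results, timings = time_queries(conn, CANDIDATES)
# Only compare timings of candidates that agree on the answer.
assert len({tuple(r) for r in results.values()}) == 1, "candidates disagree"
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds:.4f}s")
```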

Answer 5

Try this:

SELECT count(a1.transaction_id), a1.type_id, a1.sub_type_id
FROM (SELECT transaction_id,MAX(settlement_date_time) MAX_settlement_date_time FROM transaction_history GROUP BY transaction_id)maxdts
INNER JOIN transaction_history a1 ON a1.transaction_id= maxdts.transaction_id AND a1.settlement_date_time = maxdts.MAX_settlement_date_time
group by a1.type_id, a1.sub_type_id
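A sketch running Answer 5's query (unchanged apart from whitespace) against SQLite, standing in for MySQL as an assumption, with hypothetical rows. Because the join keeps every row whose settlement_date_time equals its transaction's maximum, rows tied on the latest timestamp are all counted: here transaction 3 contributes two rows, so the counts sum to 4 for 3 transactions. If exactly one row per transaction is wanted, an extra tie-breaker (for example on transaction_history_id) would be needed; the question does not say which row should win.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transaction_history (
    transaction_history_id INTEGER PRIMARY KEY,
    type_id INTEGER NOT NULL,
    sub_type_id INTEGER,
    transaction_id INTEGER,
    settlement_date_time TEXT);
INSERT INTO transaction_history (type_id, sub_type_id, transaction_id, settlement_date_time) VALUES
    (2, 2, 1, '2019-01-01 00:00:00'),
    (3, 4, 1, '2019-01-05 00:00:00'),  -- latest for transaction 1
    (2, 3, 2, '2019-01-02 00:00:00'),  -- latest for transaction 2
    (2, 2, 3, '2019-01-07 00:00:00'),  -- transaction 3: two rows tie on the latest time
    (3, 2, 3, '2019-01-07 00:00:00');
""")

ANSWER_5_SQL = """
SELECT count(a1.transaction_id), a1.type_id, a1.sub_type_id
FROM (SELECT transaction_id, MAX(settlement_date_time) MAX_settlement_date_time
      FROM transaction_history GROUP BY transaction_id) maxdts
INNER JOIN transaction_history a1
   ON a1.transaction_id = maxdts.transaction_id
  AND a1.settlement_date_time = maxdts.MAX_settlement_date_time
GROUP BY a1.type_id, a1.sub_type_id
"""

# Each row is (count, type_id, sub_type_id); sorted for a stable order.
rows = sorted(conn.execute(ANSWER_5_SQL).fetchall())
print(rows)
```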
