I have a MySQL database. The acquired data is stored in the raw_data_headers, raw_data_rows and raw_data_row_details tables.
raw_data_row_details has a foreign key referencing raw_data_rows.ID, and raw_data_rows has the same kind of foreign key to raw_data_headers.
raw_data_headers stores the data headers, raw_data_rows stores every phase of the acquisition program, and raw_data_row_details stores the details of every phase.
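A minimal sketch of that header → rows → details hierarchy, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: SQLite types and behavior differ from InnoDB). Only the key columns are kept; the sample data and the helper names build_demo_db and details_per_header are hypothetical. The join walks the same foreign-key chain as the queries in this thread.

```python
import sqlite3

# Hypothetical, heavily simplified copy of the three tables (SQLite, not MySQL).
SCHEMA = """
CREATE TABLE raw_data_headers (
    ID       INTEGER PRIMARY KEY,
    batch_id INTEGER
);
CREATE TABLE raw_data_rows (
    ID        INTEGER PRIMARY KEY,
    header_id INTEGER NOT NULL REFERENCES raw_data_headers(ID)
);
CREATE TABLE raw_data_row_details (
    ID           INTEGER PRIMARY KEY,
    row_id       INTEGER NOT NULL REFERENCES raw_data_rows(ID),
    component_id INTEGER NOT NULL,
    Value        REAL NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    # One header (batch 1) with two acquisition phases (rows),
    # each phase with its own detail measurements.
    conn.execute("INSERT INTO raw_data_headers VALUES (1, 1)")
    conn.executemany("INSERT INTO raw_data_rows VALUES (?, ?)", [(10, 1), (11, 1)])
    conn.executemany(
        "INSERT INTO raw_data_row_details VALUES (?, ?, ?, ?)",
        [(100, 10, 7, 1.0), (101, 10, 8, 2.0), (102, 11, 7, 1.0)],
    )
    return conn


def details_per_header(conn: sqlite3.Connection):
    # Follow the FK chain details -> rows -> headers and count details per header.
    return conn.execute(
        """
        SELECT h.ID, COUNT(*)
        FROM raw_data_row_details d
        INNER JOIN raw_data_rows r ON r.ID = d.row_id
        INNER JOIN raw_data_headers h ON h.ID = r.header_id
        GROUP BY h.ID
        """
    ).fetchall()


conn = build_demo_db()
print(details_per_header(conn))  # [(1, 3)]
```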
This is the query:
SELECT
q1.ProcessTypeID,
q1.TestTypeID,
q1.ComponentID,
q1.TestResultID,
COUNT(*) AS Counter
FROM (
SELECT
raw_data_headers.batch_id AS BatchID,
raw_data_test_outputs.test_output_type_id AS TestOutputTypeID,
raw_data_test_types.process_type_id AS ProcessTypeID,
raw_data_test_types.ID AS TestTypeID,
raw_data_row_details.component_id AS ComponentID,
raw_data_test_results.ID AS TestResultID
FROM raw_data_row_details
INNER JOIN raw_data_rows ON raw_data_rows.ID = raw_data_row_details.row_id
INNER JOIN raw_data_headers ON raw_data_headers.ID = raw_data_rows.header_id
INNER JOIN raw_data_test_results ON raw_data_test_results.ID = raw_data_row_details.Value
INNER JOIN raw_data_test_outputs ON raw_data_test_outputs.ID = raw_data_row_details.test_output_id
INNER JOIN raw_data_test_types ON raw_data_test_types.ID = raw_data_test_outputs.test_type_id
HAVING TestOutputTypeID = 2 AND BatchID = 1
) AS q1
GROUP BY q1.ProcessTypeID, q1.TestTypeID, q1.ComponentID, q1.TestResultID
raw_data_headers has 989,180 rows, raw_data_rows has 2,967,540 rows and raw_data_row_details has 13,848,520 rows.
The subquery q1 takes about 3 minutes on its own, while the full query takes about 25 minutes. I think the bottleneck is the GROUP BY.
How can I improve performance?
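A quick check on the row counts quoted above (plain arithmetic, nothing MySQL-specific): the ratios come out exact, so on average each header has 3 rows and 14 detail records. Averages don't prove every header has the same shape, but they suggest that most of the work lies in joining and grouping the roughly 13.8 million detail records, which is likely why applying the batch_id filter as early as possible matters.

```python
# Row counts quoted in the question.
HEADERS = 989_180
ROWS = 2_967_540
DETAILS = 13_848_520

rows_per_header = ROWS / HEADERS             # acquisition phases per header (average)
details_per_row = DETAILS / ROWS             # detail records per phase (average)
avg_details_per_header = DETAILS / HEADERS   # detail records per header (average)

print(rows_per_header, avg_details_per_header)  # 3.0 14.0
print(round(details_per_row, 3))                # 4.667
```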
Edit 1:
SELECT
gnuhmi.raw_data_test_types.process_type_id AS ProcessTypeID,
gnuhmi.raw_data_test_types.ID AS TestTypeID,
gnuhmi.raw_data_row_details.component_id AS ComponentID,
gnuhmi.raw_data_test_results.ID AS TestResultID,
COUNT(*) AS Counter
FROM gnuhmi.raw_data_row_details
INNER JOIN gnuhmi.raw_data_rows ON gnuhmi.raw_data_rows.ID = gnuhmi.raw_data_row_details.row_id
INNER JOIN gnuhmi.raw_data_headers ON gnuhmi.raw_data_headers.ID = gnuhmi.raw_data_rows.header_id
INNER JOIN gnuhmi.raw_data_test_results ON gnuhmi.raw_data_test_results.ID = gnuhmi.raw_data_row_details.Value
INNER JOIN gnuhmi.raw_data_test_outputs ON gnuhmi.raw_data_test_outputs.ID = gnuhmi.raw_data_row_details.test_output_id
INNER JOIN gnuhmi.raw_data_test_types ON gnuhmi.raw_data_test_types.ID = gnuhmi.raw_data_test_outputs.test_type_id
WHERE gnuhmi.raw_data_test_outputs.test_output_type_id = 2 AND gnuhmi.raw_data_headers.batch_id = 1
GROUP BY
gnuhmi.raw_data_test_results.ID,
gnuhmi.raw_data_row_details.component_id,
gnuhmi.raw_data_test_types.ID,
gnuhmi.raw_data_test_types.process_type_id
This is the new query, without the subquery and with WHERE instead of HAVING. It does improve performance (thanks @Yogesh Sharma).
This is the raw_data_headers structure:
CREATE TABLE `raw_data_headers` (
`ID` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Univocal record key',
`ProductID` int(11) NOT NULL COMMENT 'Product numeric ID',
`Datetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Univocal record creation date',
`batch_id` int(11) DEFAULT NULL COMMENT 'Univocal batch key',
`RecipeName` varchar(80) DEFAULT NULL COMMENT 'Used recipe name',
`RecipeVersion` smallint(6) DEFAULT NULL COMMENT 'Used recipe version',
`process_result_id` smallint(6) DEFAULT NULL COMMENT 'Process result key',
`invalidated` tinyint(1) NOT NULL DEFAULT '0' COMMENT 'invalidation after counters reset',
PRIMARY KEY (`ID`),
KEY `FK_raw_data_headers_batches_ID` (`batch_id`),
KEY `FK_raw_data_headers_process_re` (`process_result_id`),
CONSTRAINT `FK_raw_data_headers_batches_ID` FOREIGN KEY (`batch_id`) REFERENCES `batches` (`ID`) ON UPDATE CASCADE,
CONSTRAINT `FK_raw_data_headers_process_re` FOREIGN KEY (`process_result_id`) REFERENCES `process_result` (`ID`) ON DELETE NO ACTION ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Stores raw data headers'
This is raw_data_rows:
CREATE TABLE `raw_data_rows` (
`ID` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Univocal record key',
`Datetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Univocal record creation date',
`header_id` int(11) unsigned NOT NULL COMMENT 'Univocal raw data header key',
`process_type_id` smallint(6) NOT NULL COMMENT 'Univocal process type key',
`process_result_id` smallint(6) NOT NULL COMMENT 'Univocal process result key',
PRIMARY KEY (`ID`),
KEY `FK_raw_data_rows_header_id` (`header_id`),
KEY `FK_raw_data_rows_process_resu2` (`process_result_id`),
KEY `FK_raw_data_rows_process_resul` (`process_type_id`),
CONSTRAINT `FK_raw_data_rows_header_id` FOREIGN KEY (`header_id`) REFERENCES `raw_data_headers` (`ID`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `FK_raw_data_rows_process_resu2` FOREIGN KEY (`process_result_id`) REFERENCES `process_result` (`ID`) ON DELETE NO ACTION ON UPDATE CASCADE,
CONSTRAINT `FK_raw_data_rows_process_resul` FOREIGN KEY (`process_type_id`) REFERENCES `process_types` (`ID`) ON DELETE NO ACTION ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=2967541 DEFAULT CHARSET=utf8 COMMENT='Stores row data rows'
And finally raw_data_row_details:
CREATE TABLE `raw_data_row_details` (
`ID` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'Univocal row detail key',
`row_id` int(11) unsigned NOT NULL COMMENT 'Univocal row key',
`test_output_id` int(11) NOT NULL COMMENT 'Univocal test output key',
`component_id` int(11) NOT NULL COMMENT 'The component that take the measurement',
`Value` double NOT NULL COMMENT 'Output value',
PRIMARY KEY (`ID`),
KEY `FK_raw_data_row_details_row_id` (`row_id`),
KEY `FK_raw_data_rows_raw_data_test_outputs_ID` (`test_output_id`),
KEY `raw_data_row_details_components_FK` (`component_id`),
CONSTRAINT `FK_raw_data_row_details_row_id` FOREIGN KEY (`row_id`) REFERENCES `raw_data_rows` (`ID`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `FK_raw_data_rows_raw_data_test_outputs_ID` FOREIGN KEY (`test_output_id`) REFERENCES `raw_data_test_outputs` (`ID`) ON UPDATE CASCADE,
CONSTRAINT `raw_data_row_details_components_FK` FOREIGN KEY (`component_id`) REFERENCES `components` (`ID`) ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=13848521 DEFAULT CHARSET=utf8 COMMENT='Stores raw data rows details'
Answer 0 (score: 2)
You don't need the subquery; just use a WHERE clause together with GROUP BY:
SELECT raw_data_test_types.process_type_id AS ProcessTypeID,
raw_data_test_types.ID AS TestTypeID,
raw_data_row_details.component_id AS ComponentID,
raw_data_test_results.ID AS TestResultID, COUNT(*) AS Counter
FROM raw_data_row_details INNER JOIN
raw_data_rows
ON raw_data_rows.ID = raw_data_row_details.row_id INNER JOIN
raw_data_headers
ON raw_data_headers.ID = raw_data_rows.header_id INNER JOIN
raw_data_test_results
ON raw_data_test_results.ID = raw_data_row_details.Value INNER JOIN
raw_data_test_outputs
ON raw_data_test_outputs.ID = raw_data_row_details.test_output_id INNER JOIN
raw_data_test_types
ON raw_data_test_types.ID = raw_data_test_outputs.test_type_id
WHERE raw_data_headers.batch_id = 1 AND raw_data_test_outputs.test_output_type_id = 2
GROUP BY raw_data_test_types.process_type_id, raw_data_test_types.ID,
raw_data_row_details.component_id, raw_data_test_results.ID;
Answer 1 (score: 0)
Add indexes. TestOutputTypeID and BatchID need to be covered by an index, and they probably aren't at the moment.
To see what is actually happening, run EXPLAIN in the MySQL console. You will probably see signs of full table scans, i.e. a join type marked ALL.
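A small sketch of what this answer describes, with SQLite's EXPLAIN QUERY PLAN standing in for MySQL's EXPLAIN (an assumption; the output format differs). Without an index on batch_id the plan reports a SCAN of the table, roughly SQLite's counterpart of MySQL's join type ALL; after adding an index (idx_headers_batch is a hypothetical name) it reports a SEARCH using that index. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3


def explain(conn: sqlite3.Connection, sql: str) -> str:
    """Join the detail column of SQLite's EXPLAIN QUERY PLAN output into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_data_headers (ID INTEGER PRIMARY KEY, batch_id INTEGER)")
query = "SELECT ID FROM raw_data_headers WHERE batch_id = 1"

before = explain(conn, query)  # no index on batch_id: the whole table is scanned
conn.execute("CREATE INDEX idx_headers_batch ON raw_data_headers (batch_id)")
after = explain(conn, query)   # the planner can now search through the index

print(before)
print(after)
```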
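Quite often the query optimizer uses the same execution plan for differently written queries, for example by flattening a subquery as if you had never written it. Only EXPLAIN will tell you for sure.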
Here is the documentation on how to interpret the EXPLAIN output: https://dev.mysql.com/doc/refman/8.0/en/explain-output.html
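The start of this answer talks about the filter columns being "covered". As a sketch of what a covering index means, again in SQLite rather than MySQL: with an index on test_output_type_id alone, the planner still has to visit the table to read test_type_id; once test_type_id is added to the same index, the plan says COVERING INDEX. In MySQL's EXPLAIN the corresponding hint is "Using index" in the Extra column. The table layout and index names here are hypothetical.

```python
import sqlite3


def plan_for(index_ddl: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN text for one lookup on a fresh test_outputs table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_data_test_outputs ("
        " ID INTEGER PRIMARY KEY,"
        " test_type_id INTEGER NOT NULL,"
        " test_output_type_id INTEGER NOT NULL)"
    )
    conn.execute(index_ddl)
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT ID, test_type_id FROM raw_data_test_outputs "
        "WHERE test_output_type_id = 2"
    ).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


# Index on the filter column only: test_type_id must still be read from the table.
plain = plan_for("CREATE INDEX idx_type ON raw_data_test_outputs (test_output_type_id)")
# Index that also carries test_type_id: the query can be answered from the index alone.
covering = plan_for(
    "CREATE INDEX idx_type_cov ON raw_data_test_outputs (test_output_type_id, test_type_id)"
)
print(plain)
print(covering)
```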
Answer 2 (score: 0)
HAVING TestOutputTypeID = 2 AND BatchID = 1
Change this from HAVING to WHERE, and have an index on each of those columns.
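Why the move from HAVING to WHERE matters: the MySQL manual notes that HAVING is applied nearly last, just before rows are sent to the client, with no optimization, whereas a WHERE condition can be used while the joins run (for example through an index on batch_id). SQLite treats HAVING without GROUP BY differently from MySQL, so this sketch only shows the part that carries over: filtering before GROUP BY returns the same counts as filtering inside a derived table shaped like the original q1. The table joined is a hypothetical stand-in for the result of the joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table standing in for the joined rows of the question's query.
conn.execute(
    "CREATE TABLE joined (batch_id INTEGER, test_output_type_id INTEGER,"
    " component_id INTEGER, test_result_id INTEGER)"
)
conn.executemany(
    "INSERT INTO joined VALUES (?, ?, ?, ?)",
    [
        (1, 2, 7, 1),
        (1, 2, 7, 1),
        (1, 2, 8, 0),
        (1, 3, 7, 1),  # wrong output type: must not be counted
        (2, 2, 7, 1),  # wrong batch: must not be counted
    ],
)

# Filter first (WHERE), then group: only matching rows reach GROUP BY.
where_first = conn.execute(
    """
    SELECT component_id, test_result_id, COUNT(*)
    FROM joined
    WHERE test_output_type_id = 2 AND batch_id = 1
    GROUP BY component_id, test_result_id
    ORDER BY component_id, test_result_id
    """
).fetchall()

# Filter inside a derived table, then group in the outer query (the shape of q1).
filter_in_subquery = conn.execute(
    """
    SELECT component_id, test_result_id, COUNT(*)
    FROM (
        SELECT * FROM joined
        WHERE test_output_type_id = 2 AND batch_id = 1
    ) AS q1
    GROUP BY component_id, test_result_id
    ORDER BY component_id, test_result_id
    """
).fetchall()

print(where_first)  # [(7, 1, 2), (8, 0, 1)]
print(filter_in_subquery == where_first)  # True
```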
Also have these indexes:
raw_data_row_details: (row_id)
raw_data_rows: (header_id)
raw_data_row_details: (test_output_id)
raw_data_test_outputs: (test_type_id)
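A sketch of creating the four indexes listed above, in SQLite with hypothetical index names, and checking which columns end up as the leading column of an index; in MySQL the statements take the same CREATE INDEX name ON table (column) form. Note that the DDL posted in the question already declares KEYs on raw_data_row_details.row_id, raw_data_row_details.test_output_id and raw_data_rows.header_id, so only the raw_data_test_outputs index (whose DDL isn't shown) may actually be missing.

```python
import sqlite3

# (hypothetical index name, table, column), taken from the list above.
WANTED = [
    ("idx_details_row", "raw_data_row_details", "row_id"),
    ("idx_rows_header", "raw_data_rows", "header_id"),
    ("idx_details_output", "raw_data_row_details", "test_output_id"),
    ("idx_outputs_type", "raw_data_test_outputs", "test_type_id"),
]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_data_rows (ID INTEGER PRIMARY KEY, header_id INTEGER);
    CREATE TABLE raw_data_row_details (ID INTEGER PRIMARY KEY, row_id INTEGER,
                                       test_output_id INTEGER);
    CREATE TABLE raw_data_test_outputs (ID INTEGER PRIMARY KEY, test_type_id INTEGER);
    """
)
for name, table, column in WANTED:
    # IF NOT EXISTS keeps the script rerunnable; identifiers come from the fixed list above.
    conn.execute(f"CREATE INDEX IF NOT EXISTS {name} ON {table} ({column})")


def indexed_columns(conn: sqlite3.Connection, table: str) -> set:
    """Columns that are the leading column of some index on `table`."""
    cols = set()
    for _, idx_name, *_ in conn.execute(f"PRAGMA index_list({table})").fetchall():
        first = conn.execute(f"PRAGMA index_info({idx_name})").fetchone()
        cols.add(first[2])  # index_info rows are (seqno, cid, name)
    return cols


print(sorted(indexed_columns(conn, "raw_data_row_details")))  # ['row_id', 'test_output_id']
```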
Drop the raw_data_ prefix from the table names; it only clutters the queries.
If these don't help enough, please provide the output of EXPLAIN SELECT ... and SHOW CREATE TABLE.