Question

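I need help optimizing ORDER BY and COUNT queries. I have tables with millions (approx. 3 million) of rows.

I have to join 4 tables and fetch the records. When I run the simple query it takes only milliseconds to complete, but as soon as I try to count or order by with the left-joined tables, it gets stuck indefinitely.

Please see the cases below.

DB server configuration: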

CPU Number of virtual cores: 4
Memory(RAM): 16 GiB
Network Performance: High

Rows in each table:

tbl_customers - #Rows: 20 million
tbl_customers_address - #Rows: 25 million
tbl_shop_setting - #Rows: 50k
aio_customer_tracking - #Rows: 5k

Table schema:

CREATE TABLE `tbl_customers` (
    `id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
    `shopify_customer_id` BIGINT(20) UNSIGNED NOT NULL,
    `shop_id` BIGINT(20) UNSIGNED NOT NULL,
    `email` VARCHAR(225) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
    `accepts_marketing` TINYINT(1) NULL DEFAULT NULL,
    `first_name` VARCHAR(50) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
    `last_name` VARCHAR(50) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
    `last_order_id` BIGINT(20) NULL DEFAULT NULL,
    `total_spent` DECIMAL(12,2) NULL DEFAULT NULL,
    `phone` VARCHAR(20) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
    `verified_email` TINYINT(4) NULL DEFAULT NULL,
    `updated_at` DATETIME NULL DEFAULT NULL,
    `created_at` DATETIME NULL DEFAULT NULL,
    `date_updated` DATETIME NULL DEFAULT NULL,
    `date_created` DATETIME NULL DEFAULT NULL,
    PRIMARY KEY (`id`),
    UNIQUE INDEX `shopify_customer_id_unique` (`shopify_customer_id`),
    INDEX `email` (`email`),
    INDEX `shopify_customer_id` (`shopify_customer_id`),
    INDEX `shop_id` (`shop_id`)
)
COLLATE='utf8mb4_general_ci'
ENGINE=InnoDB;


CREATE TABLE `tbl_customers_address` (
    `id` BIGINT(20) NOT NULL AUTO_INCREMENT,
    `customer_id` BIGINT(20) NULL DEFAULT NULL,
    `shopify_address_id` BIGINT(20) NULL DEFAULT NULL,
    `shopify_customer_id` BIGINT(20) NULL DEFAULT NULL,
    `first_name` VARCHAR(50) NULL DEFAULT NULL,
    `last_name` VARCHAR(50) NULL DEFAULT NULL,
    `company` VARCHAR(50) NULL DEFAULT NULL,
    `address1` VARCHAR(250) NULL DEFAULT NULL,
    `address2` VARCHAR(250) NULL DEFAULT NULL,
    `city` VARCHAR(50) NULL DEFAULT NULL,
    `province` VARCHAR(50) NULL DEFAULT NULL,
    `country` VARCHAR(50) NULL DEFAULT NULL,
    `zip` VARCHAR(15) NULL DEFAULT NULL,
    `phone` VARCHAR(20) NULL DEFAULT NULL,
    `name` VARCHAR(50) NULL DEFAULT NULL,
    `province_code` VARCHAR(5) NULL DEFAULT NULL,
    `country_code` VARCHAR(5) NULL DEFAULT NULL,
    `country_name` VARCHAR(50) NULL DEFAULT NULL,
    `longitude` VARCHAR(250) NULL DEFAULT NULL,
    `latitude` VARCHAR(250) NULL DEFAULT NULL,
    `default` TINYINT(1) NULL DEFAULT NULL,
    `is_geo_fetched` TINYINT(1) NOT NULL DEFAULT '0',
    PRIMARY KEY (`id`),
    INDEX `customer_id` (`customer_id`),
    INDEX `shopify_address_id` (`shopify_address_id`),
    INDEX `shopify_customer_id` (`shopify_customer_id`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB;

CREATE TABLE `tbl_shop_setting` (
    `id` INT(11) NOT NULL AUTO_INCREMENT,
    `shop_name` VARCHAR(300) NOT NULL COLLATE 'latin1_swedish_ci',
    PRIMARY KEY (`id`)
)
COLLATE='utf8mb4_general_ci'
ENGINE=InnoDB;


CREATE TABLE `aio_customer_tracking` (
    `id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
    `shopify_customer_id` BIGINT(20) UNSIGNED NOT NULL,
    `email` VARCHAR(255) NULL DEFAULT NULL,
    `shop_id` BIGINT(20) UNSIGNED NOT NULL,
    `domain` VARCHAR(255) NULL DEFAULT NULL,
    `web_session_count` INT(11) NOT NULL,
    `last_seen_date` DATETIME NULL DEFAULT NULL,
    `last_contact_date` DATETIME NULL DEFAULT NULL,
    `last_email_open` DATETIME NULL DEFAULT NULL,
    `created_date` DATETIME NOT NULL,
    `is_geo_fetched` TINYINT(1) NOT NULL DEFAULT '0',
    PRIMARY KEY (`id`),
    INDEX `shopify_customer_id` (`shopify_customer_id`),
    INDEX `email` (`email`),
    INDEX `shopify_customer_id_shop_id` (`shopify_customer_id`, `shop_id`),
    INDEX `last_seen_date` (`last_seen_date`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB;

Cases of running and not-running queries:

1. Running: The query below fetches the records by joining all 4 tables. It takes only 0.300 ms.

SELECT `c`.first_name,`c`.last_name,`c`.email, `t`.`last_seen_date`, `t`.`last_contact_date`, `ssh`.`shop_name`, ca.`company`, ca.`address1`, ca.`address2`, ca.`city`, ca.`province`, ca.`country`, ca.`zip`, ca.`province_code`, ca.`country_code`
FROM `tbl_customers` AS `c`
JOIN `tbl_shop_setting` AS `ssh` ON c.shop_id = ssh.id 
LEFT JOIN (SELECT shopify_customer_id, last_seen_date, last_contact_date FROM aio_customer_tracking GROUP BY shopify_customer_id) as t ON t.shopify_customer_id = c.shopify_customer_id
LEFT JOIN `tbl_customers_address` as ca ON (c.shopify_customer_id = ca.shopify_customer_id AND ca.default = 1)
GROUP BY c.shopify_customer_id
LIMIT 20

2. Not running: Simply trying to get the count of these rows makes the query hang. I waited 10 minutes and it was still running.

SELECT 
     COUNT(DISTINCT c.shopify_customer_id)   -- what makes #2 different
FROM `tbl_customers` AS `c`
JOIN `tbl_shop_setting` AS `ssh` ON c.shop_id = ssh.id 
LEFT JOIN (SELECT shopify_customer_id, last_seen_date, last_contact_date FROM aio_customer_tracking GROUP BY shopify_customer_id) as t ON t.shopify_customer_id = c.shopify_customer_id
LEFT JOIN `tbl_customers_address` as ca ON (c.shopify_customer_id = ca.shopify_customer_id AND ca.default = 1)
GROUP BY c.shopify_customer_id
LIMIT 20


3. Not running: Here I simply add one ORDER BY clause to query #1 and it gets stuck. I waited 10 minutes and it was still running. I studied some articles on query optimization and tried indexing, RIGHT JOIN, etc., but it still does not work.

SELECT `c`.first_name,`c`.last_name,`c`.email, `t`.`last_seen_date`, `t`.`last_contact_date`, `ssh`.`shop_name`, ca.`company`, ca.`address1`, ca.`address2`, ca.`city`, ca.`province`, ca.`country`, ca.`zip`, ca.`province_code`, ca.`country_code`
FROM `tbl_customers` AS `c`
JOIN `tbl_shop_setting` AS `ssh` ON c.shop_id = ssh.id 
LEFT JOIN (SELECT shopify_customer_id, last_seen_date, last_contact_date FROM aio_customer_tracking GROUP BY shopify_customer_id) as t ON t.shopify_customer_id = c.shopify_customer_id
LEFT JOIN `tbl_customers_address` as ca ON (c.shopify_customer_id = ca.shopify_customer_id AND ca.default = 1)
GROUP BY c.shopify_customer_id
  ORDER BY `t`.`last_seen_date`    -- what makes #3 different
LIMIT 20

EXPLAIN output for queries #1, #2 and #3: [screenshots not included]

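Any suggestions to optimize the queries or the table structure are welcome.

WHAT I'M TRYING TO DO:

The tbl_customers table contains the customer info, the tbl_customers_address table contains the customers' addresses (one customer may have multiple addresses), and the aio_customer_tracking table contains the customers' visiting records; last_seen_date is the visiting date.

Now, I simply want to fetch and count the customers, with their address and visiting information. Also, I could order by any of the columns from these 3 tables. In my example I'm ordering by last_seen_date (the default order). I hope this explanation helps to understand what I'm trying to do.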

Answer 1

In query #1, but not the other two, the optimizer can use

UNIQUE INDEX `shopify_customer_id_unique` (`shopify_customer_id`)

to shortcut the query for

GROUP BY c.shopify_customer_id
LIMIT 20

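This is because it can stop after 20 items of the index. The query is not ultra-fast because the derived table (subquery t) hits about 51K rows.

Query #2 may be slow simply because the optimizer failed to notice and remove the redundant DISTINCT. Instead, it may think that it cannot stop after 20.

Query #3 must go entirely through table c to get every shopify_customer_id group. This is because the ORDER BY prevents short-circuiting to reach the LIMIT 20.

The columns in a GROUP BY must include all the non-aggregate columns in the SELECT, except any that are uniquely defined by the GROUP BY columns. Since you have said that a single shopify_customer_id can have multiple addresses, fetching ca.address1 is not proper in connection with GROUP BY shopify_customer_id. Similarly, the subquery seems improper with respect to last_seen_date, last_contact_date.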
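As a minimal sketch of what a "proper" version of the derived table could look like: each non-grouped column is wrapped in an aggregate so that it is well defined per group. Picking MAX() assumes the most recent dates are what is wanted, which the question does not actually state.

```sql
-- Assumption: the latest visit/contact per customer is the desired value.
-- Every selected column is either the GROUP BY key or an aggregate.
SELECT shopify_customer_id,
       MAX(last_seen_date)    AS last_seen_date,
       MAX(last_contact_date) AS last_contact_date
FROM aio_customer_tracking
GROUP BY shopify_customer_id;
```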

In aio_customer_tracking, this change (to a "covering" index) may help a little:

INDEX (`shopify_customer_id`)

to

INDEX (`shopify_customer_id`, `last_seen_date`, `last_contact_date`)
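One possible way to apply that index change is sketched here. The new index name `cust_seen_contact` is made up for illustration; on a large, busy table you would want to check how your MySQL version handles online index changes first.

```sql
-- Swap the single-column index for a covering one.
-- `cust_seen_contact` is a hypothetical name, not from the original schema.
ALTER TABLE aio_customer_tracking
    DROP INDEX `shopify_customer_id`,
    ADD INDEX `cust_seen_contact`
        (`shopify_customer_id`, `last_seen_date`, `last_contact_date`);
```

With all three columns in the index, the derived table can be built from the index alone, without touching the table rows.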

Dissecting the goal

Now, I simply want ... to count the customers

To count the customers, do this, but don't try to combine it with "fetching":

SELECT COUNT(*) FROM tbl_customers;

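Now, I simply want to fetch ... the customers ...

tbl_customers - #Rows: 20 million.

Surely you do not want to fetch 20 million rows! I don't want to think about how to try to do that. Please clarify. And I won't accept paginating through that many rows. Perhaps there is a WHERE clause? The WHERE clause is (usually) the most important part of optimization!

Now, I simply want to fetch ... the customers with their address and visiting information.

Assuming the WHERE filters down to a "few" customers, then JOINing to another table to get "any" address and "any" visiting info may be problematic and/or inefficient. Demanding the "first" or "last" instead of "any" won't be any easier, but might be more meaningful.

May I suggest that your UI first find a few customers, then, if the user wants, go to another page with all the addresses and all the visits. Or can the visits number in the hundreds or more?

Also, I could order by any of the columns from these 3 tables. In my example I'm ordering by last_seen_date (the default order).

Let's focus on optimizing the WHERE, then tack last_seen_date on the end of any index.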
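To illustrate the "WHERE first, then the sort column" idea, here is a sketch under an invented filter. The question has no WHERE clause, so filtering on shop_id, the value 123, and the index name `shop_last_seen` are all assumptions.

```sql
-- Hypothetical: the page only ever shows one shop's visitors.
-- Equality column first, then the ORDER BY column.
ALTER TABLE aio_customer_tracking
    ADD INDEX `shop_last_seen` (`shop_id`, `last_seen_date`);

-- The index can both filter and deliver rows already in sorted order,
-- so the query can stop after 20 rows instead of sorting everything.
SELECT shopify_customer_id, last_seen_date
FROM aio_customer_tracking
WHERE shop_id = 123
ORDER BY last_seen_date DESC
LIMIT 20;
```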

Answer 2

shopify_customer_id is unique in the tbl_customers table, so why does the second query use DISTINCT and GROUP BY on the shopify_customer_id column?

Please get rid of them.
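A sketch of the count with both removed. It assumes the two LEFT JOINs in query 2 are there only to fetch columns and are not meant to filter customers; if they are meant to filter, they would have to stay.

```sql
-- shopify_customer_id is unique in tbl_customers and ssh.id is a primary key,
-- so each customer appears at most once and COUNT(*) is sufficient.
SELECT COUNT(*)
FROM tbl_customers AS c
JOIN tbl_shop_setting AS ssh ON c.shop_id = ssh.id;
```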

Answer 3

You have too many indexes. That can be a real performance killer on inserts, updates and deletes, and occasionally on selects too, depending on optimizer settings.

Also, remove the GROUP BY statement.

As for query optimization, I could say more about the proper use of clustered vs. non-clustered indexes, ORDER BY, WHERE, and views. However, I think your queries will speed up if you remove some of the indexes. (Perhaps also revise your queries to follow stricter SQL standards and to be more logical, but that is beyond the scope of this question.)

One more thing: what are you doing with the query results? Are they stored somewhere and accessed for lookups, used in calculations, used for automated reports, displayed through a web database connection, etc.? It makes a difference, because if you only need a report, a backup, or an export to a flat file, there are far more efficient ways to get this data out. There are many different options depending on what exactly you are doing.
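One index that looks safe to drop, as a sketch: in the posted schema, tbl_customers has both a UNIQUE index and a plain index on the same single column, so the plain one is redundant. Whether dropping it measurably helps is not something the question's data confirms.

```sql
-- `shopify_customer_id_unique` already indexes this column,
-- so the non-unique duplicate only adds write overhead.
ALTER TABLE tbl_customers
    DROP INDEX `shopify_customer_id`;
```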

Answer 4

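Query 2 contains a logic error, as others have pointed out: count(distinct(c.shopify_customer_id)) returns a single value, so your GROUP BY only complicates the query. It may indeed make MySQL first group by shopify_customer_id and then execute count(distinct(shopify_customer_id)), which may be the reason for the long execution time.

The ORDER BY in query 3 cannot be optimized because you are joining to a subselect, which cannot be indexed. The time it takes is simply the time the system needs to sort the result set.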

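A solution to your problem would be:

1. Change the index shopify_customer_id (shopify_customer_id) on table tbl_customers_address to shopify_customer_id (shopify_customer_id, default) to optimize the following query.
2. Create a table (result) with the result of query 1, but without the join
LEFT JOIN (SELECT shopify_customer_id, last_seen_date, last_contact_date FROM aio_customer_tracking GROUP BY shopify_customer_id) as t ON t.shopify_customer_id = c.shopify_customer_id
3. Alter the result table: add a column for last_seen_date, plus indexes on last_seen_date and shopify_customer_id.
4. Create a table (last_Date) for the result of this query:

SELECT shopify_customer_id, last_seen_date, last_contact_date FROM aio_customer_tracking GROUP BY shopify_customer_id

5. Update the result table with the values from table last_Date.

Now you can run a query against the result table, ordered by last_seen_date, using the index you created.

The whole process should take less time than executing query 2 or query 3.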
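The steps above could be sketched roughly as follows. This is one possible reading, not a tested migration: the table names result and last_Date come from the steps, while the index names, the shortened column list, the use of MAX(), and the assumption that each customer has at most one default address are all illustrative.

```sql
-- Step 1: widen the address index so the `default` = 1 filter is covered.
ALTER TABLE tbl_customers_address
    DROP INDEX `shopify_customer_id`,
    ADD INDEX `shopify_customer_id_default` (`shopify_customer_id`, `default`);

-- Step 2: materialize query 1 without the tracking subquery.
-- Assumes at most one default address per customer (column list shortened).
CREATE TABLE result AS
SELECT c.shopify_customer_id, c.first_name, c.last_name, c.email,
       ssh.shop_name, ca.company, ca.address1, ca.city, ca.country, ca.zip
FROM tbl_customers AS c
JOIN tbl_shop_setting AS ssh ON c.shop_id = ssh.id
LEFT JOIN tbl_customers_address AS ca
       ON c.shopify_customer_id = ca.shopify_customer_id AND ca.`default` = 1;

-- Step 3: add the sort column and the two indexes.
ALTER TABLE result
    ADD COLUMN last_seen_date DATETIME NULL,
    ADD INDEX `idx_last_seen` (`last_seen_date`),
    ADD INDEX `idx_customer` (`shopify_customer_id`);

-- Step 4: one row per customer from the tracking table.
-- MAX() is an assumption about which visit should win.
CREATE TABLE last_Date AS
SELECT shopify_customer_id,
       MAX(last_seen_date)    AS last_seen_date,
       MAX(last_contact_date) AS last_contact_date
FROM aio_customer_tracking
GROUP BY shopify_customer_id;

-- Step 5: copy the dates into the result table.
UPDATE result AS r
JOIN last_Date AS d ON d.shopify_customer_id = r.shopify_customer_id
SET r.last_seen_date = d.last_seen_date;

-- Finally: the sorted page is read through the index on last_seen_date.
SELECT * FROM result ORDER BY last_seen_date LIMIT 20;
```

The result table is a snapshot, so it would need to be rebuilt or kept in sync as the source tables change.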

How to optimize COUNT and ORDER BY queries on millions of rows
