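I need help optimizing queries that sort and count. My tables have millions of rows (approx. 3 million).
I have to join 4 tables and fetch the records. When I run the simple query it takes only milliseconds to complete, but as soon as I try to count or order by while left joining the tables, it gets stuck indefinitely.
Please see the cases below.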
CPU Number of virtual cores: 4
Memory(RAM): 16 GiB
Network Performance: High
tbl_customers - #Rows: 20 million
tbl_customers_address - #Rows: 25 million
tbl_shop_setting - #Rows: 50k
aio_customer_tracking - #Rows: 5k
CREATE TABLE `tbl_customers` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`shopify_customer_id` BIGINT(20) UNSIGNED NOT NULL,
`shop_id` BIGINT(20) UNSIGNED NOT NULL,
`email` VARCHAR(225) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
`accepts_marketing` TINYINT(1) NULL DEFAULT NULL,
`first_name` VARCHAR(50) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
`last_name` VARCHAR(50) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
`last_order_id` BIGINT(20) NULL DEFAULT NULL,
`total_spent` DECIMAL(12,2) NULL DEFAULT NULL,
`phone` VARCHAR(20) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
`verified_email` TINYINT(4) NULL DEFAULT NULL,
`updated_at` DATETIME NULL DEFAULT NULL,
`created_at` DATETIME NULL DEFAULT NULL,
`date_updated` DATETIME NULL DEFAULT NULL,
`date_created` DATETIME NULL DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `shopify_customer_id_unique` (`shopify_customer_id`),
INDEX `email` (`email`),
INDEX `shopify_customer_id` (`shopify_customer_id`),
INDEX `shop_id` (`shop_id`)
)
COLLATE='utf8mb4_general_ci'
ENGINE=InnoDB;
CREATE TABLE `tbl_customers_address` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`customer_id` BIGINT(20) NULL DEFAULT NULL,
`shopify_address_id` BIGINT(20) NULL DEFAULT NULL,
`shopify_customer_id` BIGINT(20) NULL DEFAULT NULL,
`first_name` VARCHAR(50) NULL DEFAULT NULL,
`last_name` VARCHAR(50) NULL DEFAULT NULL,
`company` VARCHAR(50) NULL DEFAULT NULL,
`address1` VARCHAR(250) NULL DEFAULT NULL,
`address2` VARCHAR(250) NULL DEFAULT NULL,
`city` VARCHAR(50) NULL DEFAULT NULL,
`province` VARCHAR(50) NULL DEFAULT NULL,
`country` VARCHAR(50) NULL DEFAULT NULL,
`zip` VARCHAR(15) NULL DEFAULT NULL,
`phone` VARCHAR(20) NULL DEFAULT NULL,
`name` VARCHAR(50) NULL DEFAULT NULL,
`province_code` VARCHAR(5) NULL DEFAULT NULL,
`country_code` VARCHAR(5) NULL DEFAULT NULL,
`country_name` VARCHAR(50) NULL DEFAULT NULL,
`longitude` VARCHAR(250) NULL DEFAULT NULL,
`latitude` VARCHAR(250) NULL DEFAULT NULL,
`default` TINYINT(1) NULL DEFAULT NULL,
`is_geo_fetched` TINYINT(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
INDEX `customer_id` (`customer_id`),
INDEX `shopify_address_id` (`shopify_address_id`),
INDEX `shopify_customer_id` (`shopify_customer_id`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB;
CREATE TABLE `tbl_shop_setting` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`shop_name` VARCHAR(300) NOT NULL COLLATE 'latin1_swedish_ci',
PRIMARY KEY (`id`)
)
COLLATE='utf8mb4_general_ci'
ENGINE=InnoDB;
CREATE TABLE `aio_customer_tracking` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`shopify_customer_id` BIGINT(20) UNSIGNED NOT NULL,
`email` VARCHAR(255) NULL DEFAULT NULL,
`shop_id` BIGINT(20) UNSIGNED NOT NULL,
`domain` VARCHAR(255) NULL DEFAULT NULL,
`web_session_count` INT(11) NOT NULL,
`last_seen_date` DATETIME NULL DEFAULT NULL,
`last_contact_date` DATETIME NULL DEFAULT NULL,
`last_email_open` DATETIME NULL DEFAULT NULL,
`created_date` DATETIME NOT NULL,
`is_geo_fetched` TINYINT(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
INDEX `shopify_customer_id` (`shopify_customer_id`),
INDEX `email` (`email`),
INDEX `shopify_customer_id_shop_id` (`shopify_customer_id`, `shop_id`),
INDEX `last_seen_date` (`last_seen_date`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB;
1. Running: The query below fetches the records by joining all 4 tables. It takes only 0.300 ms.
SELECT `c`.first_name,`c`.last_name,`c`.email, `t`.`last_seen_date`, `t`.`last_contact_date`, `ssh`.`shop_name`, ca.`company`, ca.`address1`, ca.`address2`, ca.`city`, ca.`province`, ca.`country`, ca.`zip`, ca.`province_code`, ca.`country_code`
FROM `tbl_customers` AS `c`
JOIN `tbl_shop_setting` AS `ssh` ON c.shop_id = ssh.id
LEFT JOIN (SELECT shopify_customer_id, last_seen_date, last_contact_date FROM aio_customer_tracking GROUP BY shopify_customer_id) as t ON t.shopify_customer_id = c.shopify_customer_id
LEFT JOIN `tbl_customers_address` as ca ON (c.shopify_customer_id = ca.shopify_customer_id AND ca.default = 1)
GROUP BY c.shopify_customer_id
LIMIT 20
2. Not running: Simply trying to get the count of these rows makes the query hang. I waited 10 minutes and it was still running.
SELECT
COUNT(DISTINCT c.shopify_customer_id) -- what makes #2 different
FROM `tbl_customers` AS `c`
JOIN `tbl_shop_setting` AS `ssh` ON c.shop_id = ssh.id
LEFT JOIN (SELECT shopify_customer_id, last_seen_date, last_contact_date FROM aio_customer_tracking GROUP BY shopify_customer_id) as t ON t.shopify_customer_id = c.shopify_customer_id
LEFT JOIN `tbl_customers_address` as ca ON (c.shopify_customer_id = ca.shopify_customer_id AND ca.default = 1)
GROUP BY c.shopify_customer_id
LIMIT 20
3. Not running: Adding a single ORDER BY clause to query #1 makes it hang. I waited 10 minutes and it was still running. I read several articles on query optimization and tried indexing, RIGHT JOIN, etc., but it still does not work.
SELECT `c`.first_name,`c`.last_name,`c`.email, `t`.`last_seen_date`, `t`.`last_contact_date`, `ssh`.`shop_name`, ca.`company`, ca.`address1`, ca.`address2`, ca.`city`, ca.`province`, ca.`country`, ca.`zip`, ca.`province_code`, ca.`country_code`
FROM `tbl_customers` AS `c`
JOIN `tbl_shop_setting` AS `ssh` ON c.shop_id = ssh.id
LEFT JOIN (SELECT shopify_customer_id, last_seen_date, last_contact_date FROM aio_customer_tracking GROUP BY shopify_customer_id) as t ON t.shopify_customer_id = c.shopify_customer_id
LEFT JOIN `tbl_customers_address` as ca ON (c.shopify_customer_id = ca.shopify_customer_id AND ca.default = 1)
GROUP BY c.shopify_customer_id
ORDER BY `t`.`last_seen_date` -- what makes #3 different
LIMIT 20
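Any suggestions for optimizing the queries or the table structure are welcome.
The tbl_customers table contains the customer information, the tbl_customers_address table contains the customers' addresses (one customer may have multiple addresses), and the aio_customer_tracking table contains the customers' visit records, where last_seen_date is the date of the visit.
Now, I only want to fetch and count the customers together with one of their addresses and their visit information. I may also order by any column from these 3 tables. In my example I am ordering by last_seen_date (the default order). I hope this explanation helps to clarify what I am trying to do.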
Answer 0 (score: 7)
In Query #1, but not in the other two, the optimizer can use
UNIQUE INDEX `shopify_customer_id_unique` (`shopify_customer_id`)
to shortcut the query for
GROUP BY c.shopify_customer_id
LIMIT 20
This is because it can stop after 20 items of the index. The query is not ultra-fast because the derived table (subquery t) hits about 51K rows.
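Query #2 may be slow simply because the optimizer failed to notice and remove the redundant DISTINCT. Instead, it may think that it cannot stop after 20.
Query #3 must go entirely through table c to get every shopify_customer_id group. This is because the ORDER BY prevents short-circuiting to get to the LIMIT 20.
The columns in a GROUP BY must include all the non-aggregated columns in the SELECT, except any that are uniquely determined by the GROUP BY columns. Since you have said that a single shopify_customer_id can have multiple addresses, fetching ca.address1 is not proper in connection with GROUP BY shopify_customer_id. Similarly, the subquery seems improper with respect to last_seen_date, last_contact_date.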
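One way to make the derived table deterministic is to aggregate the date columns explicitly. The sketch below assumes that the most recent visit and the most recent contact per customer are what is wanted; the question does not say so, and MAX() is only one reasonable choice.

```sql
-- Deterministic replacement for derived table t.
-- Assumption: "latest" values are wanted, hence MAX().
SELECT shopify_customer_id,
       MAX(last_seen_date)    AS last_seen_date,
       MAX(last_contact_date) AS last_contact_date
FROM aio_customer_tracking
GROUP BY shopify_customer_id;
```

Every selected column is now either a GROUP BY column or an aggregate, so the result no longer depends on which row MySQL happens to pick for each group.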
In aio_customer_tracking, this change (to a "covering" index) may help a little: change
INDEX (`shopify_customer_id`)
to
INDEX (`shopify_customer_id`, `last_seen_date`, `last_contact_date`)
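As a rough sketch, the index change could be applied in a single statement. The index name shopify_customer_id is taken from the posted table definition; everything else is standard MySQL ALTER TABLE syntax.

```sql
-- Replace the single-column index with a covering index so that the
-- derived table can be built from the index alone, without reading rows.
ALTER TABLE aio_customer_tracking
  DROP INDEX shopify_customer_id,
  ADD INDEX shopify_customer_id
      (shopify_customer_id, last_seen_date, last_contact_date);
```

Lookups by shopify_customer_id alone still work, because that column remains the leftmost part of the new index.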
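Dissecting the goal
"Now, I want to ... count the customers"
To count the customers, do this, but don't try to combine it with "fetching":
SELECT COUNT(*) FROM tbl_customers;
"Now, I only want to fetch ... the customers ..."
"tbl_customers - #Rows: 20 million."
Surely you don't want to fetch 20 million rows! I don't want to think about how to attempt that. Please clarify. And I won't accept paginating through that many rows. Perhaps there is a WHERE clause? The WHERE clause is (usually) the most important part of optimization!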
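"Now, I only want to fetch ... the customers with one of their addresses and their visit information."
Assuming the WHERE filters down to a "few" customers, JOINing to another table to get "any" address and "any" visit info may be problematic and/or inefficient. Requiring the "first" or "last" instead of "any" won't be any easier, but it might be more meaningful.
May I suggest that your UI first find a few customers and then, if the user wants, go to another page with all the addresses and all the visits? Or can the visits number in the hundreds or more?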
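"Also, I may order by any column from these 3 tables. In my example I am ordering by last_seen_date (the default order)."
Let's focus on optimizing the WHERE, then tack last_seen_date onto the end of any index.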
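To illustrate the idea, here is a minimal sketch with a made-up filter. The question shows no WHERE clause, so the filter on shop_id, the value 123, and the index name shop_last_seen are all hypothetical.

```sql
-- Hypothetical: the filter column comes first, the sort column is tacked on the end.
ALTER TABLE aio_customer_tracking
  ADD INDEX shop_last_seen (shop_id, last_seen_date);

-- With that index, MySQL can read the entries for one shop in date order
-- and stop after 20 rows instead of sorting the whole result set.
SELECT t.shopify_customer_id, t.last_seen_date,
       c.first_name, c.last_name, c.email
FROM aio_customer_tracking AS t
JOIN tbl_customers AS c
  ON c.shopify_customer_id = t.shopify_customer_id
WHERE t.shop_id = 123            -- made-up shop id
ORDER BY t.last_seen_date DESC
LIMIT 20;
```

A customer with several tracking rows appears once per row here; whether that is acceptable depends on what the listing is supposed to show.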
Answer 1 (score: 4)
If shopify_customer_id is unique in the tbl_customers table, then why does the second query use DISTINCT and GROUP BY on the shopify_customer_id column?
Please get rid of them.
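A sketch of query #2 with DISTINCT, GROUP BY, and LIMIT removed, assuming the goal is the total number of customers that belong to an existing shop:

```sql
-- shopify_customer_id is UNIQUE and NOT NULL in tbl_customers, and ssh.id is
-- a primary key, so each customer row matches at most one shop row.
-- The two LEFT JOINs can never remove customers, so they are dropped.
SELECT COUNT(*) AS customer_count
FROM tbl_customers AS c
JOIN tbl_shop_setting AS ssh ON c.shop_id = ssh.id;
```

This returns a single row. It should give the same number that COUNT(DISTINCT c.shopify_customer_id) would give over the full four-table join, without touching the address or tracking tables.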
Answer 2 (score: 1)
You have too many indexes, which can be a real performance killer on inserts, updates, and deletes, and occasionally on selects as well, depending on the optimizer settings.
Also, remove the GROUP BY statement.
On query optimization, I could say more about the proper use of clustered vs. non-clustered indexes, ORDER BY, WHERE, and views. However, I think your queries will speed up if you remove some of the indexes. (You might also revise your queries to follow stricter SQL standards and to be more logical, but that is beyond the scope of this question.)
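One more thing: what do you do with the query result? Is it stored somewhere and accessed for lookups, used in calculations, used for automated reports, displayed through a web database connection, etc.? This matters, because if you only need it for reporting, backup, or export to a flat file, there are far more efficient ways to get this data out. There are many different options depending on what you are doing.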
Answer 3 (score: 1)
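Query 2 contains a logic error, as others have pointed out: count(distinct(c.shopify_customer_id)) returns a single value, so your GROUP BY only complicates the query. It may indeed make MySQL first group by shopify_customer_id and then execute count(distinct(shopify_customer_id)), which may be part of the reason for the long execution time.
The ORDER BY of query 3 cannot be optimized because you are joining to a subselect that cannot be indexed. The time it takes is simply the time the system needs to sort the result set.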
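The solution to the problem is:
1. Change the index shopify_customer_id (shopify_customer_id) of table tbl_customers_address to shopify_customer_id (shopify_customer_id, default) to optimize the following query.
2. Create a table (result) with the result of query 1, but without this part:
LEFT JOIN (SELECT shopify_customer_id, last_seen_date, last_contact_date
FROM aio_customer_tracking
GROUP BY shopify_customer_id) as t ON t.shopify_customer_id = c.shopify_customer_id
3. Alter the result table: add a column for last_seen_date, and add indexes for last_seen_date and shopify_customer_id.
4. Create a table (last_Date) for the result of this query:
SELECT shopify_customer_id, last_seen_date, last_contact_date FROM
aio_customer_tracking GROUP BY shopify_customer_id
5. Now you can run a query against the result table, ordered by last_Date, that uses the index you created.
The whole process should take less time than executing query 2 or query 3.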
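The steps above could be sketched roughly as follows. This is a simplified variant, not the answerer's exact procedure: the tracking summary is built first and joined in while creating the result table, MAX() is assumed for the dates, the address is joined only for the 20 rows finally displayed, and the table and index names (last_date, result, last_seen) are made up.

```sql
-- Let the address join use (shopify_customer_id, default).
ALTER TABLE tbl_customers_address
  DROP INDEX shopify_customer_id,
  ADD INDEX shopify_customer_id (shopify_customer_id, `default`);

-- One row per tracked customer. MAX() is an assumption.
CREATE TABLE last_date AS
SELECT shopify_customer_id,
       MAX(last_seen_date)    AS last_seen_date,
       MAX(last_contact_date) AS last_contact_date
FROM aio_customer_tracking
GROUP BY shopify_customer_id;

ALTER TABLE last_date ADD PRIMARY KEY (shopify_customer_id);

-- Materialized result: one row per customer, dates included.
CREATE TABLE result AS
SELECT c.shopify_customer_id, c.first_name, c.last_name, c.email,
       ssh.shop_name, ld.last_seen_date, ld.last_contact_date
FROM tbl_customers AS c
JOIN tbl_shop_setting AS ssh ON c.shop_id = ssh.id
LEFT JOIN last_date AS ld
       ON ld.shopify_customer_id = c.shopify_customer_id;

ALTER TABLE result
  ADD PRIMARY KEY (shopify_customer_id),
  ADD INDEX last_seen (last_seen_date, shopify_customer_id);

-- The sort can now follow the index and stop after 20 rows.
-- In ascending order MySQL sorts NULL (never seen) first.
SELECT r.*, ca.address1, ca.city, ca.country
FROM (SELECT * FROM result ORDER BY last_seen_date LIMIT 20) AS r
LEFT JOIN tbl_customers_address AS ca
       ON ca.shopify_customer_id = r.shopify_customer_id
      AND ca.`default` = 1
ORDER BY r.last_seen_date;
```

A materialized table like this goes stale as customers and visits change, so it would need to be rebuilt or kept up to date on some schedule. A customer with more than one default address would also appear more than once in the final listing.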