I have a MySQL InnoDB table with 5.7 million rows, 1.9 GB in size:
+-------------------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+---------+------+-----+---------+----------------+
| id | int(20) | NO | PRI | NULL | auto_increment |
| listing_id | int(20) | YES | | NULL | |
| listing_link | text | YES | | NULL | |
| transaction_title | text | YES | | NULL | |
| image_thumb | text | YES | | NULL | |
| seller_link | text | YES | | NULL | |
| seller_name | text | YES | | NULL | |
| sale_date | date | YES | | NULL | |
+-------------------+---------+------+-----+---------+----------------+
Here are the my.ini settings for my server, which has 3 GB of RAM:
key_buffer = 16M
max_allowed_packet = 16M
sort_buffer_size = 8M
net_buffer_length = 8K
read_buffer_size = 2M
read_rnd_buffer_size = 16M
myisam_sort_buffer_size = 8M
log_error = "mysql_error.log"
innodb_autoinc_lock_mode=0
join_buffer_size = 8M
thread_cache_size = 8
thread_concurrency = 8
query_cache_size = 64M
query_cache_limit = 2M
ft_min_word_len = 4
thread_stack = 192K
tmp_table_size = 64M
innodb_buffer_pool_size = 2G
innodb_additional_mem_pool_size = 16M
innodb_log_file_size = 512M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 120
innodb_write_io_threads = 8
innodb_read_io_threads = 8
innodb_thread_concurrency = 16
innodb_log_files_in_group = 3
innodb_max_dirty_pages_pct = 90
When I run the following query, it takes more than 20 minutes to return results:
SELECT transaction_title,
listing_id,
seller_name,
Max(sale_date) AS sale_date,
Count(*) AS count
FROM sales_meta
WHERE `sale_date` BETWEEN '2017-06-06' AND '2017-06-06'
GROUP BY listing_id
HAVING Count(*) > 1
ORDER BY count DESC,
seller_name;
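I have done some research, and it seems I need to add some indexes to speed things up, but I am confused about how to go about it. There are single-column indexes and multi-column indexes; which should I use?
To complicate things further, I will need to run a few other queries against this table regularly: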
SELECT *
FROM sales_meta
WHERE `sale_date` = '2017-06-06';
and
SELECT DISTINCT `seller_name`
FROM `sales_meta`;
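These two are probably less work, but I still need to optimize them. The first of the three queries is the top priority right now.
Answer 0 (score: 1)
If you only need the values for a single day and the column's data type is date, you can avoid the BETWEEN clause and use = instead: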
SELECT transaction_title,
listing_id,
seller_name,
Max(sale_date) AS max_sale_date,
Count(*) AS count
FROM sales_meta
WHERE sale_date = str_to_date('2017-06-06', '%Y-%m-%d')
GROUP BY listing_id
HAVING Count(*) > 1
ORDER BY count DESC, seller_name;
Also make sure you have an index on sale_date.
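A minimal sketch of adding that index, assuming no index on sale_date exists yet. The index name idx_sale_date is hypothetical, and the EXPLAIN is only there to check whether the optimizer picks the new index up.

```sql
-- Add a secondary index on the date column used in the WHERE clause.
-- idx_sale_date is an illustrative name, not something from the question.
ALTER TABLE sales_meta ADD INDEX idx_sale_date (sale_date);

-- Inspect the plan: ideally the `key` column of the output names idx_sale_date
-- and the `rows` estimate is far below the 5.7 million rows in the table.
EXPLAIN
SELECT listing_id, COUNT(*) AS count
FROM sales_meta
WHERE sale_date = '2017-06-06'
GROUP BY listing_id
HAVING COUNT(*) > 1;
```

Building an index on a table of this size takes a while, so it may be worth doing during a quiet period.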
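Answer 1 (score: 0)
An index on sale_date is definitely something you should add, since several of the queries in the question filter on sale_date. The columns used in GROUP BY are also candidates for indexing. Rather than adding all the indexes at once, I would take an incremental approach and measure performance after adding each index.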
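One way to follow the incremental approach, sketched under assumptions: record the plan before a change, add a single candidate index, then compare. The index name idx_date_listing is hypothetical, and whether a composite index actually helps depends on the data.

```sql
-- Step 1: record the current plan (and the query's run time) as a baseline.
EXPLAIN
SELECT listing_id, COUNT(*) AS count
FROM sales_meta
WHERE sale_date = '2017-06-06'
GROUP BY listing_id;

-- Step 2: add one candidate index. This composite index puts the
-- WHERE column first and the GROUP BY column second.
ALTER TABLE sales_meta ADD INDEX idx_date_listing (sale_date, listing_id);

-- Step 3: rerun the EXPLAIN from step 1 and compare the `key` and `rows` columns,
-- then time the real query again.

-- Step 4: if neither the plan nor the timing improved, remove the index.
-- ALTER TABLE sales_meta DROP INDEX idx_date_listing;
```

Because sale_date is the leftmost column, this composite index can also serve queries that filter on sale_date alone.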
Answer 2 (score: 0)
INDEX(sale_date) -- very important for the first query
str_to_date('2017-06-06', '%Y-%m-%d') -- no better than '2017-06-06'
innodb_buffer_pool_size = 2G -- too big for your tiny RAM; change to 1G (swapping kills perf)
GROUP BY listing_id -- meaningless if `listing_id` is unique, since count is then always 1 (the schema shown declares no unique key on it, though)
Prefer using an explicit list instead of `SELECT *`
SELECT DISTINCT `seller_name`
FROM `sales_meta`; -- needs INDEX(seller_name)
but `seller_name` needs to be a VARCHAR, not TEXT
Further evidence that str_to_date is useless:
mysql> SELECT STR_TO_DATE('2019-02-27', '%Y-%m-%d');
+---------------------------------------+
| STR_TO_DATE('2019-02-27', '%Y-%m-%d') |
+---------------------------------------+
| 2019-02-27 |
+---------------------------------------+
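The seller_name note above could be applied roughly as follows. This is a sketch, not a tested migration: the length 255 and the index name idx_seller_name are assumptions, so check the longest existing value before converting the column.

```sql
-- Find the longest existing value before choosing a VARCHAR length.
SELECT MAX(CHAR_LENGTH(seller_name)) AS longest_name
FROM sales_meta;

-- Convert TEXT to VARCHAR. 255 is an assumed length; adjust it to the result above.
ALTER TABLE sales_meta MODIFY seller_name VARCHAR(255) NULL;

-- Index the column so that SELECT DISTINCT seller_name can read the index
-- instead of scanning the whole table.
ALTER TABLE sales_meta ADD INDEX idx_seller_name (seller_name);
```

On older MySQL versions with a multi-byte character set, the index key length limit (767 bytes for some InnoDB row formats) may call for a shorter column or a prefix index.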