我们使用MySQL数据库运行Java Web应用程序,并使用5.1 MySQL连接器和c3p0连接池进行连接。白天收集的数据将在一夜之间转化为数据仓库。应用程序和数据库都位于同一个EC2实例(m3.medium)上,内存为4GB,使用gp2(SSD)EBS卷。
将数据库从MySQL 5.5升级到MariaDB 10.0.17(也尝试过MySQL 5.6.23)会导致数据仓库每晚构建在某些条件下花费不合理的长时间。
数据仓库构建基于运行SQL查询和JRuby脚本的Maven / Ant脚本集。数据仓库(DWH)构建由CRON作业在一夜之间执行,并且在DWH构建运行时应用程序不会停止。 DWH构建首先删除整个dwh架构,然后重建它。
在这些条件下针对MySQL 5.5运行DWH构建需要大约4个小时,而对于相同的数据,对于MariaDB 10(和Mysql 5.6)有时需要长达16个小时。但是,这种行为并不一致,有时DWH使用MariaDB 10在4小时内构建。
这些测试是在测试环境中运行的,Java应用程序只是简单地连接到数据库而不对其进行任何查询,因此它不会以任何方式对数据库负载做出贡献。
特别是DWH构建过程中的两个查询有时需要花费更长的时间才能运行(总共32K秒而不是4K秒)。
当DWH陷入困境时#34;在这些查询中,针对ibdata1
的I / O增加,MariaDB使用超过90%的I / O(基于iotop)。
我花了很多时间测试不同的MariaDB配置(更改分配给MariaDB的内存,线程数等等),但结果总是一样。
从MySQL 5.5升级到5.6时,是否有其他人遇到类似问题?还有什么想法可以确定问题可能是什么?
您还可以在此处找到磁盘I / O使用情况(Ops / s)的图表。在3月24日和3月26日,DWH正确运行(4小时),在3月25日,它花费了四倍(16小时)。
读取: Reads Ops/s
写入: Writes Ops/s
正如您所看到的那样,即使数据和进程相同,I / O配置文件也完全不同,就像MariaDB正在做一些额外的工作一样。
谢谢!
下面我收集的一些信息,其中一个查询运行缓慢。
显示完整的进程列表(查询57是DWH构建过程,其他所有内容都属于Java应用程序):
| Id | User | Host | db | Command | Time | State | Info | Progress |
| 50 | steve | localhost:37029 | qrtz | Sleep | 3 | | NULL | 0.000 |
| 51 | steve | localhost:37030 | qrtz | Sleep | 3 | | NULL | 0.000 |
| 52 | steve | localhost:37031 | qrtz | Sleep | 3 | | NULL | 0.000 |
| 57 | steve | localhost:37366 | dwh | Query | 14222 | Sending data | INSERT INTO dwh.assessment_answer_fact ( page_id, page_number, category_id, category_number, category_title, question_id, question_display_number, question_number, question_text, question_identifier, answer_value, assessment_fact_id ) SELECT p.id, p.page_number, ic.id, ic.category_number, ic.title, q.id, q.question_display_number, q.question_number, q.question_text, q.question_identifier, av.value, ca.id FROM smr.cans_assessment ca join survey.surveys s ON ( s.id = ca.survey_id ) join survey.instruments i ON ( s.instrument_id = i.id ) join survey.survey_categories sc ON ( sc.survey_id = s.id ) join survey.instrument_categories ic ON ( sc.instrument_category_id = ic.id ) join survey.pages p ON ( p.id = ic.page_id ) join survey.answers a ON ( a.survey_category_id = sc.id ) join survey.answer_values av ON ( av.answer_id = a.id ) join survey.questions q ON ( a.question_id = q.id ) WHERE ca.id in (select af.assessment_id from dwh.assessment_fact af) | 0.000 |
| 64 | steve | localhost:37700 | NULL | Sleep | 5394 | | NULL | 0.000 |
| 65 | steve | localhost:37701 | NULL | Sleep | 5394 | | NULL | 0.000 |
| 66 | steve | localhost:37702 | NULL | Sleep | 5394 | | NULL | 0.000 |
| 67 | steve | localhost:37703 | NULL | Sleep | 5394 | | NULL | 0.000 |
| 68 | steve | localhost:37770 | NULL | Query | 0 | init | show full processlist | 0.000 |
显示引擎innodb状态:
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 28198 srv_active, 0 srv_shutdown, 35270 srv_idle
srv_master_thread log flush and writes: 63465
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 247509
OS WAIT ARRAY INFO: signal count 230685
Mutex spin waits 233790, rounds 7015555, OS waits 233481
RW-shared spins 5789, rounds 173633, OS waits 5403
RW-excl spins 3842, rounds 258871, OS waits 7270
Spin rounds per wait: 30.01 mutex, 29.99 RW-shared, 67.38 RW-excl
------------
TRANSACTIONS
------------
Trx id counter 2795813
Purge done for trx's n:o < 2795733 undo n:o < 0 state: running but idle
History list length 327
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 68, OS thread handle 0x7fa659961700, query id 135532 localhost 127.0.0.1 steve init
show engine innodb status
---TRANSACTION 2773754, not started
MySQL thread id 52, OS thread handle 0x7fa6923aa700, query id 135518 localhost 127.0.0.1 steve cleaning up
---TRANSACTION 2795812, not started
MySQL thread id 51, OS thread handle 0x7fa6923f3700, query id 135531 localhost 127.0.0.1 steve cleaning up
---TRANSACTION 2773251, not started
MySQL thread id 1, OS thread handle 0x7fa6924ce700, query id 0 Waiting for background binlog tasks
---TRANSACTION 2794864, ACTIVE 14118 sec fetching rows
mysql tables in use 11, locked 11
136069 lock struct(s), heap size 20477480, 23704803 row lock(s), undo log entries 3791652
MySQL thread id 57, OS thread handle 0x7fa69243c700, query id 129742 localhost 127.0.0.1 steve Sending data
INSERT INTO dwh.assessment_answer_fact ( page_id, page_number, category_id, category_number, category_title, question_id, question_display_number, question_number, question_text, question_identifier, answer_value, assessment_fact_id ) SELECT p.id, p.page_number, ic.id, ic.category_number, ic.title, q.id, q.question_display_number, q.question_number, q.question_text, q.question_identifier, av.value, ca.id FROM smr.cans_assessment ca join survey.surveys s ON ( s.id = ca.survey_id ) join survey.instruments i ON ( s.instrument_id = i.id ) join survey.survey_categories sc ON ( sc.survey_id = s.id
Trx #rec lock waits 0 #table lock waits 0
Trx total rec lock wait time 0 SEC
Trx total table lock wait time 0 SEC
TABLE LOCK table `survey`.`questions` trx id 2794864 lock mode IS lock hold time 14118 wait time before grant 0
RECORD LOCKS space id 852 page no 9 n bits 184 index `PRIMARY` of table `survey`.`questions` trx table locks 12 total table locks 1 trx id 2794864 lock mode S lock hold time 14118 wait time before grant 0
TABLE LOCK table `survey`.`answers` trx id 2794864 lock mode IS lock hold time 14118 wait time before grant 0
RECORD LOCKS space id 861 page no 12 n bits 824 index `FKCD7DB87560DC6F04` of table `survey`.`answers` trx table locks 12 total table locks 1 trx id 2794864 lock mode S lock hold time 14118 wait time before grant 0
RECORD LOCKS space id 861 page no 6 n bits 336 index `PRIMARY` of table `survey`.`answers` trx table locks 12 total table locks 1 trx id 2794864 lock mode S locks rec but not gap lock hold time 14118 wait time before grant 0
TABLE LOCK table `survey`.`survey_categories` trx id 2794864 lock mode IS lock hold time 14118 wait time before grant 0
RECORD LOCKS space id 855 page no 7 n bits 328 index `PRIMARY` of table `survey`.`survey_categories` trx table locks 12 total table locks 1 trx id 2794864 lock mode S locks rec but not gap lock hold time 14118 wait time before grant 0
TABLE LOCK table `smr`.`cans_assessment` trx id 2794864 lock mode IS lock hold time 14118 wait time before grant 0
RECORD LOCKS space id 1800 page no 42 n bits 824 index `FK52F1DDEAF118504` of table `smr`.`cans_assessment` trx table locks 12 total table locks 1 trx id 2794864 lock mode S locks gap before rec lock hold time 14118 wait time before grant 0
RECORD LOCKS space id 861 page no 32512 n bits 336 index `PRIMARY` of table `survey`.`answers` trx table locks 12 total table locks 1 trx id 2794864 lock mode S locks rec but not gap lock hold time 14118 wait time before grant 0
TOO MANY LOCKS PRINTED FOR THIS TRX: SUPPRESSING FURTHER PRINTS
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
29827256 OS file reads, 1144210 OS file writes, 345551 OS fsyncs
631.07 reads/s, 16384 avg bytes/read, 20.61 writes/s, 9.37 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 858, free list len 5697, seg size 6556, 299256 merges
merged operations:
insert 7839828, delete mark 0, delete 0
discarded operations:
insert 0, delete mark 0, delete 0
2106.29 hash searches/s, 2190.81 non-hash searches/s
---
LOG
---
Log sequence number 184023146842
Log flushed up to 184023103841
Pages flushed up to 184006513482
Last checkpoint at 184006296459
Max checkpoint age 216721613
Checkpoint age target 209949063
Modified age 16633360
Checkpoint age 16850383
0 pending log writes, 0 pending chkp writes
42065 log i/o's done, 1.22 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 549453824; in additional pool allocated 0
Total memory allocated by read views 512
Internal hash tables (constant factor + variable factor)
Adaptive hash index 10673872 (8851048 + 1822824)
Page hash 553976 (buffer pool 0 only)
Dictionary cache 4009988 (2214224 + 1795764)
File system 914416 (812272 + 102144)
Lock system 21808096 (1329176 + 20478920)
Recovery system 0 (0 + 0)
Dictionary memory allocated 1795764
Buffer pool size 32767
Buffer pool size, bytes 536854528
Free buffers 965
Database pages 30442
Old database pages 11252
Modified db pages 1638
Percent of dirty pages(LRU & free pages): 5.215
Max dirty pages percent: 75.000
Pending reads 1
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 687161, not young 141371905
4.33 youngs/s, 3024.61 non-youngs/s
Pages read 29827185, created 180475, written 972592
631.09 reads/s, 4.10 creates/s, 15.67 writes/s
Buffer pool hit rate 952 / 1000, young-making rate 0 / 1000 not 232 / 1000
Pages read ahead 485.66/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 30442, unzip_LRU len: 0
I/O sum[8334]:cur[77], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
1 RW transactions active inside InnoDB
0 RO transactions active inside InnoDB
1 out of 1000 descriptors used
Main thread process no. 7359, id 140352388314880, state: sleeping
Number of rows inserted 7172945, updated 106861, deleted 0, read 1363672517
240.68 inserts/s, 0.00 updates/s, 0.00 deletes/s, 1666.92 reads/s
Number of system rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
从 SELECT * FROM information_schema.metadata_lock_info (METADATA_LOCK_INFO插件)获取的元数据锁:
+-----------+-------------------------+-----------------+---------------------+--------------+------------------------+
| THREAD_ID | LOCK_MODE | LOCK_DURATION | LOCK_TYPE | TABLE_SCHEMA | TABLE_NAME |
+-----------+-------------------------+-----------------+---------------------+--------------+------------------------+
| 57 | MDL_INTENTION_EXCLUSIVE | MDL_STATEMENT | Global read lock | | |
| 57 | MDL_SHARED_READ | MDL_TRANSACTION | Table metadata lock | survey | instrument_categories |
| 57 | MDL_SHARED_READ | MDL_TRANSACTION | Table metadata lock | survey | questions |
| 57 | MDL_SHARED_READ | MDL_TRANSACTION | Table metadata lock | survey | pages |
| 57 | MDL_SHARED_READ | MDL_TRANSACTION | Table metadata lock | survey | answer_values |
| 57 | MDL_SHARED_READ | MDL_TRANSACTION | Table metadata lock | survey | surveys |
| 57 | MDL_SHARED_READ | MDL_TRANSACTION | Table metadata lock | dwh | assessment_fact |
| 57 | MDL_SHARED_READ | MDL_TRANSACTION | Table metadata lock | survey | answers |
| 57 | MDL_SHARED_READ | MDL_TRANSACTION | Table metadata lock | smr | cans_assessment |
| 57 | MDL_SHARED_WRITE | MDL_TRANSACTION | Table metadata lock | dwh | assessment_answer_fact |
| 57 | MDL_SHARED_READ | MDL_TRANSACTION | Table metadata lock | survey | survey_categories |
| 57 | MDL_SHARED_READ | MDL_TRANSACTION | Table metadata lock | survey | instruments |
+-----------+-------------------------+-----------------+---------------------+--------------+------------------------+
来自 pt-ioprofile -c sizes 的文件I / O(Percona Toolkit):
total pread pwrite fsync filename
72269824 72269824 0 0 /data/mysql/smr/survey/answer_values.ibd
8323072 393216 7929856 0 /data/mysql/smr/ibdata1
4964352 770048 4194304 0 /data/mysql/smr/dwh/assessment_answer_fact.ibd
4833280 4833280 0 0 /data/mysql/smr/survey/answers.ibd
2707456 0 2707456 0 /data/mysql/smr/ib_logfile1
1572864 1572864 0 0 /data/mysql/smr/dwh/assessment_fact.ibd
1523712 1523712 0 0 /data/mysql/smr/survey/surveys.ibd
1114112 1114112 0 0 /data/mysql/smr/survey/survey_categories.ibd
163840 163840 0 0 /data/mysql/smr/smr/cans_assessment.ibd
49152 49152 0 0 /data/mysql/smr/survey/questions.ibd
0 0 0 0 /data/mysql/smr/ib_logfile0
某些MariaDB配置参数(显示全局变量太长,无法容纳在消息中):
log_bin
binlog_format = mixed
max_binlog_size = 1073741824
binlog_cache_size = 4M
innodb_buffer_pool_size = 512M
innodb_io_capacity = 100
query_cache_size = 64M
innodb_flush_neighbors = 0
innodb_flush_log_at_trx_commit = 0
tmp_table_size = 32M
innodb_log_file_size = 128M
innodb_log_buffer_size = 8M
max_allowed_packet = 16M
innodb_file_per_table = 1
查询缓慢:
INSERT INTO dwh.assessment_answer_fact ( page_id, page_number, category_id, category_number, category_title, question_id, question_display_number, question_number, question_text, question_identifier, answer_value, assessment_fact_id ) SELECT p.id, p.page_number, ic.id, ic.category_number, ic.title, q.id, q.question_display_number, q.question_number, q.question_text, q.question_identifier, av.value, ca.id FROM smr.cans_assessment ca join survey.surveys s ON ( s.id = ca.survey_id ) join survey.instruments i ON ( s.instrument_id = i.id ) join survey.survey_categories sc ON ( sc.survey_id = s.id ) join survey.instrument_categories ic ON ( sc.instrument_category_id = ic.id ) join survey.pages p ON ( p.id = ic.page_id ) join survey.answers a ON ( a.survey_category_id = sc.id ) join survey.answer_values av ON ( av.answer_id = a.id ) join survey.questions q ON ( a.question_id = q.id ) WHERE ca.id in (select af.assessment_id from dwh.assessment_fact af)
解释慢查询:
+------+-------------+-------+--------+-----------------------------------------------+--------------------+---------+----------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+--------+-----------------------------------------------+--------------------+---------+----------------------------------+------+-------------+
| 1 | PRIMARY | q | ALL | PRIMARY | NULL | NULL | NULL | 384 | |
| 1 | PRIMARY | a | ref | PRIMARY,FKCD7DB8757DA6A899,FKCD7DB87560DC6F04 | FKCD7DB87560DC6F04 | 8 | survey.q.id | 321 | |
| 1 | PRIMARY | sc | eq_ref | PRIMARY,FKBEB7B82168B10433,FKBEB7B821AF118504 | PRIMARY | 8 | survey.a.survey_category_id | 1 | |
| 1 | PRIMARY | ca | ref | PRIMARY,FK52F1DDEAF118504 | FK52F1DDEAF118504 | 8 | survey.sc.survey_id | 1 | Using index |
| 1 | PRIMARY | af | eq_ref | PRIMARY | PRIMARY | 8 | smr.ca.id | 1 | Using index |
| 1 | PRIMARY | s | eq_ref | PRIMARY,FK91914459785448E4 | PRIMARY | 8 | survey.sc.survey_id | 1 | |
| 1 | PRIMARY | i | eq_ref | PRIMARY | PRIMARY | 8 | survey.s.instrument_id | 1 | Using index |
| 1 | PRIMARY | ic | eq_ref | PRIMARY,FK1118917415F49764 | PRIMARY | 8 | survey.sc.instrument_category_id | 1 | |
| 1 | PRIMARY | av | ref | FK5075C8C38297FA84 | FK5075C8C38297FA84 | 8 | survey.a.id | 1 | |
| 1 | PRIMARY | p | eq_ref | PRIMARY | PRIMARY | 8 | survey.ic.page_id | 1 | |
+------+-------------+-------+--------+-----------------------------------------------+--------------------+---------+----------------------------------+------+-------------+
行数:
| answer_values | 10766276 |
| answers | 12020566 |
| answers_answer_values | 11662057 |
| cans_assessment | 77221 |
| instrument_categories | 85 |
| instruments | 11 |
| pages | 11 |
| questions | 384 |
| survey_categories | 1462377 |
| survey_categories_answers | 10954877 |
| surveys | 118702 |
| surveys_survey_categories | 1515111 |
| assessment_fact | 76803 |
| assessment_answer_fact | 7673695 |
答案 0 :(得分:0)
真正的问题是SELECT
:
SELECT p.id, p.page_number, ic.id, ic.category_number, ic.title,
q.id, q.question_display_number, q.question_number, q.question_text,
q.question_identifier, av.value, ca.id
FROM smr.cans_assessment ca
join survey.surveys s ON ( s.id = ca.survey_id )
join survey.instruments i ON ( s.instrument_id = i.id )
join survey.survey_categories sc ON ( sc.survey_id = s.id )
join survey.instrument_categories ic ON ( sc.instrument_category_id = ic.id )
join survey.pages p ON ( p.id = ic.page_id )
join survey.answers a ON ( a.survey_category_id = sc.id )
join survey.answer_values av ON ( av.answer_id = a.id )
join survey.questions q ON ( a.question_id = q.id )
WHERE ca.id in (
SELECT af.assessment_id
from dwh.assessment_fact af
...
请提供整个SELECT(它在输出中被截断)。
请检查JOIN是否正在通过索引。
请提供EXPLAIN SELECT ...
如果可能,将in ( SELECT ... )
变为JOIN。 (这可能是主要的性能问题!)
什么版本?不只是“10”和“5.6”。 (需要查看它是否具有in ( SELECT ... )
的优化。旧版本对此进行了非常优化。)
多少内存?除非你有一台小机器,否则0.5GB的buffer_pool是低的。
请解释INSERT..SELECT
的意图。
innodb_file_per_table的设置是什么? (这与ibdata1上可能存在大量I / O的原因有关。)
表格有多大,包括你要插入的表格?
修改1
假设这是整个尾随条款:
WHERE ca.id in (
SELECT af.assessment_id
from dwh.assessment_fact af )
您是否会看到删除WHERE子句并添加它会加快查询速度:
JOIN dwh.assessment_fact af ON af.assessment_id = ca.id
答案 1 :(得分:0)
这个'答案'侧重于
DWH构建首先删除整个dwh架构,然后重建它。
我建议那是倒退。如果执行以下操作,则表示没有停机时间,因此该慢查询没有明显的延迟:
CREATE TABLE new_1
LIKE real_1
; - 每个表正在重建一个INSERT..SELECT
表中执行令人讨厌的new
。RENAME TABLE real_1 TO old_1, new_1 TO real_1, real_2 TO old_2, ...;
- 这是唯一的“停机时间”,它是即时的。DROP TABLE old_1;
- 每张旧桌子。