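I have a simple master -> slave setup with MariaDB:
Master: Ubuntu 16.04 LTS with MariaDB 10.2.8 and percona-toolkit 3.0.4
Slave: Ubuntu 16.04 LTS with MariaDB 10.2.7
Replication is running fine, and now I want to check whether the data on the master and the slave is identical.
I installed percona-toolkit on the master and created a checksum user: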
MariaDB> GRANT REPLICATION SLAVE,PROCESS,SUPER, SELECT ON *.* TO `pt_checksum`@'%' IDENTIFIED BY 'password';
MariaDB> GRANT ALL PRIVILEGES ON percona.* TO `pt_checksum`@'%';
MariaDB> FLUSH PRIVILEGES;
I also added report_host to the slave's configuration so that it presents itself to the master:
MariaDB [(none)]> show slave hosts;
+-----------+-----------+------+-----------+
| Server_id | Host | Port | Master_id |
+-----------+-----------+------+-----------+
| 2 | 10.0.0.49 | 3306 | 1 |
+-----------+-----------+------+-----------+
1 row in set (0.00 sec)
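To test pt-table-checksum, I deleted a row from the Tickets table in the test database on the slave. I have verified that the row is indeed missing there but still present on the master.
But pt-table-checksum does not report this difference: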
# pt-table-checksum --databases=shop_test --tables=Tickets --host=localhost --user=pt_checksum --password=... --no-check-binlog-format --no-check-replication-filters
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
09-07T16:15:02 0 0 14 1 0 0.013 shop_test.Tickets
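So I set PTDEBUG=1 in my environment, but it appears that the master connects to the slave just fine. I have tried to pick the relevant bits out of the output: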
# MasterSlave:5175 9725 Connected to h=localhost,p=...,u=pt_checksum
# MasterSlave:5184 9725 SELECT @@SERVER_ID
# MasterSlave:5186 9725 Working on server ID 1
# MasterSlave:5219 9725 Looking for slaves on h=localhost,p=...,u=pt_checksum using methods processlist hosts
# MasterSlave:5226 9725 Finding slaves with _find_slaves_by_processlist
# MasterSlave:5288 9725 DBI::db=HASH(0x31c5190) SHOW GRANTS FOR CURRENT_USER()
# MasterSlave:5318 9725 DBI::db=HASH(0x31c5190) SHOW FULL PROCESSLIST
# DSNParser:1417 9725 Parsing h=10.0.0.49
[...]
# MasterSlave:5231 9725 Found 1 slaves
# MasterSlave:5208 9725 Recursing from h=localhost,p=...,u=pt_checksum to h=10.0.0.49,p=...,u=pt_checksum
# MasterSlave:5155 9725 Recursion methods: processlist hosts
[...]
# MasterSlave:5175 9725 Connected to h=10.0.0.49,p=...,u=pt_checksum
# MasterSlave:5184 9725 SELECT @@SERVER_ID
# MasterSlave:5186 9725 Working on server ID 2
# MasterSlave:5097 9725 Found slave: h=10.0.0.49,p=...,u=pt_checksum
[...]
# pt_table_checksum:9793 9725 Exit status 0 oktorun 1
# Cxn:3764 9725 Destroying cxn
# Cxn:3774 9725 DBI::db=HASH(0x31cd218) Disconnecting dbh on slaveserver h=10.0.0.49
# Cxn:3764 9725 Destroying cxn
# Cxn:3774 9725 DBI::db=HASH(0x31c5190) Disconnecting dbh on masterserver h=localhost
I'm out of ideas. Why isn't the missing row detected?
Answer 0 (score: 1)
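Over the weekend I noticed a new bug report, and today I confirmed that it is indeed the problem I ran into.
The workaround is to add --set-vars binlog_format=statement.
With this option set, the difference shows up after the second run.
During the first run, the checksums table on the slave changed from: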
MariaDB [percona]> select tbl, this_crc, this_cnt, master_crc,master_cnt from checksums where tbl = 'Tickets' and db = 'shop_test';
+---------+----------+----------+------------+------------+
| tbl | this_crc | this_cnt | master_crc | master_cnt |
+---------+----------+----------+------------+------------+
| Tickets | f30abebe | 14 | f30abebe | 14 |
+---------+----------+----------+------------+------------+
...to...
MariaDB [percona]> select tbl, this_crc, this_cnt, master_crc,master_cnt from checksums where tbl = 'Tickets' and db = 'shop_test';
+---------+----------+----------+------------+------------+
| tbl | this_crc | this_cnt | master_crc | master_cnt |
+---------+----------+----------+------------+------------+
| Tickets | 284ec207 | 13 | f30abebe | 14 |
+---------+----------+----------+------------+------------+
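The same comparison can be made for every checksummed table at once instead of one table at a time. The following is a minimal sketch, assuming the default percona.checksums table and that the query is run on the slave. The query follows the one suggested in the pt-table-checksum documentation; the helper name find_checksum_diffs is hypothetical, and the user name is the one created in the question.

```shell
# SQL that lists tables whose checksum or row count on this server
# differs from the values recorded for the master.
DIFF_QUERY='SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
FROM percona.checksums
WHERE master_cnt <> this_cnt
   OR master_crc <> this_crc
   OR ISNULL(master_crc) <> ISNULL(this_crc)
GROUP BY db, tbl'

# Hypothetical helper: run the query against the slave host given as $1.
# -p makes the client prompt for the password instead of taking it inline.
find_checksum_diffs() {
  mysql --host="$1" --user=pt_checksum -p --table -e "$DIFF_QUERY"
}
```

For the setup in the question this would be called as `find_checksum_diffs 10.0.0.49`. An empty result means no chunk recorded on that slave differs from the master.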
After the second run, the difference also shows up in the pt-table-checksum output:
# pt-table-checksum --tables=shop_test.Tickets --host=localhost --user=pt_checksum --password=... --no-check-binlog-format --no-check-replication-filters --set-vars binlog_format=statement
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
09-11T11:17:37 0 1 14 1 0 0.022 shop_test.Tickets
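When checking many tables, a non-zero DIFFS value is easy to overlook in the output. Here is a small sketch of a filter that prints only the affected tables. The function name report_diffs is hypothetical, and it assumes the column layout shown above, with DIFFS as the third field and the table name as the last one.

```shell
# Hypothetical helper: read pt-table-checksum output on stdin and print
# every table whose DIFFS column (3rd field) is non-zero.
# Returns 1 if at least one such table was found, 0 otherwise.
report_diffs() {
  awk '
    $1 == "TS" { next }                       # skip the header line
    NF >= 8 && $3 ~ /^[0-9]+$/ && $3 > 0 {    # data row with DIFFS > 0
      print $NF ": " $3 " differing chunk(s)"
      found = 1
    }
    END { exit (found ? 1 : 0) }
  '
}
```

Piping the run above through it, as in `pt-table-checksum ... --set-vars binlog_format=statement | report_diffs`, would print `shop_test.Tickets: 1 differing chunk(s)`.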
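I checked with SHOW VARIABLES LIKE 'binlog_format' that binlog_format is still "MIXED", so apparently it is only changed for the duration of the session. According to the documentation, as far as I understand it, this should happen automatically:
This works only with statement-based replication (pt-table-checksum will switch the binlog format to STATEMENT for the duration of the session if your server uses row-based replication).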
Bug report: {{3}}