Mysql8官方分布式数据库MGR最佳实践,竟遇大坑
MGR简介
MGR是mysql Group Replication简称,中文名称是Mysql组复制,MGR是MySQL官方于2016年12月推出的一个全新的高可用与高扩展的解决方案,提供了高可用、高扩展、高可靠的MySQL集群服务,目前只支持MYSQL5.7和mysql8.0版本。
MGR优点
高一致性:基于原生复制和paxos协议的组复制技术。
高容错性:有自动检测机制,当出现宕机后,会自动剔除问题节点,在2N+1个节点集群中,集群 只要N +1个节点还存活着,数据库就能稳定的对外提供服务。
高扩展性:可以在线增加和移除节点。
高灵活性:可以在单主模式和多主模式自由切换。
MGR缺点
MGR技术比较新,稳定性方面还需要业界验证。
Mysql8分布式数据库MGR最佳实践
环境准备
3台虚拟机搭建3节点的MGR集群,Mysql数据库用的8.0.21版本。
IP | 客户端访问端口 | MGR成员通信端口 | server_id | OS版本 |
---|---|---|---|---|
192.168.112.131 | 3309 | 10061 | 1 | Centos 7.6 |
192.168.112.132 | 3309 | 10061 | 2 | Centos 7.6 |
192.168.112.135 | 3309 | 10061 | 3 | Centos 7.6 |
创建目录
mkdir -p /data/mysql/mysql8/{data,log,conf,run}/3309
mkdir -p /data/mysql/mysql8/log/3309/{redo,undo,relay}
chown -R mysql:mysql /data/mysql/mysql8/mysql8
chmod 750 /data/mysql/mysql8/{data,log,conf,run}
配置参数
[mysqld]
##basic settings###
server-id=1
port = 3309
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
skip-name-resolve
max_connections=1000
max_user_connections=800
max_allowed_packet=512M
max_connect_errors=100000
datadir = /data/mysql/mysql8/data/3309
socket = /data/mysql/mysql8/run/3309/mysql.sock
pid_file = /data/mysql/mysql8/run/3309/mysql.pid
transaction_isolation = READ-COMMITTED
lower_case_table_names=1
default_time_zone =+8:00
open_files_limit=65535
log_timestamps=system
wait_timeout=900
interactive_timeout=900
##innodb setting##
innodb_buffer_pool_size = 256M
innodb_buffer_pool_instances = 1
innodb_io_capacity=2000
innodb_flush_method=O_DIRECT
innodb_log_group_home_dir = /data/mysql/mysql8/log/3309/redo
innodb_log_file_size = 128M
innodb_log_files_in_group=4
innodb_log_buffer_size = 32M
innodb_undo_directory = /data/mysql/mysql8/log/3309/undo
innodb_undo_tablespaces = 4
innodb_undo_log_truncate=1
innodb_max_undo_log_size=1G
innodb_flush_neighbors=0
innodb_flush_log_at_trx_commit = 1
innodb_print_all_deadlocks = 1
innodb_online_alter_log_max_size=128M
innodb_lock_wait_timeout=10
innodb_file_per_table=ON
innodb_doublewrite=ON
##log settings##
log-error = /data/mysql/mysql8/log/3309/error.log
log-bin = /data/mysql/mysql8/log/3309/mysql_bin.log
slow_query_log = 1
slow_query_log_file = /data/mysql/mysql8/log/3309/mysql_slow_query.log
long_query_time = 10
##replication settings##
gtid-mode=on
enforce-gtid-consistency=true
master-info-repository=TABLE
relay-log-info-repository=TABLE
sync-master-info=1000
relay_log_recovery = 1
relay-log=/data/mysql/mysql8/log/3309/relay/mysql-relay-bin
#binlog
log_bin=/data/mysql/mysql8/log/3309/binlog
expire_logs_days=10
max_binlog_cache_size=1024M
sync_binlog=1
##MGR settings
binlog_checksum = NONE
log_slave_updates = ON
binlog_format=row
#transaction_write_set_extraction ='XXHASH64'
#loose-group_replication_group_name = '38f34157-cbe8-4623-a7bd-054cc5c2de0b'
#loose-group_replication_start_on_boot = off
#loose-group_replication_local_address = '192.168.112.131:10061'
#loose-group_replication_group_seeds ='192.168.112.131:10061,192.168.112.132:10061,192.168.112.135:10061'
#loose-group_replication_bootstrap_group = off
#loose-group_replication_ip_whitelist = '192.168.112.131/24,192.168.112.132/24,192.168.112.135/24'
[client]
port = 3309
socket = /data/mysql/mysql8/run/3309/mysql.sock
在这里需要注意的是地方,由于MGR的插件,mysql默认是没有安装的,所以在这里关于MGR的配置参数,都注释掉了,等数据库实例启动之后,再开启。
每个节点的server-id需要设置成不一样。
初始化数据库
/data/mysql-8.0.21/bin/mysqld --defaults-file=/data/mysql/mysql8/conf/3309/my.cnf --initialize --basedir=/data/mysql-8.0.21/ --datadir=/data/mysql/mysql8/data/3309 --user=mysql --initialize-insecure --ssl --explicit_defaults_for_timestamp
如果要启用SSL安全连接,执行如下命令
/data/mysql-8.0.21/bin/mysql_ssl_rsa_setup --basedir=/data/mysql-8.0.21 --datadir=/data/mysql/mysql8/data/3309
启停mysql实例
/data/mysql-8.0.21/bin/mysqld_safe --defaults-file=/data/mysql/mysql8/conf/3309/my.cnf &
/data/mysql-8.0.21/bin/mysqladmin -uroot --socket=/data/mysql/mysql8/run/3309/mysql.sock shutdown &
登录实例
/data/mysql-8.0.21/bin/mysql -uroot --socket=/data/mysql/mysql8/run/3309/mysql.sock
MGR插件安装
[root@localhost] 14:08:44 [(none)]>install plugin group_replication soname 'group_replication.so';
Query OK, 0 rows affected (0.20 sec)
[root@localhost] 14:09:06 [(none)]>show plugins;
+---------------------------------+----------+--------------------+----------------------+---------+
| Name | Status | Type | Library | License |
+---------------------------------+----------+--------------------+----------------------+---------+
| binlog | ACTIVE | STORAGE ENGINE | NULL | GPL |
| mysql_native_password | ACTIVE | AUTHENTICATION | NULL | GPL |
| sha256_password | ACTIVE | AUTHENTICATION | NULL | GPL |
| caching_sha2_password | ACTIVE | AUTHENTICATION | NULL | GPL |
| sha2_cache_cleaner | ACTIVE | AUDIT | NULL | GPL |
| CSV | ACTIVE | STORAGE ENGINE | NULL | GPL |
| MEMORY | ACTIVE | STORAGE ENGINE | NULL | GPL |
| InnoDB | ACTIVE | STORAGE ENGINE | NULL | GPL |
| INNODB_TRX | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_CMP | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_CMP_RESET | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_CMPMEM | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_CMPMEM_RESET | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_CMP_PER_INDEX | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_CMP_PER_INDEX_RESET | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_BUFFER_PAGE | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_BUFFER_PAGE_LRU | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_BUFFER_POOL_STATS | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_TEMP_TABLE_INFO | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_METRICS | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_FT_DEFAULT_STOPWORD | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_FT_DELETED | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_FT_BEING_DELETED | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_FT_CONFIG | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_FT_INDEX_CACHE | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_FT_INDEX_TABLE | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_TABLES | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_TABLESTATS | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_INDEXES | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_TABLESPACES | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_COLUMNS | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_VIRTUAL | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_CACHED_INDEXES | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_SESSION_TEMP_TABLESPACES | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| MyISAM | ACTIVE | STORAGE ENGINE | NULL | GPL |
| MRG_MYISAM | ACTIVE | STORAGE ENGINE | NULL | GPL |
| PERFORMANCE_SCHEMA | ACTIVE | STORAGE ENGINE | NULL | GPL |
| TempTable | ACTIVE | STORAGE ENGINE | NULL | GPL |
| ARCHIVE | ACTIVE | STORAGE ENGINE | NULL | GPL |
| BLACKHOLE | ACTIVE | STORAGE ENGINE | NULL | GPL |
| FEDERATED | DISABLED | STORAGE ENGINE | NULL | GPL |
| ngram | ACTIVE | FTPARSER | NULL | GPL |
| mysqlx_cache_cleaner | ACTIVE | AUDIT | NULL | GPL |
| mysqlx | ACTIVE | DAEMON | NULL | GPL |
| group_replication | ACTIVE | GROUP REPLICATION | group_replication.so | GPL |
+---------------------------------+----------+--------------------+----------------------+---------+
45 rows in set (0.00 sec)
安装好插件之后,将MGR的参数注释去掉,重启mysql实例。
设置MGR单主模式
在所有节点上执行以下命令,创建复制用户
set sql_log_bin=0;
create user 'repl'@'%' identified with mysql_native_password by 'repl';
grant replication slave,replication client on *.* to 'repl'@'%';
flush privileges;
set sql_log_bin=1;
在192.168.112.131服务上设置主节点
set global group_replication_single_primary_mode=on;
set global group_replication_bootstrap_group=ON;
CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='repl' FOR CHANNEL 'group_replication_recovery';
start group_replication;
set global group_replication_bootstrap_group=OFF;
设置好之后,可以查看MGR复制成员状态
[root@localhost] 14:44:39 [(none)]>select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | c1a60f79-f719-11ea-97db-000c29cc2388 | mysql | 3309 | ONLINE | PRIMARY | 8.0.21 |
+---------------------------+--------------------------------------+----
---------+-------------+--------------+-------------+----------------+
设置好主之后,就开始设置剩下2个从节点了,执行命令都是一样的。
CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='repl' FOR CHANNEL 'group_replication_recovery';
start group_replication;
到这里,遇到了大坑,来看看
[root@localhost] 14:44:40 [(none)]>select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 188f118c-f71b-11ea-899b-000c29387845 | mysql | 3309 | RECOVERING | SECONDARY | 8.0.21 |
| group_replication_applier | 577b5cdb-f71a-11ea-9f03-000c29231183 | mysql | 3309 | RECOVERING | SECONDARY | 8.0.21 |
| group_replication_applier | c1a60f79-f719-11ea-97db-000c29cc2388 | mysql | 3309 | ONLINE | PRIMARY | 8.0.21 |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)
2个从节点的状态一直是RECOVERING状态,正常时ONLINE状态的。这个状态持续了有10几分钟,看mysql的错误日志看到以下错误信息
2020-09-15T14:24:57.555493+08:00 19 [ERROR] [MY-010584] [Repl] Slave I/O for channel 'group_replication_recovery': error connecting to master 'repl@mysql:3309' - retry-time: 60 retries: 1 message: Authentication plugin 'caching_sha2_password' reported error: Authentication requires secure connection. Error_code: MY-002061
2020-09-15T14:24:57.557460+08:00 18 [ERROR] [MY-011582] [Repl] Plugin group_replication reported: 'There was an error when connecting to the donor server. Please check that group_replication_recovery channel credentials and all MEMBER_HOST column values of performance_schema.replication_group_members table are correct and DNS resolvable.'
2020-09-15T14:24:57.557541+08:00 18 [ERROR] [MY-011583] [Repl] Plugin group_replication reported: 'For details please check performance_schema.replication_connection_status table and error log messages of Slave I/O for channel group_replication_recovery.'
2020-09-15T14:25:57.284156+08:00 0 [System] [MY-011504] [Repl] Plugin group_replication reported: 'Group membership changed: This member has left the group.'
重点内容在error connecting to master 'repl@mysql:3309',无法连接到master,即网络不通,在这里MGR竟然用的主机名来和集群成员进行通信,关键是我的主机名都是一样的,因为虚拟级时克隆出来的。
在这里有2个解决方案
方案一
修改主机名称,并且在MGR每个集群节点上的/etc/hosts中,添加主机和IP的对应关系
report_host=192.168.112.135
report_port=3309
我用的是方案二,在my.cnf参数文件中分别添加以下内容
在192.168.112.131服务器上
report_host=192.168.112.131
report_port=3309
在192.168.112.132服务器上
report_host=192.168.112.132
report_port=3309
在192.168.112.135服务器上
report_host=192.168.112.135
report_port=3309
添加完成之后,就重启Mysql实例,重新启动集群。
设置主节点
set global group_replication_bootstrap_group=ON;
start group_replication;
set global group_replication_bootstrap_group=OFF;
设置从节点
start group_replication;
查看集群状态
[root@localhost] 15:02:23 [(none)]>select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 188f118c-f71b-11ea-899b-000c29387845 | 192.168.112.135 | 3309 | ONLINE | SECONDARY | 8.0.21 |
| group_replication_applier | 577b5cdb-f71a-11ea-9f03-000c29231183 | 192.168.112.132 | 3309 | ONLINE | SECONDARY | 8.0.21 |
| group_replication_applier | c1a60f79-f719-11ea-97db-000c29cc2388 | 192.168.112.131 | 3309 | ONLINE | PRIMARY | 8.0.21 |
+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)
这次集群节点状态都是online了。
单主切换到多主模式
在所有节点执行
stop group_replication;
set global group_replication_single_primary_mode=OFF;
set global group_replication_enforce_update_everywhere_checks=ON;
选择任意节点执行
SET GLOBAL group_replication_bootstrap_group=ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group=OFF;
剩余节点执行
START GROUP_REPLICATION;
查看集群状态
[root@localhost] 17:45:57 [db1]>select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 188f118c-f71b-11ea-899b-000c29387845 | 192.168.112.135 | 3309 | ONLINE | PRIMARY | 8.0.21 |
| group_replication_applier | 577b5cdb-f71a-11ea-9f03-000c29231183 | 192.168.112.132 | 3309 | ONLINE | PRIMARY | 8.0.21 |
| group_replication_applier | c1a60f79-f719-11ea-97db-000c29cc2388 | 192.168.112.131 | 3309 | ONLINE | PRIMARY | 8.0.21 |
+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
3 rows in set (0.01 sec)
多主切换到多单模式
在所有节点执行
stop group_replication;
set global group_replication_enforce_update_everywhere_checks=off;
选择主节点数据库执行
set global group_replication_single_primary_mode=on;
set global group_replication_bootstrap_group=ON;
start group_replication;
set global group_replication_bootstrap_group=OFF;
剩余节点执行
set global group_replication_single_primary_mode=on;
START GROUP_REPLICATION;
查看集群状态
[root@localhost] 17:51:12 [db1]>select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 188f118c-f71b-11ea-899b-000c29387845 | 192.168.112.135 | 3309 | ONLINE | SECONDARY | 8.0.21 |
| group_replication_applier | 577b5cdb-f71a-11ea-9f03-000c29231183 | 192.168.112.132 | 3309 | ONLINE | SECONDARY | 8.0.21 |
| group_replication_applier | c1a60f79-f719-11ea-97db-000c29cc2388 | 192.168.112.131 | 3309 | ONLINE | PRIMARY | 8.0.21 |
+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)