vlambda博客
学习文章列表

Mysql8官方分布式数据库MGR最佳实践,竟遇大坑

MGR简介

MGR是mysql Group Replication简称,中文名称是Mysql组复制,MGR是MySQL官方于2016年12月推出的一个全新的高可用与高扩展的解决方案,提供了高可用、高扩展、高可靠的MySQL集群服务,目前只支持MYSQL5.7和mysql8.0版本。

MGR优点

  1. 高一致性:基于原生复制和paxos协议的组复制技术。

  2. 高容错性:有自动检测机制,当出现宕机后,会自动剔除问题节点,在2N+1个节点集群中,集群 只要N +1个节点还存活着,数据库就能稳定的对外提供服务。

  3. 高扩展性:可以在线增加和移除节点。

  4. 高灵活性:可以在单主模式和多主模式自由切换。


MGR缺点

MGR技术比较新,稳定性方面还需要业界验证。


Mysql8分布式数据库MGR最佳实践

环境准备

3台虚拟机搭建3节点的MGR集群,Mysql数据库用的8.0.21版本。

IP 客户端访问端口 MGR成员通信端口 server_id OS版本
192.168.112.131 3309 10061 1 Centos 7.6
192.168.112.132 3309 10061 2 Centos 7.6
192.168.112.135 3309 10061 3 Centos 7.6

创建目录

mkdir -p /data/mysql/mysql8/{data,log,conf,run}/3309mkdir -p /data/mysql/mysql8/log/3309/{redo,undo,relay}chown -R mysql:mysql /data/mysql/mysql8/mysql8chmod 750 /data/mysql/mysql8/{data,log,conf,run}

配置参数

[mysqld]##basic settings###server-id=1port = 3309character-set-server = utf8mb4collation-server = utf8mb4_unicode_ciskip-name-resolvemax_connections=1000max_user_connections=800max_allowed_packet=512Mmax_connect_errors=100000datadir = /data/mysql/mysql8/data/3309socket = /data/mysql/mysql8/run/3309/mysql.sockpid_file = /data/mysql/mysql8/run/3309/mysql.pidtransaction_isolation = READ-COMMITTEDlower_case_table_names=1default_time_zone =+8:00open_files_limit=65535log_timestamps=systemwait_timeout=900interactive_timeout=900
##innodb setting##innodb_buffer_pool_size = 256Minnodb_buffer_pool_instances = 1innodb_io_capacity=2000innodb_flush_method=O_DIRECT
innodb_log_group_home_dir = /data/mysql/mysql8/log/3309/redoinnodb_log_file_size = 128Minnodb_log_files_in_group=4innodb_log_buffer_size = 32M
innodb_undo_directory = /data/mysql/mysql8/log/3309/undoinnodb_undo_tablespaces = 4innodb_undo_log_truncate=1innodb_max_undo_log_size=1G
innodb_flush_neighbors=0innodb_flush_log_at_trx_commit = 1innodb_print_all_deadlocks = 1innodb_online_alter_log_max_size=128M
innodb_lock_wait_timeout=10innodb_file_per_table=ONinnodb_doublewrite=ON
##log settings##log-error = /data/mysql/mysql8/log/3309/error.loglog-bin = /data/mysql/mysql8/log/3309/mysql_bin.logslow_query_log = 1slow_query_log_file = /data/mysql/mysql8/log/3309/mysql_slow_query.loglong_query_time = 10
##replication settings##gtid-mode=onenforce-gtid-consistency=truemaster-info-repository=TABLErelay-log-info-repository=TABLEsync-master-info=1000relay_log_recovery = 1relay-log=/data/mysql/mysql8/log/3309/relay/mysql-relay-bin
#binloglog_bin=/data/mysql/mysql8/log/3309/binlogexpire_logs_days=10max_binlog_cache_size=1024Msync_binlog=1
##MGR settingsbinlog_checksum = NONElog_slave_updates = ONbinlog_format=row#transaction_write_set_extraction ='XXHASH64'#loose-group_replication_group_name = '38f34157-cbe8-4623-a7bd-054cc5c2de0b'#loose-group_replication_start_on_boot = off#loose-group_replication_local_address = '192.168.112.131:10061'#loose-group_replication_group_seeds ='192.168.112.131:10061,192.168.112.132:10061,192.168.112.135:10061'#loose-group_replication_bootstrap_group = off#loose-group_replication_ip_whitelist = '192.168.112.131/24,192.168.112.132/24,192.168.112.135/24'
[client]port = 3309socket = /data/mysql/mysql8/run/3309/mysql.sock

在这里需要注意的是地方,由于MGR的插件,mysql默认是没有安装的,所以在这里关于MGR的配置参数,都注释掉了,等数据库实例启动之后,再开启。

每个节点的server-id需要设置成不一样。


初始化数据库

/data/mysql-8.0.21/bin/mysqld --defaults-file=/data/mysql/mysql8/conf/3309/my.cnf --initialize --basedir=/data/mysql-8.0.21/ --datadir=/data/mysql/mysql8/data/3309 --user=mysql --initialize-insecure --ssl --explicit_defaults_for_timestamp
如果要启用SSL安全连接,执行如下命令/data/mysql-8.0.21/bin/mysql_ssl_rsa_setup --basedir=/data/mysql-8.0.21 --datadir=/data/mysql/mysql8/data/3309

启停mysql实例

 
/data/mysql-8.0.21/bin/mysqld_safe --defaults-file=/data/mysql/mysql8/conf/3309/my.cnf &
/data/mysql-8.0.21/bin/mysqladmin -uroot --socket=/data/mysql/mysql8/run/3309/mysql.sock shutdown &


登录实例

 
/data/mysql-8.0.21/bin/mysql -uroot --socket=/data/mysql/mysql8/run/3309/mysql.sock

MGR插件安装

[root@localhost] 14:08:44 [(none)]>install plugin group_replication soname 'group_replication.so';Query OK, 0 rows affected (0.20 sec)
[root@localhost] 14:09:06 [(none)]>show plugins;+---------------------------------+----------+--------------------+----------------------+---------+| Name | Status | Type | Library | License |+---------------------------------+----------+--------------------+----------------------+---------+| binlog | ACTIVE | STORAGE ENGINE | NULL | GPL || mysql_native_password | ACTIVE | AUTHENTICATION | NULL | GPL || sha256_password | ACTIVE | AUTHENTICATION | NULL | GPL || caching_sha2_password | ACTIVE | AUTHENTICATION | NULL | GPL || sha2_cache_cleaner | ACTIVE | AUDIT | NULL | GPL || CSV | ACTIVE | STORAGE ENGINE | NULL | GPL || MEMORY | ACTIVE | STORAGE ENGINE | NULL | GPL || InnoDB | ACTIVE | STORAGE ENGINE | NULL | GPL || INNODB_TRX | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_CMP | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_CMP_RESET | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_CMPMEM | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_CMPMEM_RESET | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_CMP_PER_INDEX | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_CMP_PER_INDEX_RESET | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_BUFFER_PAGE | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_BUFFER_PAGE_LRU | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_BUFFER_POOL_STATS | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_TEMP_TABLE_INFO | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_METRICS | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_FT_DEFAULT_STOPWORD | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_FT_DELETED | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_FT_BEING_DELETED | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_FT_CONFIG | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_FT_INDEX_CACHE | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_FT_INDEX_TABLE | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_TABLES | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_TABLESTATS | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_INDEXES | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_TABLESPACES | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_COLUMNS | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_VIRTUAL | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_CACHED_INDEXES | ACTIVE | INFORMATION SCHEMA | NULL | GPL || INNODB_SESSION_TEMP_TABLESPACES | ACTIVE | INFORMATION SCHEMA | NULL | GPL || MyISAM | ACTIVE | STORAGE ENGINE | NULL | GPL || MRG_MYISAM | ACTIVE | STORAGE ENGINE | NULL | GPL || PERFORMANCE_SCHEMA | ACTIVE | STORAGE ENGINE | NULL | GPL || TempTable | ACTIVE | STORAGE ENGINE | NULL | GPL || ARCHIVE | ACTIVE | STORAGE ENGINE | NULL | GPL || BLACKHOLE | ACTIVE | STORAGE ENGINE | NULL | GPL || FEDERATED | DISABLED | STORAGE ENGINE | NULL | GPL || ngram | ACTIVE | FTPARSER | NULL | GPL || mysqlx_cache_cleaner | ACTIVE | AUDIT | NULL | GPL || mysqlx | ACTIVE | DAEMON | NULL | GPL || group_replication | ACTIVE | GROUP REPLICATION | group_replication.so | GPL |+---------------------------------+----------+--------------------+----------------------+---------+45 rows in set (0.00 sec)


安装好插件之后,将MGR的参数注释去掉,重启mysql实例。

设置MGR单主模式

在所有节点上执行以下命令,创建复制用户

set sql_log_bin=0;create user 'repl'@'%' identified with mysql_native_password by 'repl';grant replication slave,replication client on *.* to 'repl'@'%';flush privileges;set sql_log_bin=1;

在192.168.112.131服务上设置主节点

set global group_replication_single_primary_mode=on;set global group_replication_bootstrap_group=ON;CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='repl' FOR CHANNEL 'group_replication_recovery';start group_replication;set global group_replication_bootstrap_group=OFF;

设置好之后,可以查看MGR复制成员状态

[root@localhost] 14:44:39 [(none)]>select * from performance_schema.replication_group_members;+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+| group_replication_applier | c1a60f79-f719-11ea-97db-000c29cc2388 | mysql | 3309 | ONLINE | PRIMARY | 8.0.21 |+---------------------------+--------------------------------------+----
---------+-------------+--------------+-------------+----------------+

设置好主之后,就开始设置剩下2个从节点了,执行命令都是一样的。

CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='repl' FOR CHANNEL 'group_replication_recovery';start group_replication;

到这里,遇到了大坑,来看看

[root@localhost] 14:44:40 [(none)]>select * from performance_schema.replication_group_members;+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+| group_replication_applier | 188f118c-f71b-11ea-899b-000c29387845 | mysql | 3309 | RECOVERING | SECONDARY | 8.0.21 || group_replication_applier | 577b5cdb-f71a-11ea-9f03-000c29231183 | mysql | 3309 | RECOVERING | SECONDARY | 8.0.21 || group_replication_applier | c1a60f79-f719-11ea-97db-000c29cc2388 | mysql | 3309 | ONLINE | PRIMARY | 8.0.21 |+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+3 rows in set (0.00 sec)

2个从节点的状态一直是RECOVERING状态,正常时ONLINE状态的。这个状态持续了有10几分钟,看mysql的错误日志看到以下错误信息

 
2020-09-15T14:24:57.555493+08:00 19 [ERROR] [MY-010584] [Repl] Slave I/O for channel 'group_replication_recovery': error connecting to master 'repl@mysql:3309' - retry-time: 60 retries: 1 message: Authentication plugin 'caching_sha2_password' reported error: Authentication requires secure connection. Error_code: MY-0020612020-09-15T14:24:57.557460+08:00 18 [ERROR] [MY-011582] [Repl] Plugin group_replication reported: 'There was an error when connecting to the donor server. Please check that group_replication_recovery channel credentials and all MEMBER_HOST column values of performance_schema.replication_group_members table are correct and DNS resolvable.'2020-09-15T14:24:57.557541+08:00 18 [ERROR] [MY-011583] [Repl] Plugin group_replication reported: 'For details please check performance_schema.replication_connection_status table and error log messages of Slave I/O for channel group_replication_recovery.'2020-09-15T14:25:57.284156+08:00 0 [System] [MY-011504] [Repl] Plugin group_replication reported: 'Group membership changed: This member has left the group.'


重点内容在error connecting to master 'repl@mysql:3309',无法连接到master,即网络不通,在这里MGR竟然用的主机名来和集群成员进行通信,关键是我的主机名都是一样的,因为虚拟级时克隆出来的。


在这里有2个解决方案
方案一
修改主机名称,并且在MGR每个集群节点上的/etc/hosts中,添加主机和IP的对应关系

report_host=192.168.112.135report_port=3309

我用的是方案二,在my.cnf参数文件中分别添加以下内容

在192.168.112.131服务器上report_host=192.168.112.131report_port=3309
在192.168.112.132服务器上report_host=192.168.112.132report_port=3309
在192.168.112.135服务器上report_host=192.168.112.135report_port=3309

添加完成之后,就重启Mysql实例,重新启动集群。

设置主节点

set global group_replication_bootstrap_group=ON;start group_replication;set global group_replication_bootstrap_group=OFF;

设置从节点

 
start group_replication;

查看集群状态

[root@localhost] 15:02:23 [(none)]>select * from performance_schema.replication_group_members;+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+| group_replication_applier | 188f118c-f71b-11ea-899b-000c29387845 | 192.168.112.135 | 3309 | ONLINE | SECONDARY | 8.0.21 || group_replication_applier | 577b5cdb-f71a-11ea-9f03-000c29231183 | 192.168.112.132 | 3309 | ONLINE | SECONDARY | 8.0.21 || group_replication_applier | c1a60f79-f719-11ea-97db-000c29cc2388 | 192.168.112.131 | 3309 | ONLINE | PRIMARY | 8.0.21 |+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+3 rows in set (0.00 sec)



这次集群节点状态都是online了。

单主切换到多主模式

在所有节点执行

stop group_replication;set global group_replication_single_primary_mode=OFF;set global group_replication_enforce_update_everywhere_checks=ON;

选择任意节点执行

SET GLOBAL group_replication_bootstrap_group=ON;START GROUP_REPLICATION;SET GLOBAL group_replication_bootstrap_group=OFF;

剩余节点执行

START GROUP_REPLICATION;

查看集群状态

[root@localhost] 17:45:57 [db1]>select * from performance_schema.replication_group_members;+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+| group_replication_applier | 188f118c-f71b-11ea-899b-000c29387845 | 192.168.112.135 | 3309 | ONLINE | PRIMARY | 8.0.21 || group_replication_applier | 577b5cdb-f71a-11ea-9f03-000c29231183 | 192.168.112.132 | 3309 | ONLINE | PRIMARY | 8.0.21 || group_replication_applier | c1a60f79-f719-11ea-97db-000c29cc2388 | 192.168.112.131 | 3309 | ONLINE | PRIMARY | 8.0.21 |+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+3 rows in set (0.01 sec)

多主切换到多单模式

在所有节点执行

stop group_replication;set global group_replication_enforce_update_everywhere_checks=off;

选择主节点数据库执行

set global group_replication_single_primary_mode=on;set global group_replication_bootstrap_group=ON;start group_replication;set global group_replication_bootstrap_group=OFF;

剩余节点执行

set global group_replication_single_primary_mode=on;START GROUP_REPLICATION;

查看集群状态

[root@localhost] 17:51:12 [db1]>select * from performance_schema.replication_group_members;+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+| group_replication_applier | 188f118c-f71b-11ea-899b-000c29387845 | 192.168.112.135 | 3309 | ONLINE | SECONDARY | 8.0.21 || group_replication_applier | 577b5cdb-f71a-11ea-9f03-000c29231183 | 192.168.112.132 | 3309 | ONLINE | SECONDARY | 8.0.21 || group_replication_applier | c1a60f79-f719-11ea-97db-000c29cc2388 | 192.168.112.131 | 3309 | ONLINE | PRIMARY | 8.0.21 |+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+3 rows in set (0.00 sec)