Mysql字符集导致zabbix无法插入中文
在安装zabbix过程中需要用到后台数据库存储数据。但是在使用的过程中发现rpm安装的mysql数据库在不修改配置参数的时候,通过zabbix web界面插入包含中文的字符串会报错。这也是前面推荐最好全部使用英文的原因。
后来查阅发现是默认的mysql数据库的字符集问题;
1.查看当前数据库支持的字符集
当前数据库版本mysql 5.7.26,查看指出的所有的字符集
mysql> show character set ;
+----------+---------------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+---------------------------------+---------------------+--------+
| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
| dec8 | DEC West European | dec8_swedish_ci | 1 |
| cp850 | DOS West European | cp850_general_ci | 1 |
| hp8 | HP West European | hp8_english_ci | 1 |
| koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 |
| latin1 | cp1252 West European | latin1_swedish_ci | 1 |
| latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |
| swe7 | 7bit Swedish | swe7_swedish_ci | 1 |
| ascii | US ASCII | ascii_general_ci | 1 |
| ujis | EUC-JP Japanese | ujis_japanese_ci | 3 |
| sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 |
| hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 |
| tis620 | TIS620 Thai | tis620_thai_ci | 1 |
| euckr | EUC-KR Korean | euckr_korean_ci | 2 |
| koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 |
| gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 |
| greek | ISO 8859-7 Greek | greek_general_ci | 1 |
| cp1250 | Windows Central European | cp1250_general_ci | 1 |
| gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |
| latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |
| armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 |
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
| ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 |
| cp866 | DOS Russian | cp866_general_ci | 1 |
| keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 |
| macce | Mac Central European | macce_general_ci | 1 |
| macroman | Mac West European | macroman_general_ci | 1 |
| cp852 | DOS Central European | cp852_general_ci | 1 |
| latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |
| utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 |
| cp1251 | Windows Cyrillic | cp1251_general_ci | 1 |
| utf16 | UTF-16 Unicode | utf16_general_ci | 4 |
| utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 |
| cp1256 | Windows Arabic | cp1256_general_ci | 1 |
| cp1257 | Windows Baltic | cp1257_general_ci | 1 |
| utf32 | UTF-32 Unicode | utf32_general_ci | 4 |
| binary | Binary pseudo charset | binary | 1 |
| geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 |
| cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 |
| eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 |
| gb18030 | China National Standard GB18030 | gb18030_chinese_ci | 4 |
+----------+---------------------------------+---------------------+--------+
41 rows in set (0.00 sec)
在支持列表中我们可以看到眼熟的ascii,gbk,utf8,utf8mb4等,以及下面会出现的latin1。
着重强调:utf8和utf8mb4,在列表可以看到Maxlen最大长度一栏,utf8最大长度为3字节,所以utf8有个别名叫utf8mb3,两者是一个意思。而utf8mb4最大长度4字节,可以理解为utf8mb4是utf8的扩展,毕竟多一个字节可以多存很多数据,比如特殊的字符或者emoji表情就需要utf8mb4字符集。在实际生产环境中也都是使用utf8mb4字符集。
1.1查看当前数据库使用的字符集;
mysql> show variables like '%character%' ;
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
参数解释:
官方文档:
https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html
character_set_client: 客户端请求数据的字符集
character_set_connection:从客户端接收到数据,然后传输的字符集
character_set_database: 默认数据库的字符集,无论默认数据库如何改变,都是这个字符集;如果没有默认数据库,那就使用 character_set_server指定的字符集,这个变量建议由系统自己管理,不要人为定义。
character_set_filesystem:把os上文件名转化成此字符集,即把
character_set_client转换character_set_filesystem, 默认binary是不做任何转换的
character_set_results: 结果集的字符集
character_set_server: 数据库服务器的默认字符集
character_set_system: 这个值总是utf8,不需要设置,是为存储系统元数据的字符集
通过以上可以看到数据库服务器默认的字符集是latin1,此数据集是ASCII的扩展,但是最大长度还是1字节,无法存储汉字,特殊符号等。
需要将默认的latin1修改为常用的utf8mb4字符集。
注意:mysql在5.5.3版本之后才支持utf8mb4.
1.2查看当前某个数据库使用的字符集
mysql> show create database test ;
+----------+------------------------------------------------------------------+
| Database | Create Database |
+----------+------------------------------------------------------------------+
| test | CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET utf8mb4 */ |
+----------+------------------------------------------------------------------+
1 row in set (0.00 sec)
2.全局修改字符集(/etc/my.cnf的方式)
修改mysql配置文件可以全局修改数据库的字符集;
vim /etc/my.cnf
[mysqld]
character-set-server=utf8mb4
collation-server=utf8mb4_general_ci
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
修改完之后再看发现默认字符集已经修改
mysql> show variables like '%character%' ;
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.01 sec)
3.指定库的字符集
可以在建库的时候操作,指定当前库的字符集。
mysql> create database test2 default character set utf8mb4 collate utf8mb4_general_ci ;
Query OK, 1 row affected (0.01 sec)
mysql> show create database test2 ;
+----------+-------------------------------------------------------------------+
| Database | Create Database |
+----------+-------------------------------------------------------------------+
| test2 | CREATE DATABASE `test2` /*!40100 DEFAULT CHARACTER SET utf8mb4 */ |
+----------+-------------------------------------------------------------------+
1 row in set (0.00 sec
4.字符集转换
在数据库存在数据的情况下,谨慎操作,仅供参考。
特别注意数据库版本,最好5.6或者5.7。
导出数据
mysqldump -uroot -p 库名 > charset.sql
替换数据,查看sql文件中的内容,寻找默认的字符集。
sed -i s/CHARSET=latin1/CHARSET=utf8/g charset.sql
修改后的数据导入到新库中。
set utf8mb4 collate utf8mb4_general_ci ; create database new_charset default character
use new_charset;
source charset.sql;
这样里面的字符集就修改好了。