vlambda博客
学习文章列表

MySQL的text和varchar区别你真的知道么?

序言

日常开发过程中,我们在存放已知最大长度的字符串的时候基本都是用的varchar,一些不确定长度的字符串都使用text来承载。那是否想过为啥不能定义一个超长varchar属性来存放不确定长度的字符串呢?text和varchar区别到底是啥呢?

那么本文就来分析一下。

VARCHAR

首先我们看一下MySQL官网对varchar的描述:

A variable-length string. Mrepresents the maximum column length in characters. The range of Mis 0 to 65,535. The effective maximum length of a VARCHARis subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used. For example, utf8characters can require up to three bytes per character, so a VARCHARcolumn that uses the utf8character set can be declared to be a maximum of 21,844 characters. See Section 8.4.7, “Limits on Table Column Count and Row Size”.

MySQL stores VARCHAR values as a 1-byte or 2-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A VARCHAR column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.

VARCHAR is shorthand for CHARACTER VARYING. NATIONAL VARCHAR is the standard SQL way to define that a VARCHAR column should use some predefined character set. MySQL uses utf8 as this predefined character set. Section 10.3.7, “The National Character Set”. NVARCHAR is shorthand for NATIONAL VARCHAR.

来自:https://dev.mysql.com/doc/refman/5.7/en/string-type-syntax.html


大致意思是:varchar(n)中的n表示字符数,最大空间是65535个字节, 存放字符数量跟字符集有关系;varchar实际范围是65532或65533, 因为内容头部会占用1或2个字节保存该字符串的长度;如果字段default null(即默认值为空),整条记录还需要1个字节保存默认值null。如果是utf8编码, 那么varchar最多存65532/3 = 21844个字符。

TEXT

我们再看一下MySQL官网对text的描述:

A TEXTcolumn with a maximum length of 65,535 (216− 1) characters. The effective maximum length is less if the value contains multibyte characters. Each TEXTvalue is stored using a 2-byte length prefix that indicates the number of bytes in the value.

An optional length M can be given for this type. If this is done, MySQL creates the column as the smallest TEXT type large enough to hold values M characters long.

来自:https://dev.mysql.com/doc/refman/5.7/en/string-type-syntax.html


大致意思是:跟varchar基本相同, 理论上最多保存65535个字符, 实际上text占用内存空间最大也是65535个字节;考虑到字符编码方式, 一个字符占用多个字节, text并不能存放那么多字符;跟varchar的区别是text需要2个字节空间记录字段的总字节数。


但是,官方文档中还提到text和varchar的实现不同:

TEXT and BLOB columns are implemented differently in the NDB storage engine, wherein each row in a TEXT column is made up of two separate parts. One of these is of fixed size (256 bytes), and is actually stored in the original table. The other consists of any data in excess of 256 bytes, which is stored in a hidden table. The rows in this second table are always 2000 bytes long. This means that the size of a TEXT column is 256 if size <= 256 (where size represents the size of the row); otherwise, the size is 256 + size + (2000 × (size − 256) % 2000).

来自:https://dev.mysql.com/doc/refman/5.7/en/storage-requirements.html


大致意思是:TEXTBLOB列在NDB存储引擎中的实现方式不同,其中列中的每一行 TEXT由两个独立的部分组成。其中之一是固定大小(256 字节),实际上存储在原始表中。另一个包含超过 256 字节的任何数据,存储在隐藏表中。第二个表中的行总是 2000 字节长。这意味着 TEXT如果 size<= 256(其中 size表示行的大小),则列的大小为 256;否则,大小为 256 + size+ (2000 × ( size- 256) % 2000)。


到此为止,我们知道了varchar属性最大空间是65535个字节,如果是utf8编码, 那么varchar最多存65532/3 = 21844个字符。正常声明时,声明多少字符就占用原表多少字符。text理论上最多保存65535个字符,但是考虑编码肯定是存不了那么多的。但是因为实现机制的原因,text并不会占用原表那么多字符。


接下来我们进一步介绍为什么text并不会像varchar占用原表那么多空间。

MySQL行大小的限制

我们依然来看一下MySQL官网对行限制的描述:

The maximum row size for a given table is determined by several factors:

  • The internal representation of a MySQL table has a maximum row size limit of 65,535 bytes, even if the storage engine is capable of supporting larger rows. BLOBand TEXTcolumns only contribute 9 to 12 bytes toward the row size limit because their contents are stored separately from the rest of the row.

  • The maximum row size for an InnoDBtable, which applies to data stored locally within a database page, is slightly less than half a page for 4KB, 8KB, 16KB, and 32KB innodb_page_sizesettings. For example, the maximum row size is slightly less than 8KB for the default 16KB InnoDBpage size. For 64KB pages, the maximum row size is slightly less than 16KB. See Section 14.23, “InnoDB Limits”.If a row containing variable-length columnsexceeds the InnoDBmaximum row size, InnoDBselects variable-length columns for external off-page storage until the row fits within the InnoDBrow size limit. The amount of data stored locally for variable-length columns that are stored off-page differs by row format. For more information, see Section 14.11, “InnoDB Row Formats”.

  • Different storage formats use different amounts of page header and trailer data, which affects the amount of storage available for rows.

    • For information about InnoDBrow formats, see Section 14.11, “InnoDB Row Formats”.

    • For information about MyISAM storage formats, see Section 15.2.3, “MyISAM Table Storage Formats”.

来自:https://dev.mysql.com/doc/refman/5.7/en/column-count-limit.html


大致意思是:总共行的限制是  65,535 bytes 。超过这个限制的表结构就无法创建了。其中TEXT和BLOB类型的数据只占表结构体大约9到12 bytes。原因就是上面说的两部分存储。


那我们来看一下官网给予的一些例子:



结论

到此我们来总结一下,varchar和text最大占用空间都是65,535 bytes。但是他们的实现方式不同,varchar是直接占用原表的空间,而text使用隐藏表实现超多256字节长度的数据,最多只占用原表12bytes的空间。


在延伸一下,

这或许还能回答大家为什么阿里数据库开发规范中尽量少使用text的原因。

因为如果我们的数据是存放在text字段里面的,在对其进行检索的时候,根据其原理可知性能可能是跟不上varchar的。