{"id":795,"date":"2021-08-13T13:05:00","date_gmt":"2021-08-13T17:05:00","guid":{"rendered":"https:\/\/molecularsciences.org\/content\/?p=795"},"modified":"2022-03-01T13:06:42","modified_gmt":"2022-03-01T18:06:42","slug":"character-sets-encoding-and-collation-in-mysql","status":"publish","type":"post","link":"https:\/\/molecularsciences.org\/content\/character-sets-encoding-and-collation-in-mysql\/","title":{"rendered":"Character sets, encoding, and collation in MySQL"},"content":{"rendered":"\n<p><strong>Character:<\/strong>&nbsp;set of symbols and encodings&nbsp;<strong>Encoding:<\/strong>&nbsp;set of values assigned to characters&nbsp;<strong>Collation:<\/strong>&nbsp;set of rule for comparing a character set<\/p>\n\n\n\n<p>Suppose you have the following characters,<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>M Y S Q L\n<\/code><\/pre>\n\n\n\n<p>that are assigned the following values:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>13 25 19 17 12\n<\/code><\/pre>\n\n\n\n<p>sequentially. These numbers happen to be their order in the English alphabet. So A = 1, B = 2 and so on.<\/p>\n\n\n\n<p>MYSQL are characters. The number are encodings. The collation is based on the position in English alphabet.<\/p>\n\n\n\n<p>MySQL has the following character sets:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>mysql&gt; show character set;\n+----------+-----------------------------+---------------------+--------+\n| Charset  | Description                 | Default collation   | Maxlen |\n+----------+-----------------------------+---------------------+--------+\n| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 |\n| dec8     | DEC West European           | dec8_swedish_ci     |      1 |\n| cp850    | DOS West European           | cp850_general_ci    |      1 |\n| hp8      | HP West European            | hp8_english_ci      |      1 |\n| koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |      1 |\n| latin1   | cp1252 West European        | latin1_swedish_ci   |      1 |\n| latin2   | ISO 8859-2 Central European | latin2_general_ci   |      1 |\n| swe7     | 7bit Swedish                | swe7_swedish_ci     |      1 |\n| ascii    | US ASCII                    | ascii_general_ci    |      1 |\n| ujis     | EUC-JP Japanese             | ujis_japanese_ci    |      3 |\n| sjis     | Shift-JIS Japanese          | sjis_japanese_ci    |      2 |\n| hebrew   | ISO 8859-8 Hebrew           | hebrew_general_ci   |      1 |\n| tis620   | TIS620 Thai                 | tis620_thai_ci      |      1 |\n| euckr    | EUC-KR Korean               | euckr_korean_ci     |      2 |\n| koi8u    | KOI8-U Ukrainian            | koi8u_general_ci    |      1 |\n| gb2312   | GB2312 Simplified Chinese   | gb2312_chinese_ci   |      2 |\n| greek    | ISO 8859-7 Greek            | greek_general_ci    |      1 |\n| cp1250   | Windows Central European    | cp1250_general_ci   |      1 |\n| gbk      | GBK Simplified Chinese      | gbk_chinese_ci      |      2 |\n| latin5   | ISO 8859-9 Turkish          | latin5_turkish_ci   |      1 |\n| armscii8 | ARMSCII-8 Armenian          | armscii8_general_ci |      1 |\n| utf8     | UTF-8 Unicode               | utf8_general_ci     |      3 |\n| ucs2     | UCS-2 Unicode               | ucs2_general_ci     |      2 |\n| cp866    | DOS Russian                 | cp866_general_ci    |      1 |\n| keybcs2  | DOS Kamenicky Czech-Slovak  | keybcs2_general_ci  |      1 |\n| macce    | Mac Central European        | macce_general_ci    |      1 |\n| macroman | Mac West European           | macroman_general_ci |      1 |\n| cp852    | DOS Central European        | cp852_general_ci    |      1 |\n| latin7   | ISO 8859-13 Baltic          | latin7_general_ci   |      1 |\n| utf8mb4  | UTF-8 Unicode               | utf8mb4_general_ci  |      4 |\n| cp1251   | Windows Cyrillic            | cp1251_general_ci   |      1 |\n| utf16    | UTF-16 Unicode              | utf16_general_ci    |      4 |\n| cp1256   | Windows Arabic              | cp1256_general_ci   |      1 |\n| cp1257   | Windows Baltic              | cp1257_general_ci   |      1 |\n| utf32    | UTF-32 Unicode              | utf32_general_ci    |      4 |\n| binary   | Binary pseudo charset       | binary              |      1 |\n| geostd8  | GEOSTD8 Georgian            | geostd8_general_ci  |      1 |\n| cp932    | SJIS for Windows Japanese   | cp932_japanese_ci   |      2 |\n| eucjpms  | UJIS for Windows Japanese   | eucjpms_japanese_ci |      3 |\n+----------+-----------------------------+---------------------+--------+\n<\/code><\/pre>\n\n\n\n<p>and the collation is<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>mysql&gt; show collation;\n+--------------------------+----------+-----+---------+----------+---------+\n| Collation                | Charset  | Id  | Default | Compiled | Sortlen |\n+--------------------------+----------+-----+---------+----------+---------+\n| latin1_general_ci        | latin1   |  48 |         | Yes      |       1 |\n| utf8_general_ci          | utf8     |  33 | Yes     | Yes      |       1 |\n| utf8_bin                 | utf8     |  83 |         | Yes      |       1 |\n| utf8_unicode_ci          | utf8     | 192 |         | Yes      |       8 |\n| ucs2_general_ci          | ucs2     |  35 | Yes     | Yes      |       1 |\n| ucs2_bin                 | ucs2     |  90 |         | Yes      |       1 |\n| ucs2_unicode_ci          | ucs2     | 128 |         | Yes      |       8 |\n+--------------------------+----------+-----+---------+----------+---------+\n<\/code><\/pre>\n\n\n\n<p>My terminal listed 197. I am only showing a few.<\/p>\n\n\n\n<p>Default settings for character sets and collations are set at server level, database level, table level, and column level. These defaults can be modified at every level. To set a default character set at server level, define the settings at startup using &#8211;with-charset and &#8211;with-collation options. To modify at database level, see the following example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE DATABASE mydb CHARACTER SET utf8 COLLATE utf8_general_ci;\n<\/code><\/pre>\n\n\n\n<p>To view the character set,<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>mysql&gt; show create database mydb;\n+----------+---------------------------------------------------------------+\n| Database | Create Database                                               |\n+----------+---------------------------------------------------------------+\n| mydb     | CREATE DATABASE `mydb` \/*!40100 DEFAULT CHARACTER SET utf8 *\/ |\n+----------+---------------------------------------------------------------+\n\nmysql&gt; show variables like \"collation_database\";\n+--------------------+-----------------+\n| Variable_name      | Value           |\n+--------------------+-----------------+\n| collation_database | utf8_general_ci |\n+--------------------+-----------------+\n\nmysql&gt; show variables like \"character_set_database\";\n+------------------------+-------+\n| Variable_name          | Value |\n+------------------------+-------+\n| character_set_database | utf8  |\n+------------------------+-------+\n<\/code><\/pre>\n\n\n\n<p>To create and view the character set and collation in a table:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>mysql&gt; create table mytable ( \n-&gt; id int(10) not null, \n-&gt; name varchar(20) null ) character set utf8 collate utf8_unicode_ci;\n\nmysql&gt; show full columns from mytable;\n+-------+-------------+-----------------+------+-----+---------+-------+---------------------------------+---------+\n| Field | Type        | Collation       | Null | Key | Default | Extra | Privileges                      | Comment |\n+-------+-------------+-----------------+------+-----+---------+-------+---------------------------------+---------+\n| id    | int(10)     | NULL            | NO   |     | NULL    |       | select,insert,update,references |         |\n| name  | varchar(20) | utf8_unicode_ci | YES  |     | NULL    |       | select,insert,update,references |         |\n+-------+-------------+-----------------+------+-----+---------+-------+---------------------------------+---------+\n<\/code><\/pre>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Character:&nbsp;set of symbols and encodings&nbsp;Encoding:&nbsp;set of values assigned to characters&nbsp;Collation:&nbsp;set of rule for comparing a character set Suppose you have the following characters, that are assigned the following values: sequentially. These numbers happen to be their order in the English alphabet. So A = 1, B = 2 and so on. MYSQL are characters. The [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16,204],"tags":[27],"class_list":["post-795","post","type-post","status-publish","format-standard","hentry","category-database","category-mysql","tag-mysql"],"_links":{"self":[{"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/posts\/795","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/comments?post=795"}],"version-history":[{"count":1,"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/posts\/795\/revisions"}],"predecessor-version":[{"id":796,"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/posts\/795\/revisions\/796"}],"wp:attachment":[{"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/media?parent=795"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/categories?post=795"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/tags?post=795"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}