Question

I have a database with a bunch of broken utf8 characters scattered across several tables. The list of characters isn't very extensive AFAIK (áéíúóÁÉÍÓÚÑñ).

Fixing a given table is straightforward:

update orderItem set itemName=replace(itemName,'Ã¡','á');

But I can't find a way to detect the broken characters. If I do something like

SELECT * FROM TABLE WHERE field LIKE "%Ã%";

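because of the collation (Ã = a), I get back nearly every row. So far, all the broken characters start with "Ã". The database is in Spanish, so this particular character is never used legitimately.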
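The collation effect described above can be approximated outside MySQL. The Python sketch below is only an analogy: the `fold`, `like_insensitive` and `like_binary` helpers are hypothetical names, and this is not how MySQL implements collations. It shows that once accents and case are folded away, "Ã" compares like "a", whereas a byte-for-byte comparison only matches text that really contains "Ã".

```python
import unicodedata


def fold(text: str) -> str:
    """Strip accents and case, roughly like an accent-insensitive collation."""
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_marks.casefold()


def like_insensitive(needle: str, haystack: str) -> bool:
    """Substring match after folding accents and case."""
    return fold(needle) in fold(haystack)


def like_binary(needle: str, haystack: str) -> bool:
    """Substring match on the raw UTF-8 bytes."""
    return needle.encode("utf-8") in haystack.encode("utf-8")


print(like_insensitive("Ã", "casa"))  # folded match: "Ã" behaves like "a"
print(like_binary("Ã", "casa"))       # byte match: no "Ã" in "casa"
print(like_binary("Ã", "mÃ¡s"))       # byte match: finds the broken row
```

In MySQL terms, the second helper corresponds to forcing a binary comparison instead of relying on the column's collation.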

The list of broken characters I have found so far is:

Ã¡ = á
Ã© = é
Ã = í (Ã followed by an invisible soft hyphen, byte 0xAD)
Ã³ = ó
Ã± = ñ
Ã = Á (Ã followed by the unprintable byte 0x81)
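The pairs above are what you get when UTF-8 bytes are read as Latin-1. Here is a minimal Python sketch, assuming that is how the data was damaged (`mangle` is a hypothetical helper name). It also shows why every broken sequence begins with "Ã": these letters all have 0xC3 as their first UTF-8 byte, and 0xC3 is "Ã" in Latin-1.

```python
def mangle(text: str) -> str:
    """Encode as UTF-8, then misread those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


for ch in "áéíóñÁ":
    # Show the original letter, its UTF-8 bytes, and the mangled form.
    print(ch, ch.encode("utf-8").hex(), repr(mangle(ch)))
```

The mangled "í" and "Á" each end in a character with no visible glyph, which is why those two rows look like a bare "Ã".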

How can I make this SELECT work as expected? (A binary search or something similar.)

Answer 1

I fixed it with:

UPDATE wp_zcs9ck_posts_copy SET post_title = 
    CONVERT(BINARY CONVERT(post_title USING latin1) USING utf8);

Full solution: http://jonisalonen.com/2012/fixing-doubly-utf-8-encoded-text-in-mysql/

Answer 2

UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã¡','á');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã¤','ä');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã©','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í©','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã³','ó');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'íº','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ãº','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã±','ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í‘','Ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã','í');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'â€“','–');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€™','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¦','...');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€“','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€œ','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€˜','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¢','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¡','c');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Â','');

Answer 3

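Thanks for your answers!!

I fixed my tables with this and want to share the complete list of changes. Note that, in addition to the Latin characters, it also fixes HTML-encoded characters; it really was a mess: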

update `table` set `field` = replace(`field` ,'Ã‰','É');
update `table` set `field` = replace(`field` ,'â€œ','"');
update `table` set `field` = replace(`field` ,'â€','"');
update `table` set `field` = replace(`field` ,'Ã‡','Ç');
update `table` set `field` = replace(`field` ,'Ãƒ','Ã');
-- Edit by slash4
update `table` set `field` = replace(`field` ,'Ã ','À');
update `table` set `field` = replace(`field` ,'Ãº','ú');
update `table` set `field` = replace(`field` ,'â€¢','-');
update `table` set `field` = replace(`field` ,'Ã˜','Ø');
update `table` set `field` = replace(`field` ,'Ãµ','õ');
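-- The next pattern has lost its second character, most likely the invisible soft hyphen (0xAD) that follows Ã in a mangled í.
-- As written, the bare 'Ã' also rewrites the Ã in every pattern below it, so run this statement last.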
update `table` set `field` = replace(`field` ,'Ã','í');
update `table` set `field` = replace(`field` ,'Ã¢','â');
update `table` set `field` = replace(`field` ,'Ã£','ã');
update `table` set `field` = replace(`field` ,'Ãª','ê');
update `table` set `field` = replace(`field` ,'Ã¡','á');
update `table` set `field` = replace(`field` ,'Ã©','é');
update `table` set `field` = replace(`field` ,'Ã³','ó');
update `table` set `field` = replace(`field` ,'â€“','–');
update `table` set `field` = replace(`field` ,'Ã§','ç');
update `table` set `field` = replace(`field` ,'Âª','ª');
update `table` set `field` = replace(`field` ,'Âº','º');
update `table` set `field` = replace(`field` ,'Ã ','à');
update `table` set `field` = replace(`field` ,'&ccedil;','ç');
update `table` set `field` = replace(`field` ,'&atilde;','ã');
update `table` set `field` = replace(`field` ,'&aacute;','á');
update `table` set `field` = replace(`field` ,'&acirc;','â');
update `table` set `field` = replace(`field` ,'&eacute;','é');
update `table` set `field` = replace(`field` ,'&iacute;','í');
update `table` set `field` = replace(`field` ,'&otilde;','õ');
update `table` set `field` = replace(`field` ,'&uacute;','ú');
update `table` set `field` = replace(`field` ,'&ccedil;','ç');
update `table` set `field` = replace(`field` ,'&Aacute;','Á');
update `table` set `field` = replace(`field` ,'&Acirc;','Â');
update `table` set `field` = replace(`field` ,'&Eacute;','É');
update `table` set `field` = replace(`field` ,'&Iacute;','Í');
update `table` set `field` = replace(`field` ,'&Otilde;','Õ');
update `table` set `field` = replace(`field` ,'&Uacute;','Ú');
update `table` set `field` = replace(`field` ,'&Ccedil;','Ç');
update `table` set `field` = replace(`field` ,'&Atilde;','Ã');
update `table` set `field` = replace(`field` ,'&Agrave;','À');
update `table` set `field` = replace(`field` ,'&Ecirc;','Ê');
update `table` set `field` = replace(`field` ,'&Oacute;','Ó');
update `table` set `field` = replace(`field` ,'&Ocirc;','Ô');
update `table` set `field` = replace(`field` ,'&Uuml;','Ü');
update `table` set `field` = replace(`field` ,'&atilde;','ã');
update `table` set `field` = replace(`field` ,'&agrave;','à');
update `table` set `field` = replace(`field` ,'&ecirc;','ê');
update `table` set `field` = replace(`field` ,'&oacute;','ó');
update `table` set `field` = replace(`field` ,'&ocirc;','ô');
update `table` set `field` = replace(`field` ,'&uuml;','ü');
update `table` set `field` = replace(`field` ,'&amp;','&');
update `table` set `field` = replace(`field` ,'&gt;','>');
update `table` set `field` = replace(`field` ,'&lt;','<');
update `table` set `field` = replace(`field` ,'&circ;','ˆ');
update `table` set `field` = replace(`field` ,'&tilde;','˜');
update `table` set `field` = replace(`field` ,'&uml;','¨');
update `table` set `field` = replace(`field` ,'&acute;','´');
update `table` set `field` = replace(`field` ,'&cedil;','¸');
update `table` set `field` = replace(`field` ,'&quot;','"');
update `table` set `field` = replace(`field` ,'&ldquo;','“');
update `table` set `field` = replace(`field` ,'&rdquo;','”');
update `table` set `field` = replace(`field` ,'&lsquo;','‘');
update `table` set `field` = replace(`field` ,'&rsquo;','’');
update `table` set `field` = replace(`field` ,'&lsaquo;','‹');
update `table` set `field` = replace(`field` ,'&rsaquo;','›');
update `table` set `field` = replace(`field` ,'&laquo;','«');
update `table` set `field` = replace(`field` ,'&raquo;','»');
update `table` set `field` = replace(`field` ,'&ordm;','º');
update `table` set `field` = replace(`field` ,'&ordf;','ª');
update `table` set `field` = replace(`field` ,'&ndash;','–');
update `table` set `field` = replace(`field` ,'&mdash;','—');
update `table` set `field` = replace(`field` ,'&macr;','¯');
update `table` set `field` = replace(`field` ,'&hellip;','…');
update `table` set `field` = replace(`field` ,'&brvbar;','¦');
update `table` set `field` = replace(`field` ,'&bull;','•');
update `table` set `field` = replace(`field` ,'&para;','¶');
update `table` set `field` = replace(`field` ,'&sect;','§');
update `table` set `field` = replace(`field` ,'&sup1;','¹');
update `table` set `field` = replace(`field` ,'&sup2;','²');
update `table` set `field` = replace(`field` ,'&sup3;','³');
update `table` set `field` = replace(`field` ,'&frac12;','½');
update `table` set `field` = replace(`field` ,'&frac14;','¼');
update `table` set `field` = replace(`field` ,'&frac34;','¾');
update `table` set `field` = replace(`field` ,'&#8539;','⅛');
update `table` set `field` = replace(`field` ,'&#8540;','⅜');
update `table` set `field` = replace(`field` ,'&#8541;','⅝');
update `table` set `field` = replace(`field` ,'&#8542;','⅞');
update `table` set `field` = replace(`field` ,'&gt;','>');
update `table` set `field` = replace(`field` ,'&lt;','<');
update `table` set `field` = replace(`field` ,'&plusmn;','±');
update `table` set `field` = replace(`field` ,'&minus;','−');
update `table` set `field` = replace(`field` ,'&times;','×');
update `table` set `field` = replace(`field` ,'&divide;','÷');
update `table` set `field` = replace(`field` ,'&lowast;','∗');
update `table` set `field` = replace(`field` ,'&frasl;','⁄');
update `table` set `field` = replace(`field` ,'&permil;','‰');
update `table` set `field` = replace(`field` ,'&int;','∫');
update `table` set `field` = replace(`field` ,'&sum;','∑');
update `table` set `field` = replace(`field` ,'&prod;','∏');
update `table` set `field` = replace(`field` ,'&radic;','√');
update `table` set `field` = replace(`field` ,'&infin;','∞');
update `table` set `field` = replace(`field` ,'&asymp;','≈');
update `table` set `field` = replace(`field` ,'&cong;','≅');
update `table` set `field` = replace(`field` ,'&prop;','∝');
update `table` set `field` = replace(`field` ,'&equiv;','≡');
update `table` set `field` = replace(`field` ,'&ne;','≠');
update `table` set `field` = replace(`field` ,'&le;','≤');
update `table` set `field` = replace(`field` ,'&ge;','≥');
update `table` set `field` = replace(`field` ,'&there4;','∴');
update `table` set `field` = replace(`field` ,'&sdot;','⋅');
update `table` set `field` = replace(`field` ,'&middot;','·');
update `table` set `field` = replace(`field` ,'&part;','∂');
update `table` set `field` = replace(`field` ,'&image;','ℑ');
update `table` set `field` = replace(`field` ,'&real;','ℜ');
update `table` set `field` = replace(`field` ,'&prime;','′');
update `table` set `field` = replace(`field` ,'&Prime;','″');
update `table` set `field` = replace(`field` ,'&deg;','°');
update `table` set `field` = replace(`field` ,'&ang;','∠');
update `table` set `field` = replace(`field` ,'&perp;','⊥');
update `table` set `field` = replace(`field` ,'&nabla;','∇');
update `table` set `field` = replace(`field` ,'&oplus;','⊕');
update `table` set `field` = replace(`field` ,'&otimes;','⊗');
update `table` set `field` = replace(`field` ,'&alefsym;','ℵ');
update `table` set `field` = replace(`field` ,'&oslash;','ø');
update `table` set `field` = replace(`field` ,'&Oslash;','Ø');
update `table` set `field` = replace(`field` ,'&isin;','∈');
update `table` set `field` = replace(`field` ,'&notin;','∉');
update `table` set `field` = replace(`field` ,'&cap;','∩');
update `table` set `field` = replace(`field` ,'&cup;','∪');
update `table` set `field` = replace(`field` ,'&sub;','⊂');
update `table` set `field` = replace(`field` ,'&sup;','⊃');
update `table` set `field` = replace(`field` ,'&sube;','⊆');
update `table` set `field` = replace(`field` ,'&supe;','⊇');
update `table` set `field` = replace(`field` ,'&exist;','∃');
update `table` set `field` = replace(`field` ,'&forall;','∀');
update `table` set `field` = replace(`field` ,'&empty;','∅');
update `table` set `field` = replace(`field` ,'&not;','¬');
update `table` set `field` = replace(`field` ,'&and;','∧');
update `table` set `field` = replace(`field` ,'&or;','∨');
update `table` set `field` = replace(`field` ,'&crarr;','↵');

Answer 4

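No literal replacement is a general solution, because you can always miss some characters. A more appropriate fix for doubly converted characters is to:

Convert back to latin1

Convert to binary

Convert to utf8

Like this: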

alter table descriptions modify name VARCHAR(2000) character set latin1;
alter table descriptions modify name blob;
alter table descriptions modify name VARCHAR(2000) character set utf8;
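The three steps can be traced at the byte level. This is a Python sketch, not the SQL itself; `repair` is a hypothetical helper, and it uses `cp1252` on the assumption that MySQL's latin1 behaves like Windows-1252 for these characters.

```python
def repair(text: str) -> str:
    """Undo one round of double encoding."""
    # Steps 1 and 2: map each character back to its single latin1 byte
    # and keep the result as raw bytes.
    raw = text.encode("cp1252")
    # Step 3: read those raw bytes as the UTF-8 they originally were.
    return raw.decode("utf-8")


print(repair("Ã¡"))    # a mangled lowercase a-acute
print(repair("â€œ"))   # a mangled left double quotation mark
```

Because nothing here depends on a list of known characters, sequences that no one thought to put in a replacement list are repaired as well.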

Answer 5

The SELECT statement you need is:

SELECT * FROM TABLE WHERE LENGTH(name) != CHAR_LENGTH(name);

It returns all rows that contain multi-byte characters.

This assumes name is the field in which the odd characters can be found.
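The comparison works because LENGTH counts bytes while CHAR_LENGTH counts characters. The Python sketch below shows the same test (`has_multibyte` is a hypothetical helper). Note that it flags correctly encoded accented text as well as broken text, so it narrows the search rather than pinpointing damage.

```python
def has_multibyte(text: str) -> bool:
    """True when the UTF-8 byte length differs from the character count."""
    return len(text.encode("utf-8")) != len(text)


for value in ["nino", "niño", "niÃ±o"]:
    # Print the value, its byte length, its character length, and the verdict.
    print(value, len(value.encode("utf-8")), len(value), has_multibyte(value))
```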

Answer 6

This saved my life:

UPDATE ohp_posts SET post_content = CONVERT(CAST(CONVERT(post_content USING latin1) AS BINARY) USING utf8)

I found it at http://stanis.net/2014/04/replacing-latin-1-with-utf-8-characters-in-mysql/

Answer 7

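How about a different approach, namely converting the column back and forth to get the correct character set? You can convert it to binary, then to utf-8, and then to iso-8859-1 or whatever else you are using. See the manual for details.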

Answer 8

In addition to the answers from RaúlAvilaSolano and acseven, if you want to fix all the broken characters in a single query, you can do this:

update `table` set field = replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(field,'&uuml;','ü'),'&ocirc;','ô'),'&oacute;','ó'),'&ecirc;','ê'),'&agrave;','à'),'&atilde;','ã'),'&Uuml;','Ü'),'&Ocirc;','Ô'),'&Oacute;','Ó'),'&Ecirc;','Ê'),'&Agrave;','À'),'&Atilde;','Ã'),'&Ccedil;','Ç'),'&Uacute;','Ú'),'&Otilde;','Õ'),'&Iacute;','Í'),'&Iacute;','Í'),'&Eacute;','É'),'&Acirc;','Â'),'&Aacute;','Á'),'&ccedil;','ç'),'&uacute;','ú'),'&otilde;','õ'),'&iacute;','í'),'&eacute;','é'),'&acirc;','â'),'&aacute;','á'),'&atilde;','ã'),'&ccedil;','ç'),'Ã ','à'),'Ã ','à'),'Âº','º'),'Âª','ª'),'Ã§','ç'),'â€“','–'),'Ã³','ó'),'Ã©','é'),'Ã¡','á'),'Ãª','ê'),'Ã£','ã'),'Ã¢','â'),'Ã','í'),'Ãµ','õ'),'Ã˜','Ø'),'â€¢','-'),'Ãº','ú'),'Ã ','À'),'Ãƒ','Ã'),'Ã‡','Ç'),'â€','"'),'â€œ','"'),'Ã‰','É');
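With chained replacements, the order of the rules matters, because each rule sees the output of the previous one. The Python sketch below uses a two-rule list invented for illustration (`replace_all` and both rule lists are hypothetical). It shows how a short, catch-all pattern such as a bare 'Ã' defeats the longer patterns that run after it.

```python
from functools import reduce


def replace_all(text: str, rules: list[tuple[str, str]]) -> str:
    """Apply each (broken, fixed) rule in order, like nested REPLACE calls."""
    return reduce(lambda acc, rule: acc.replace(rule[0], rule[1]), rules, text)


SPECIFIC_FIRST = [("Ã¡", "á"), ("Ã", "í")]
CATCH_ALL_FIRST = [("Ã", "í"), ("Ã¡", "á")]

print(replace_all("mÃ¡s", SPECIFIC_FIRST))   # the longer rule runs first
print(replace_all("mÃ¡s", CATCH_ALL_FIRST))  # the bare 'Ã' rule runs first
```

In the nested SQL form the innermost REPLACE runs first, so the most specific patterns belong on the inside.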

Answer 9

This solved my problem too, with some Italian characters added:

UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã¡','á');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã¤','ä');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã©','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í©','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã³','ó');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'íº','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ãº','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã±','ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í‘','Ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã','í');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'â€“','–');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€™','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¦','...');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€“','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€œ','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€˜','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¢','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¡','c');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Â','');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í ','à');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í¨','è');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'íˆ','È');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'â‚¬','€');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'eÌ€','è');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í²','ò');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í¹','ù');

Answer 10

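I had the same problem but didn't like the replace() solutions, because there is always a chance of missing some characters. I was working on a column of mixed data (some of it utf8_encode()d and some not) with around 4 million rows, of which roughly 250k records held wrongly encoded data (with ‰/ etc. characters), covering about 15 international languages, mainly European languages, Russian, Japanese and Chinese.

I started by duplicating the column, because I didn't want to lose any data: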

ALTER TABLE images ADD COLUMN reptitle TEXT;

Copy over all the data that has multi-byte characters (thanks to Adam for the tip):

UPDATE images SET reptitle = title WHERE LENGTH(title) != CHAR_LENGTH(title)

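Since reptitle is created with the table's default character set, it is already utf8, but it contains corrupted data because the images table was originally latin1. The reptitle column now holds a mix of correctly encoded and corrupted data (every value contains multi-byte characters, and some of them were already correctly utf8_encode()d). Then David's tip...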

ALTER TABLE images MODIFY reptitle TEXT character set latin1;
ALTER TABLE images MODIFY reptitle BLOB;
ALTER TABLE images MODIFY reptitle TEXT character set utf8;

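Since TEXT and BLOB are (I think) the same, the intermediate step might not be necessary. This has the effect of correcting all the wrongly encoded data ('Ã©tudiantes' became 'étudiantes', etc.), but previously correct data was truncated at the first multi-byte character ('Lapin de Pâques' became 'Lapin de P'). I don't know the reason for the truncation, but it happened in a throwaway column, so I didn't really care. The truncated data gives the same value for CHAR_LENGTH and LENGTH, because it no longer has any multi-byte characters, so it is easy to query around...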
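One plausible explanation for the truncation, offered as an assumption rather than confirmed MySQL behaviour: a correctly stored 'â' becomes the single latin1 byte 0xE2, and 0xE2 followed by an ASCII letter is not valid UTF-8. The final conversion then keeps only what comes before the first invalid byte. The Python sketch below reproduces both outcomes from the answer (`valid_utf8_prefix` is a hypothetical helper).

```python
def valid_utf8_prefix(raw: bytes) -> str:
    """Decode as UTF-8, keeping only the text before the first invalid byte."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that failed to decode.
        return raw[:exc.start].decode("utf-8")


# Already-correct text: its latin1 bytes are not valid UTF-8.
print(valid_utf8_prefix("Lapin de Pâques".encode("latin-1")))
# Double-encoded text: its latin1 bytes are the original UTF-8.
print(valid_utf8_prefix("Ã©tudiantes".encode("latin-1")))
```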

UPDATE images SET title = reptitle WHERE LENGTH(reptitle)!=CHAR_LENGTH(reptitle)

Then, of course, just drop the spare column:

ALTER TABLE images DROP COLUMN reptitle

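Also make sure (I use PHP and this has caught me out a few times, so I thought I'd mention it here) that all your script files are UTF-8 (without BOM) and that you are using: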

mysql_set_charset('utf8', $connection);

Et voilà... perfectly fixed data, in all languages :)

Answer 11

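You may have rows that contain correctly encoded UTF8 alongside rows that are wrongly encoded. In that case, "CONVERT(BINARY CONVERT(post_title USING latin1) USING utf8)" will truncate some fields.

I ended up doing this: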

update `table` set `name` = replace(`name` ,CONVERT(BINARY "ä" USING latin1),'ä');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ö" USING latin1),'ö');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ü" USING latin1),'ü');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "Ä" USING latin1),'Ä');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "Ö" USING latin1),'Ö');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "Ü" USING latin1),'Ü');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ß" USING latin1),'ß');

Answer 12

Based on the data in this article, https://www.i18nqa.com/debug/utf8-debug.html, I suggest this as a good query for identifying suspect entries and their likely correct values:

SELECT my_field,CONVERT(BINARY CONVERT(my_field USING latin1) USING utf8mb4) AS new_field_value FROM my_table WHERE my_field REGEXP '[âÆËÅÂÃ]';

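Please be very careful: in our case the file names were wrongly encoded but the paths were fine, and some of the solutions above could have caused a lot of trouble. If some of your data is already correctly encoded as UTF8, you may find that you lose part of it.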

Answer 13

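There is a nice script to automate the conversion process across a whole database. It is also useful to know that MySQL's utf8 implementation is incomplete, as it only supports UTF-8 characters of up to 3 bytes. The solution is to use the utf8mb4 character set, introduced in MySQL 5.5.3.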
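To see which text is affected by the 3-byte limit, it is enough to look at the UTF-8 length of each character. The Python sketch below uses a hypothetical `needs_utf8mb4` helper. Accented Latin letters take 2 bytes and the euro sign 3, so they fit in utf8, while emoji and other characters outside the Basic Multilingual Plane take 4 and need utf8mb4.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character takes 4 bytes in UTF-8."""
    return any(len(ch.encode("utf-8")) == 4 for ch in text)


for sample in ["ñ", "€", "😀"]:
    # Print the sample, its UTF-8 byte count, and whether utf8mb4 is required.
    print(sample, len(sample.encode("utf-8")), needs_utf8mb4(sample))
```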

Answer 14

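Since TEXT and BLOB are the same, the intermediate step might not be necessary.

This has the effect of correcting all the wrongly encoded data, but previously correct data is truncated at the first multi-byte character.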

Answer 15

This extends @Thales Ceolin's answer so that it can be applied to every table in the database:

select concat(
    "update ", 
    a.TABLE_NAME, 
    " set ", b.COLUMN_NAME, 
    " = CONVERT(BINARY CONVERT(", 
    b.COLUMN_NAME, 
    " USING latin1) USING utf8) where ",
    b.COLUMN_NAME, 
    " is not null;") query
from INFORMATION_SCHEMA.TABLES a
left join INFORMATION_SCHEMA.COLUMNS b on a.TABLE_NAME = b.TABLE_NAME and a.TABLE_SCHEMA = b.TABLE_SCHEMA
where a.table_schema = 'db_name'
and a.TABLE_TYPE = 'BASE TABLE'
and b.data_type in ('text', 'varchar')
and a.TABLE_NAME = 'table_name';

This will produce:

update table_name set idn = CONVERT(BINARY CONVERT(idn USING latin1) USING utf8) where idn is not null;
update table_name set name = CONVERT(BINARY CONVERT(name USING latin1) USING utf8) where name is not null;
update table_name set primary_last_name = CONVERT(BINARY CONVERT(primary_last_name USING latin1) USING utf8) where primary_last_name is not null;

Answer 16

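Since the main question is how to detect the broken characters, my solution is the following (it guards against double-encoding data that is already fine):

Detect (from latin1 to utf8):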

SELECT name FROM %table% 
 WHERE 
CONVERT(CONVERT(name USING BINARY) USING utf8 ) != CONVERT(CONVERT(CONVERT(CONVERT(name USING BINARY) USING latin1) USING BINARY) USING utf8);

Update (from latin1 to utf8):

UPDATE %table% SET name = convert(cast(convert(name using latin1 ) as binary) using utf8 )
 WHERE 
CONVERT(CONVERT(name USING BINARY) USING utf8 ) != CONVERT(CONVERT(CONVERT(CONVERT(name USING BINARY) USING latin1) USING BINARY) USING utf8);
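The idea of repairing only the rows that really are double encoded can be sketched in Python. This is a rough analogue under stated assumptions, not a translation of the SQL: `fix_if_double_encoded` is a hypothetical helper, and `cp1252` stands in for MySQL's latin1. A value is changed only when its latin1 bytes form valid UTF-8; anything else is returned untouched.

```python
def fix_if_double_encoded(text: str) -> str:
    """Repair double-encoded text; return other values unchanged."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # Not representable in cp1252, or the bytes are not valid UTF-8:
        # treat the value as already correct.
        return text


rows = ["JosÃ©", "José", "Lapin de Pâques", "plain"]
print([fix_if_double_encoded(row) for row in rows])
```

Pure ASCII values pass through the round trip unchanged, so the function is safe to apply to every row of a mixed column.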

Answer 17

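This query helped me identify the rows that contain bad characters. Basically, you find the rows where the field is not null, convert it to utf8, and check whether the result is null after the conversion.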

select ach.*
from ach_warehouse ach
where addendum is not null and convert(addendum using utf8) is null;

Answer 18

To convert all the Latin characters to their proper accented forms, try this in MySQL:

UPDATE your_table SET your_column = CONVERT(CAST(CONVERT(your_column USING latin1) AS BINARY) USING utf8)

Detecting broken utf8 characters in MySQL
