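I have a database with a bunch of broken utf8 characters scattered across several tables. The list of characters isn't very extensive, AFAIK (áéíúóÁÉÍÓÚÑñ).
Fixing a given table is very simple:
update orderItem set itemName=replace(itemName,'Ã¡','á');
But I can't find a way to detect the broken characters. If I do something like
SELECT * FROM TABLE WHERE field LIKE "%Ã%";
I get nearly every field because of the collation (Ã = a). So far, every broken character starts with "Ã". The database is in Spanish, so this particular character isn't used.
The list of broken characters I have so far is: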
Ã¡ = á
Ã© = é
Ã- = í
Ã³ = ó
Ã± = ñ
Ã¡ = Á
How can I get this SELECT to work as expected? (A binary search or something like that.)
Answer 0 (score: 52)
I fixed it with:
UPDATE wp_zcs9ck_posts_copy SET post_title =
CONVERT(BINARY CONVERT(post_title USING latin1) USING utf8);
Full solution: http://jonisalonen.com/2012/fixing-doubly-utf-8-encoded-text-in-mysql/
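To see why the nested CONVERT works, here is a small Python sketch of the same byte-level round trip. It assumes the damage came from UTF-8 bytes being read as Windows-1252 (the encoding MySQL's latin1 is based on). The function name `undo_double_encoding` is made up for this illustration and is not part of MySQL or of the linked article.

```python
def undo_double_encoding(text: str) -> str:
    """Reverse one round of 'UTF-8 bytes read as cp1252' mojibake."""
    # Step 1: turn the wrongly decoded characters back into the original bytes
    # (roughly what CONVERT(... USING latin1) followed by BINARY does).
    raw = text.encode("cp1252")
    # Step 2: read those bytes as the UTF-8 they always were
    # (roughly what the outer CONVERT(... USING utf8) does).
    return raw.decode("utf-8")
```

Calling `undo_double_encoding("seÃ±or")` gives back `"señor"`. Text that was never double encoded can make either step raise an exception, which is the Python counterpart of the data loss that other answers here warn about.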
Answer 1 (score: 35)
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã¡','á');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã¤','ä');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã©','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í©','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã³','ó');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'íº','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ãº','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã±','ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í‘','Ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã','í');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'â€“','–');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€™','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¦','...');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€“','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€œ','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€˜','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¢','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¡','c');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Â','');
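Answer 2 (score: 12)
Thanks for the answers!!
I used this to fix my tables and wanted to share the complete list of changes. Note that it also includes fixes for HTML-encoded characters; on top of the Latin characters, it really was a mess: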
update `table` set `field` = replace(`field` ,'Ã‰','É');
update `table` set `field` = replace(`field` ,'â€œ','"');
update `table` set `field` = replace(`field` ,'â€','"');
update `table` set `field` = replace(`field` ,'Ã‡','Ç');
update `table` set `field` = replace(`field` ,'Ãƒ','Ã');
-- Edit by slash4
update `table` set `field` = replace(`field` ,'Ã ','À');
update `table` set `field` = replace(`field` ,'Ãº','ú');
update `table` set `field` = replace(`field` ,'â€¢','-');
update `table` set `field` = replace(`field` ,'Ã˜','Ø');
update `table` set `field` = replace(`field` ,'Ãµ','õ');
-- The next one appears to be missing a character, most likely an invisible soft hyphen (U+00AD), since í is C3 AD in UTF-8.
update `table` set `field` = replace(`field` ,'Ã','í');
update `table` set `field` = replace(`field` ,'Ã¢','â');
update `table` set `field` = replace(`field` ,'Ã£','ã');
update `table` set `field` = replace(`field` ,'Ãª','ê');
update `table` set `field` = replace(`field` ,'Ã¡','á');
update `table` set `field` = replace(`field` ,'Ã©','é');
update `table` set `field` = replace(`field` ,'Ã³','ó');
update `table` set `field` = replace(`field` ,'â€“','–');
update `table` set `field` = replace(`field` ,'Ã§','ç');
update `table` set `field` = replace(`field` ,'Âª','ª');
update `table` set `field` = replace(`field` ,'Âº','º');
update `table` set `field` = replace(`field` ,'Ã ','à');
update `table` set `field` = replace(`field` ,'&ccedil;','ç');
update `table` set `field` = replace(`field` ,'&atilde;','ã');
update `table` set `field` = replace(`field` ,'&aacute;','á');
update `table` set `field` = replace(`field` ,'&acirc;','â');
update `table` set `field` = replace(`field` ,'&eacute;','é');
update `table` set `field` = replace(`field` ,'&iacute;','í');
update `table` set `field` = replace(`field` ,'&otilde;','õ');
update `table` set `field` = replace(`field` ,'&uacute;','ú');
update `table` set `field` = replace(`field` ,'&ccedil;','ç');
update `table` set `field` = replace(`field` ,'&Aacute;','Á');
update `table` set `field` = replace(`field` ,'&Acirc;','Â');
update `table` set `field` = replace(`field` ,'&Eacute;','É');
update `table` set `field` = replace(`field` ,'&Iacute;','Í');
update `table` set `field` = replace(`field` ,'&Otilde;','Õ');
update `table` set `field` = replace(`field` ,'&Uacute;','Ú');
update `table` set `field` = replace(`field` ,'&Ccedil;','Ç');
update `table` set `field` = replace(`field` ,'&Atilde;','Ã');
update `table` set `field` = replace(`field` ,'&Agrave;','À');
update `table` set `field` = replace(`field` ,'&Ecirc;','Ê');
update `table` set `field` = replace(`field` ,'&Oacute;','Ó');
update `table` set `field` = replace(`field` ,'&Ocirc;','Ô');
update `table` set `field` = replace(`field` ,'&Uuml;','Ü');
update `table` set `field` = replace(`field` ,'&atilde;','ã');
update `table` set `field` = replace(`field` ,'&agrave;','à');
update `table` set `field` = replace(`field` ,'&ecirc;','ê');
update `table` set `field` = replace(`field` ,'&oacute;','ó');
update `table` set `field` = replace(`field` ,'&ocirc;','ô');
update `table` set `field` = replace(`field` ,'&uuml;','ü');
update `table` set `field` = replace(`field` ,'&amp;','&');
update `table` set `field` = replace(`field` ,'&gt;','>');
update `table` set `field` = replace(`field` ,'&lt;','<');
update `table` set `field` = replace(`field` ,'&circ;','ˆ');
update `table` set `field` = replace(`field` ,'&tilde;','˜');
update `table` set `field` = replace(`field` ,'&uml;','¨');
update `table` set `field` = replace(`field` ,'&acute;','´');
update `table` set `field` = replace(`field` ,'&cedil;','¸');
update `table` set `field` = replace(`field` ,'&quot;','"');
update `table` set `field` = replace(`field` ,'&ldquo;','“');
update `table` set `field` = replace(`field` ,'&rdquo;','”');
update `table` set `field` = replace(`field` ,'&lsquo;','‘');
update `table` set `field` = replace(`field` ,'&rsquo;','’');
update `table` set `field` = replace(`field` ,'&lsaquo;','‹');
update `table` set `field` = replace(`field` ,'&rsaquo;','›');
update `table` set `field` = replace(`field` ,'&laquo;','«');
update `table` set `field` = replace(`field` ,'&raquo;','»');
update `table` set `field` = replace(`field` ,'&ordm;','º');
update `table` set `field` = replace(`field` ,'&ordf;','ª');
update `table` set `field` = replace(`field` ,'&ndash;','–');
update `table` set `field` = replace(`field` ,'&mdash;','—');
update `table` set `field` = replace(`field` ,'&macr;','¯');
update `table` set `field` = replace(`field` ,'&hellip;','…');
update `table` set `field` = replace(`field` ,'&brvbar;','¦');
update `table` set `field` = replace(`field` ,'&bull;','•');
update `table` set `field` = replace(`field` ,'&para;','¶');
update `table` set `field` = replace(`field` ,'&sect;','§');
update `table` set `field` = replace(`field` ,'&sup1;','¹');
update `table` set `field` = replace(`field` ,'&sup2;','²');
update `table` set `field` = replace(`field` ,'&sup3;','³');
update `table` set `field` = replace(`field` ,'&frac12;','½');
update `table` set `field` = replace(`field` ,'&frac14;','¼');
update `table` set `field` = replace(`field` ,'&frac34;','¾');
update `table` set `field` = replace(`field` ,'&frac18;','⅛');
update `table` set `field` = replace(`field` ,'&frac38;','⅜');
update `table` set `field` = replace(`field` ,'&frac58;','⅝');
update `table` set `field` = replace(`field` ,'&frac78;','⅞');
update `table` set `field` = replace(`field` ,'&gt;','>');
update `table` set `field` = replace(`field` ,'&lt;','<');
update `table` set `field` = replace(`field` ,'&plusmn;','±');
update `table` set `field` = replace(`field` ,'&minus;','−');
update `table` set `field` = replace(`field` ,'&times;','×');
update `table` set `field` = replace(`field` ,'&divide;','÷');
update `table` set `field` = replace(`field` ,'&lowast;','∗');
update `table` set `field` = replace(`field` ,'&frasl;','⁄');
update `table` set `field` = replace(`field` ,'&permil;','‰');
update `table` set `field` = replace(`field` ,'&int;','∫');
update `table` set `field` = replace(`field` ,'&sum;','∑');
update `table` set `field` = replace(`field` ,'&prod;','∏');
update `table` set `field` = replace(`field` ,'&radic;','√');
update `table` set `field` = replace(`field` ,'&infin;','∞');
update `table` set `field` = replace(`field` ,'&asymp;','≈');
update `table` set `field` = replace(`field` ,'&cong;','≅');
update `table` set `field` = replace(`field` ,'&prop;','∝');
update `table` set `field` = replace(`field` ,'&equiv;','≡');
update `table` set `field` = replace(`field` ,'&ne;','≠');
update `table` set `field` = replace(`field` ,'&le;','≤');
update `table` set `field` = replace(`field` ,'&ge;','≥');
update `table` set `field` = replace(`field` ,'&there4;','∴');
update `table` set `field` = replace(`field` ,'&sdot;','⋅');
update `table` set `field` = replace(`field` ,'&middot;','·');
update `table` set `field` = replace(`field` ,'&part;','∂');
update `table` set `field` = replace(`field` ,'&image;','ℑ');
update `table` set `field` = replace(`field` ,'&real;','ℜ');
update `table` set `field` = replace(`field` ,'&prime;','′');
update `table` set `field` = replace(`field` ,'&Prime;','″');
update `table` set `field` = replace(`field` ,'&deg;','°');
update `table` set `field` = replace(`field` ,'&ang;','∠');
update `table` set `field` = replace(`field` ,'&perp;','⊥');
update `table` set `field` = replace(`field` ,'&nabla;','∇');
update `table` set `field` = replace(`field` ,'&oplus;','⊕');
update `table` set `field` = replace(`field` ,'&otimes;','⊗');
update `table` set `field` = replace(`field` ,'&alefsym;','ℵ');
update `table` set `field` = replace(`field` ,'&oslash;','ø');
update `table` set `field` = replace(`field` ,'&Oslash;','Ø');
update `table` set `field` = replace(`field` ,'&isin;','∈');
update `table` set `field` = replace(`field` ,'&notin;','∉');
update `table` set `field` = replace(`field` ,'&cap;','∩');
update `table` set `field` = replace(`field` ,'&cup;','∪');
update `table` set `field` = replace(`field` ,'&sub;','⊂');
update `table` set `field` = replace(`field` ,'&sup;','⊃');
update `table` set `field` = replace(`field` ,'&sube;','⊆');
update `table` set `field` = replace(`field` ,'&supe;','⊇');
update `table` set `field` = replace(`field` ,'&exist;','∃');
update `table` set `field` = replace(`field` ,'&forall;','∀');
update `table` set `field` = replace(`field` ,'&empty;','∅');
update `table` set `field` = replace(`field` ,'&not;','¬');
update `table` set `field` = replace(`field` ,'&and;','∧');
update `table` set `field` = replace(`field` ,'&or;','∨');
update `table` set `field` = replace(`field` ,'&crarr;','↵');
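Answer 3 (score: 12)
Literal text replacement is not a general solution, because you can always forget some characters. A more appropriate fix for double-converted characters looks like this: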
alter table descriptions modify name VARCHAR(2000) character set latin1;
alter table descriptions modify name blob;
alter table descriptions modify name VARCHAR(2000) character set utf8;
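Answer 4 (score: 11)
The SELECT statement you need is as follows:
SELECT * FROM TABLE WHERE LENGTH(name) != CHAR_LENGTH(name);
It returns all rows that contain multibyte characters.
This assumes name is the field (or one of the fields) where the odd characters can be found.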
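The same test can be sketched in Python to show what the query is really measuring: LENGTH() counts bytes and CHAR_LENGTH() counts characters, so the two differ whenever a value contains a character that needs more than one byte. The helper name `has_multibyte` is hypothetical, and the sketch assumes the column is stored as UTF-8.

```python
def has_multibyte(text: str) -> bool:
    """Mimic LENGTH(col) != CHAR_LENGTH(col) for a UTF-8 column."""
    byte_length = len(text.encode("utf-8"))  # what LENGTH() would report
    char_length = len(text)                  # what CHAR_LENGTH() would report
    return byte_length != char_length
```

Note that this flags `"niño"` (correctly encoded) as well as `"niÃ±o"` (broken), so the query narrows the search down to candidate rows rather than pinpointing only the damaged ones.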
Answer 5 (score: 10)
This saved my life:
UPDATE ohp_posts SET post_content = CONVERT(CAST(CONVERT(post_content USING latin1) AS BINARY) USING utf8)
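I found it at http://stanis.net/2014/04/replacing-latin-1-with-utf-8-characters-in-mysql/
Answer 6 (score: 6)
How about a different approach: converting the column back and forth to get the right character set? You could convert it to binary, then to utf-8, and then to iso-8859-1 or whatever else you are using. See the manual for details.
Answer 7 (score: 2)
In addition to the answers by RaúlAvilaSolano and acseven, if you want to fix all the broken characters in a single query, you can do this: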
update `table` set field = replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(field,'ü','ü'),'ô','ô'),'ó','ó'),'ê','ê'),'à','à'),'ã','ã'),'Ü','Ü'),'Ô','Ô'),'Ó','Ó'),'Ê','Ê'),'À','À'),'Ã','Ã'),'Ç','Ç'),'Ú','Ú'),'Õ','Õ'),'Í','Í'),'Í','Í'),'É','É'),'Â','Â'),'Á','Á'),'ç','ç'),'ú','ú'),'õ','õ'),'í','í'),'é','é'),'â','â'),'á','á'),'ã','ã'),'ç','ç'),'à ','à'),'à ','à'),'º','º'),'ª','ª'),'ç','ç'),'–','–'),'ó','ó'),'é','é'),'á','á'),'ê','ê'),'ã','ã'),'â','â'),'Ã','í'),'õ','õ'),'Ø','Ø'),'•','-'),'ú','ú'),'à ','À'),'Ã','Ã'),'Ç','Ç'),'â€','"'),'“','"'),'É','É');
Answer 8 (score: 2)
This also solved my problem, which involved some Italian characters:
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã¡','á');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã¤','ä');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã©','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í©','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã³','ó');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'íº','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ãº','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã±','ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í‘','Ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã','í');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'â€“','–');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€™','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¦','...');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€“','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€œ','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€˜','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¢','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¡','c');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Â','');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í ','à');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í¨','è');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'íˆ','È');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'â‚¬','€');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'eÌ€','è');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í²','ò');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í¹','ù');
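Answer 9 (score: 2)
I had the same problem but didn't like the replace() solutions, because there is always a chance of missing some characters. I was up against a column with mixed data (some of it utf8_encode()d and some not), with around 4 million rows and roughly 250k records of badly encoded data (with ‰/etc. characters), covering about 15 international languages, mainly European languages, Russian, Japanese, and Chinese.
I started by duplicating the column, because I didn't want to lose any data: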
ALTER TABLE images ADD COLUMN reptitle TEXT;
Then I copied over all the data containing multibyte characters (thanks to Adam for the tip):
UPDATE images SET reptitle = title WHERE LENGTH(title) != CHAR_LENGTH(title)
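Because reptitle was created with the table's default character set, it is already utf8, but it contains corrupt data because the images table used to be latin1. The reptitle column now holds a mix of correctly encoded and corrupt data (every value with multibyte characters, some of which were already correctly utf8_encode()d). Then came David's tip...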
ALTER TABLE images MODIFY reptitle TEXT character set latin1;
ALTER TABLE images MODIFY reptitle BLOB;
ALTER TABLE images MODIFY reptitle TEXT character set utf8;
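Since TEXT and BLOB are (I think) the same, the intermediate step might not be necessary. This had the effect of correcting all the badly encoded data ('Ã©tudiantes' became 'étudiantes', etc.), but the previously correct data was truncated at the first multibyte character ('Lapin de Pâques' became 'Lapin de P'). I don't know the reason for the truncation, but it happened in a throwaway column, so I didn't really care. The truncated data gives the same value for CHAR_LENGTH and LENGTH, since it no longer has any multibyte characters, so it is easy to filter out in a query...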
UPDATE images SET title = reptitle WHERE LENGTH(reptitle)!=CHAR_LENGTH(reptitle)
Then, of course, just drop the spare column:
ALTER TABLE images DROP COLUMN reptitle
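Also make sure (I use PHP and this has bitten me a few times, so I thought I'd mention it here) that all your script files are UTF8 (without BOM) and that you are using:
mysql_set_charset('utf8', $connection);
Et voilà... perfectly fixed data, in all languages :)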
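The truncation described in this answer is likely what happens when correctly encoded text is pushed through the latin1 round trip: the resulting bytes are no longer valid UTF-8 from the first accented character onward. The Python sketch below reproduces that under the assumption that the latin1 step behaves like cp1252; the function name `latin1_roundtrip` is made up, and cutting at the first invalid byte imitates the behaviour reported above rather than documented MySQL internals.

```python
def latin1_roundtrip(text: str) -> str:
    """Apply the latin1 -> binary -> utf8 conversion to one value."""
    raw = text.encode("cp1252")  # column reinterpreted as latin1, then as bytes
    try:
        return raw.decode("utf-8")  # bytes reinterpreted as utf8
    except UnicodeDecodeError as err:
        # Python raises here; the answer above reports that MySQL instead
        # keeps only the part before the first invalid byte.
        return raw[:err.start].decode("utf-8")
```

With this sketch, `latin1_roundtrip("Ã©tudiantes")` returns `"étudiantes"`, while `latin1_roundtrip("Lapin de Pâques")` returns `"Lapin de P"`, matching the two examples given in the answer.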
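Answer 10 (score: 1)
You may have some rows with correctly encoded UTF8 and others that are badly encoded. In that case, "CONVERT(BINARY CONVERT(post_title USING latin1) USING utf8)" will trim some fields.
I ended up doing this: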
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ä" USING latin1),'ä');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ö" USING latin1),'ö');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ü" USING latin1),'ü');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "Ä" USING latin1),'Ä');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "Ö" USING latin1),'Ö');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "Ü" USING latin1),'Ü');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ß" USING latin1),'ß');
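The trick in these statements is that the broken form of each character is computed instead of typed by hand. A rough Python equivalent, assuming the damage is UTF-8 read as cp1252, is sketched below; `mojibake` and `fix_known` are hypothetical helper names.

```python
def mojibake(ch: str) -> str:
    """Return the broken form of a character: its UTF-8 bytes read as cp1252."""
    return ch.encode("utf-8").decode("cp1252")


def fix_known(text: str, chars: str = "äöüÄÖÜß") -> str:
    """Replace only the broken forms of an explicit list of characters."""
    for ch in chars:
        text = text.replace(mojibake(ch), ch)
    return text
```

For example, `fix_known("GrÃ¶ÃŸe Größe")` returns `"Größe Größe"`: the broken word is repaired and the already correct one is left alone, which is the point of this approach. One caveat of the sketch: Python's cp1252 codec leaves a few byte values unassigned (0x81, for instance), so `mojibake("Á")` raises UnicodeDecodeError instead of returning a string.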
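Answer 11 (score: 1)
Based on the data in this article, https://www.i18nqa.com/debug/utf8-debug.html, I'd suggest this as a good query for identifying suspect entries along with their probable correct values: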
SELECT my_field,CONVERT(BINARY CONVERT(my_field USING latin1) USING utf8mb4) AS new_field_value FROM my_table WHERE my_field REGEXP '[âÆËÅÂÃ]';
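Please be very careful: in our case the file names were encoded incorrectly but the paths were encoded fine, and in a situation like that some of the solutions above can cause a lot of trouble. If some of your data is already correctly encoded as UTF8, you may find that you lose part of it.
Answer 12 (score: 0)
There is a nice script to automate the conversion process across a whole database. It is also useful to know that MySQL's utf8 implementation is incomplete, since it only supports UTF-8 characters of up to 3 bytes. The solution is to use the utf8mb4 character set, introduced in MySQL 5.5.3.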
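To make the 3-byte limit concrete, the short Python sketch below counts how many bytes a character needs in UTF-8 and checks whether a string contains anything beyond the Basic Multilingual Plane, which is where 4-byte sequences start. Both helper names are invented for this example.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def needs_utf8mb4(text: str) -> bool:
    """True if the text has a character that takes 4 bytes in UTF-8."""
    # Code points above U+FFFF are exactly the ones encoded with 4 bytes.
    return any(ord(ch) > 0xFFFF for ch in text)
```

Here `utf8_len("é")` is 2, `utf8_len("€")` is 3, and `utf8_len("😀")` is 4, so a value containing an emoji is one that a 3-byte utf8 column cannot hold in full.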
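Answer 13 (score: 0)
Since TEXT and BLOB are the same, the intermediate step might not be necessary.
This has the effect of correcting all the badly encoded data, but previously correct data is truncated at the first multibyte character.
Answer 14 (score: 0)
This is an extension of @Thales Ceolin's answer that generates the update for every text column, so it can be applied across the tables of a database: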
select concat(
"update ",
a.TABLE_NAME,
" set ", b.COLUMN_NAME,
" = CONVERT(BINARY CONVERT(",
b.COLUMN_NAME,
" USING latin1) USING utf8) where ",
b.COLUMN_NAME,
" is not null;") query
from INFORMATION_SCHEMA.TABLES a
left join INFORMATION_SCHEMA.COLUMNS b on a.TABLE_SCHEMA = b.TABLE_SCHEMA and a.TABLE_NAME = b.TABLE_NAME
where a.table_schema = 'db_name'
and a.TABLE_TYPE = 'BASE TABLE'
and b.data_type in ('text', 'varchar')
and a.TABLE_NAME = 'table_name';
This produces:
update table_name set idn = CONVERT(BINARY CONVERT(idn USING latin1) USING utf8) where idn is not null;
update table_name set name = CONVERT(BINARY CONVERT(name USING latin1) USING utf8) where name is not null;
update table_name set primary_last_name = CONVERT(BINARY CONVERT(primary_last_name USING latin1) USING utf8) where primary_last_name is not null;
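If the table and column names are already known, for instance from a schema dump, the same statements can be produced outside the database. This is only a sketch of the string building done by the query above; `build_update` is a hypothetical helper, and it does no quoting or escaping of identifiers.

```python
def build_update(table: str, column: str) -> str:
    """Build one repair statement in the same shape as the generated output."""
    return (
        f"update {table} set {column} = "
        f"CONVERT(BINARY CONVERT({column} USING latin1) USING utf8) "
        f"where {column} is not null;"
    )


# Example: one statement per text column of a table.
statements = [build_update("table_name", col)
              for col in ("idn", "name", "primary_last_name")]
```

Reviewing the generated statements before running them is a good idea, since the conversion can trim values that were already correctly encoded.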
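Answer 15 (score: 0)
Since the main problem is detecting the broken characters, here is my solution (it avoids double-converting values that are already fine):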
SELECT name FROM %table%
WHERE
CONVERT(CONVERT(name USING BINARY) USING utf8 ) != CONVERT(CONVERT(CONVERT(CONVERT(name USING BINARY) USING latin1) USING BINARY) USING utf8);
UPDATE %table% SET name = convert(cast(convert(name using latin1 ) as binary) using utf8 )
WHERE
CONVERT(CONVERT(name USING BINARY) USING utf8 ) != CONVERT(CONVERT(CONVERT(CONVERT(name USING BINARY) USING latin1) USING BINARY) USING utf8);
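A related idea, sketched in Python rather than SQL, is to attempt the reverse conversion and keep the result only when it succeeds cleanly. This is not a translation of the query above, just the same "detect before you fix" principle, and it assumes the damage is UTF-8 read as cp1252. The name `repair_if_double_encoded` is hypothetical.

```python
def repair_if_double_encoded(text: str) -> str:
    """Undo double encoding when possible; otherwise return the text unchanged."""
    try:
        # Fails with UnicodeEncodeError if the text has characters that
        # cp1252 cannot represent (for example Japanese or Russian).
        raw = text.encode("cp1252")
        # Fails with UnicodeDecodeError if the bytes are not valid UTF-8,
        # which is the usual case for text that was correct to begin with.
        return raw.decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text
```

With this, `"Ã©tudiantes"` becomes `"étudiantes"` while `"Lapin de Pâques"` and `"日本語"` come back untouched. A correct value that happens to look like mojibake (a literal "Ã©" that was meant to be there) would still be changed, so a preview pass over the data remains worthwhile.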
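Answer 16 (score: 0)
This query helped me identify the rows that contain bad characters. Basically, you select the rows where the field is not null, convert the field to UTF8, and check whether the result of the conversion is null.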
select ach.*
from ach_warehouse ach
where addendum is not null and convert(addendum using utf8) is null;
Answer 17 (score: 0)
To convert all the Latin characters to their proper accented forms, try this on MySQL:
UPDATE your_table SET your_column = CONVERT(CAST(CONVERT(your_column USING latin1) AS BINARY) USING utf8)