MySQL: Unicode characters are fine but not accents

Date: 2016-06-01 11:52:20

Tags: mysql utf-8 character-encoding

I came across a puzzling problem with MySQL encoding today and would appreciate ideas on how to debug it further.

I had to update an old Perl application, using MySQL 5.6, which was originally English-only and to which I had to add some Unicode support (for Khmer script).

I figured it would be best to do a test install. I took a dump of the prod db, imported it into a test db, and changed the charset of the tables that needed support to utf8 collate utf8_unicode_ci.

All worked well, so I went on to apply the change to production. I ran the SQL migration scripts to change charsets, deployed the new code and ... Khmer characters store and show fine, but legacy è characters show as a question mark with a black square.

What really puzzles me is that

  • test and prod run on the same (Windows) box, same MySQL server instance

  • both test and prod databases have the same charsets and collation

  • for the table in question, test and prod show create table statements are identical

  • the same code connected to test works fine but connected to prod doesn't

I thought maybe the original data got mangled in the process, so I deleted it and reinserted it through the app interface. It still worked on test but not on prod.

The same code works on test, so the code is probably not the issue. Both are on the same server instance, so it is probably not a server config issue. Khmer script works fine, so it is probably not a UTF-8 "configuration" issue. New data is wrongly handled too, so it is probably not a data migration/conversion issue.

So 2 questions:

  • Is the question mark with black square a sign of double encoding, or just of wrong encoding?

  • How can I debug this further? Is there any way to see the "raw" data stored in MySQL, for example, so I could compare?

Any input greatly appreciated.

1 answer:

Answer 0 (score: 0)

When trying to use utf8/utf8mb4, if you see black diamonds with question marks, one of these cases exists:

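Case 1 (original bytes were not utf8):

  • The bytes to be stored are not encoded as utf8. Fix this.
  • The connection (or SET NAMES) for the INSERT and the SELECT was not utf8/utf8mb4. Fix this.
  • Also, check that the column in the database is CHARACTER SET utf8 (or utf8mb4).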

Case 2 (original bytes were utf8):

  • The connection (or SET NAMES) for the SELECT was not utf8/utf8mb4. Fix this.
  • Also, check that the column in the database is CHARACTER SET utf8 (or utf8mb4).

Black diamonds occur only when the browser is set to <meta charset=UTF-8>.
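
To make the two cases concrete, here is a minimal Python sketch of the byte-level effect. It is not MySQL or Perl code: the helper render_as_utf8 is a hypothetical stand-in for a page declared as UTF-8, and the latin1 conversion in Case 2 is only a simplified model of what the server does when the connection charset is latin1.

```python
# Minimal model of why "è" turns into a black diamond while Khmer text survives.
# Assumption: render_as_utf8 is a hypothetical stand-in for a UTF-8 web page;
# errors="replace" mimics a browser substituting U+FFFD for invalid bytes.

REPLACEMENT = "\ufffd"  # the black diamond with a question mark


def render_as_utf8(raw: bytes) -> str:
    """Decode bytes as a UTF-8 page would, replacing invalid sequences."""
    return raw.decode("utf-8", errors="replace")


# Case 1: the client hands over latin1 bytes for "è" where utf8 is expected.
latin1_bytes = "è".encode("latin-1")           # single byte 0xE8
case1 = render_as_utf8(latin1_bytes)

# Case 2: the text is stored correctly, but the SELECT runs over a latin1
# connection, so the text is converted to latin1 before it reaches the page.
stored_text = "è".encode("utf-8").decode("utf-8")
sent_to_client = stored_text.encode("latin-1")  # 0xE8 again
case2 = render_as_utf8(sent_to_client)

# Khmer sample text sent as genuine UTF-8 bytes round-trips untouched.
khmer = "\u1781\u17d2\u1798\u17c2\u179a"
khmer_ok = render_as_utf8(khmer.encode("utf-8"))

print(case1 == REPLACEMENT, case2 == REPLACEMENT, khmer_ok == khmer)
```

In both cases the page receives the lone byte 0xE8, which is not valid UTF-8 on its own, so a single replacement character is displayed. That is a sign of a wrong encoding somewhere in the chain, not of double encoding.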

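Not relevant, but since you brought it up:

When trying to use utf8/utf8mb4, if you see Mojibake, check the following. This discussion also applies to double encoding, which is not necessarily visible.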

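  • The bytes to be stored need to be utf8-encoded.
  • The connection when INSERTing and SELECTing text needs to specify utf8 or utf8mb4.
  • The column needs to be declared CHARACTER SET utf8 (or utf8mb4).
  • HTML should start with <meta charset=UTF-8>.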
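
As for seeing the "raw" stored data: MySQL's HEX() function returns the stored bytes as a hex string, for example via SELECT col, HEX(col) FROM tbl (col and tbl are placeholder names). The Python sketch below shows what Mojibake and double encoding look like at the byte level, plus a rough heuristic for reading such a hex dump. The function classify_hex is a hypothetical helper, not part of any library, and it uses Python's latin-1 codec as an approximation of MySQL's latin1.

```python
# What Mojibake and double encoding look like, and a heuristic reader for
# HEX() output. Assumption: classify_hex is a hypothetical helper; it is a
# rough guess based on byte patterns, not a definitive detector.

# Mojibake: UTF-8 bytes for "è" misread as latin1 show up as two characters.
mojibake = "è".encode("utf-8").decode("latin-1")

# Double encoding: that misread text is then encoded to UTF-8 a second time.
double_encoded_hex = mojibake.encode("utf-8").hex().upper()


def classify_hex(hex_string: str) -> str:
    """Guess how the bytes behind a HEX() dump were encoded."""
    raw = bytes.fromhex(hex_string)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf8 (possibly latin1)"
    if raw.isascii():
        return "ascii"
    try:
        # If the decoded text maps back to latin1 bytes that are themselves
        # valid UTF-8, the value was most likely encoded twice.
        text.encode("latin-1").decode("utf-8")
        return "double-encoded utf8"
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "utf8"


for sample in ("E8", "C3A8", double_encoded_hex):
    print(sample, "->", classify_hex(sample))
```

For "è", a hex value of E8 suggests latin1 bytes, C3A8 suggests correct utf8, and C383C2A8 suggests double encoding. Comparing the HEX() output of the same row in test and prod should show which of these applies.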