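Invalid utf8 character string. Properly converting a 'latin1_german1_ci' column to UTF8

Date: 2019-09-06 15:05:56

Tags: mysql utf-8 collation iso-8859-1 character-set

I have a table with a column whose data does not appear to be UTF8. I want to convert that column to UTF8.

I found this excellent tutorial: https://coderwall.com/p/gjyuwg/mysql-convert-encoding-to-utf8-without-garbled-data

However, none of its solutions really work.

When I run this: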

UPDATE vbpmtext 
SET message = @txt 
WHERE char_length(message) =  LENGTH(@txt := CONVERT(BINARY CONVERT(message USING latin1) USING utf8));

I get many errors like this:

Invalid utf8 character string: 'FC6265'

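with varying "strings" (FC6265 is just one example).

Is there any way to salvage this data?

The column in question naturally uses the latin1_german1_ci collation.

2 answers:

Answer 0: (score: 1)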

FC6265 is the hex of übe interpreted as latin1. (Ditto for cp1250, cp1256, cp1257, dec8, latin2, latin5, latin7.)

Is the stored value the 3-character string übe, or the 6-character string FC6265?

mysql> SET @in := UNHEX('FC6265');

mysql> SELECT HEX(@in);
+----------+
| HEX(@in) |
+----------+
| FC6265   |
+----------+

mysql> SELECT HEX( CONVERT(@in USING latin1) );
+----------------------------------+
| HEX( CONVERT(@in USING latin1) ) |
+----------------------------------+
| FC6265                           |
+----------------------------------+

mysql> SELECT HEX( BINARY(CONVERT(@in USING latin1)) );
+------------------------------------------+
| HEX( BINARY(CONVERT(@in USING latin1)) ) |
+------------------------------------------+
| FC6265                                   |
+------------------------------------------+

mysql> SELECT HEX( CONVERT(BINARY CONVERT(@in USING latin1) USING utf8) );
+-------------------------------------------------------------+
| HEX( CONVERT(BINARY CONVERT(@in USING latin1) USING utf8) ) |
+-------------------------------------------------------------+
|                                                             |
+-------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS;
+---------+------+-----------------------------------------+
| Level   | Code | Message                                 |
+---------+------+-----------------------------------------+
| Warning | 1300 | Invalid utf8 character string: 'FC6265' |
+---------+------+-----------------------------------------+

BINARY() removes any assumption about how the current string is encoded. CONVERT(... USING utf8) then takes the simplest approach and assumes the bytes are already utf8, which FC6265 is not. Without BINARY, the conversion from latin1 works:

mysql> SELECT CONVERT(CONVERT(@in USING latin1) USING utf8);
+-----------------------------------------------+
| CONVERT(CONVERT(@in USING latin1) USING utf8) |
+-----------------------------------------------+
| übe                                           |
+-----------------------------------------------+
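The same byte-level behaviour can be reproduced outside MySQL. The following is only a minimal Python sketch, not part of the original answer; the variable names are illustrative. It shows that the bytes FC 62 65 are readable as latin1 but are not valid UTF-8, and that a real latin1-to-UTF-8 conversion yields C3BC6265.

```python
# Bytes from the warning message (hex FC6265).
raw = bytes.fromhex("FC6265")

# Read as latin1, the bytes are the three characters "übe".
text = raw.decode("latin-1")

# Read as UTF-8 they are invalid: 0xFC cannot start a UTF-8 sequence.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# A real conversion decodes as latin1 first, then encodes as UTF-8.
converted = text.encode("utf-8")

print(text, valid_utf8, converted.hex().upper())
```

Relabelling the bytes as UTF-8 without converting them corresponds to the `BINARY` step in the failing query; decoding and then re-encoding corresponds to the nested `CONVERT` without `BINARY`.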

This may be the shortest way:

mysql> CREATE TABLE ube ( c VARCHAR(8) CHARSET latin1 COLLATE latin1_german1_ci );

mysql> INSERT INTO ube (c) VALUES (UNHEX('FC6265'));

mysql> SELECT HEX(c) FROM ube;
+--------+
| HEX(c) |
+--------+
| FC6265 |  -- Note the latin1 encoding
+--------+

mysql> ALTER TABLE ube CONVERT TO CHARACTER SET utf8mb4;
Query OK, 1 row affected (0.04 sec)
Records: 1  Duplicates: 0  Warnings: 0   -- Note: no errors

mysql> SELECT HEX(c) FROM ube;
+----------+
| HEX(c)   |
+----------+
| C3BC6265 |   -- Now utf8mb4 encoding
+----------+

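But... what is the character set of the column? If it is latin1, then things are worse. Please do not make any changes before testing further. There are several possible cases, each with its own fix. Do not rush into a fix until you have figured out which case you have; you might make things worse. See also "Trouble with UTF-8 characters; what I see is not what I stored".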
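One way to test before changing anything is to look at the raw bytes, for example the output of `SELECT HEX(message)`, and check which encoding they fit. The sketch below is a rough heuristic under that assumption; `classify` is a hypothetical helper, not a MySQL or library function. It cannot prove which case applies, because bytes that are valid UTF-8 can also be read as (garbled) latin1 text.

```python
def classify(raw: bytes) -> str:
    """Roughly classify stored bytes (hypothetical helper, heuristic only)."""
    if raw.isascii():
        return "ascii"      # identical bytes in latin1 and UTF-8
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf8"   # likely a single-byte encoding such as latin1
    return "utf8"           # valid UTF-8, whatever the column is declared as


# Hex strings as HEX(column) would print them.
samples = ["FC6265", "C3BC6265", "616263"]
results = {h: classify(bytes.fromhex(h)) for h in samples}
print(results)
```

A column declared latin1 whose bytes classify as "utf8" would point to a different case than one whose bytes classify as "not-utf8", so the two should not be fixed the same way.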

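Answer 1: (score: 0)

OK, this was my fault. The data itself was not corrupted; I had used PHP's substr on it, which cut the data at an unfortunate place, in the middle of a multi-byte character. I found a solution here: "php substr() function with utf-8 leaves � marks at the end"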

$var = "Бензин Офиси А.С. также производит все типы жира и смазок и их побочных        продуктов в его смесительных установках нефти машинного масла в Деринце, Измите, Алиага и Измире. У Компании есть 3 885 станций технического обслуживания, включая сжиженный газ (ЛПГ) станции под фирменным знаком Петрогаз, приблизительно 5 000 дилеров, двух смазочных смесительных установок, 12 терминалов, и 26 единиц поставки аэропорта.";

$foo = mb_substr($var,0,142, "utf-8");
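The difference between cutting by bytes and cutting by characters can be illustrated with a short Python sketch. It is an analogy to `substr` versus `mb_substr`, not PHP code, and the sample text and cut length are chosen only for illustration.

```python
text = "Бензин Офиси"
data = text.encode("utf-8")  # each Cyrillic letter takes 2 bytes in UTF-8

# Byte-based cut: 5 bytes ends in the middle of the third letter.
byte_cut = data[:5]
shown = byte_cut.decode("utf-8", errors="replace")  # broken tail becomes U+FFFD

# Character-based cut: 5 whole characters, no broken sequence.
char_cut = text[:5]

print(shown, char_cut)
```

The byte-based cut leaves a dangling lead byte, which is what later shows up as an invalid utf8 string, while the character-based cut always ends on a character boundary.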