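I have a table with a column whose data does not appear to be UTF-8. I want to convert that column to UTF-8.
I found this excellent tutorial: https://coderwall.com/p/gjyuwg/mysql-convert-encoding-to-utf8-without-garbled-data
However, none of its solutions actually works for me.
When I run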
UPDATE vbpmtext
SET message = @txt
WHERE char_length(message) = LENGTH(@txt := CONVERT(BINARY CONVERT(message USING latin1) USING utf8));
I get many errors like this:
Invalid utf8 character string: 'FC6265'
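with different "strings" (FC6265 is just one example).
Is there any way to salvage this data?
The column in question is, naturally, declared with the latin1_german1_ci collation.
Answer 0 (score: 1)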
FC6265 is übe interpreted as latin1. (Ditto for cp1250, cp1256, cp1257, dec8, latin2, latin5, latin7.)
Is @txt the 3-character string übe? Or the 6-character string FC6265?
mysql> SET @in := UNHEX('FC6265');
mysql> SELECT HEX(@in);
+----------+
| HEX(@in) |
+----------+
| FC6265   |
+----------+
mysql> SELECT HEX( CONVERT(@in USING latin1) );
+----------------------------------+
| HEX( CONVERT(@in USING latin1) ) |
+----------------------------------+
| FC6265                           |
+----------------------------------+
mysql> SELECT HEX( BINARY(CONVERT(@in USING latin1)) );
+------------------------------------------+
| HEX( BINARY(CONVERT(@in USING latin1)) ) |
+------------------------------------------+
| FC6265                                   |
+------------------------------------------+
mysql> SELECT HEX( CONVERT(BINARY CONVERT(@in USING latin1) USING utf8) );
+-------------------------------------------------------------+
| HEX( CONVERT(BINARY CONVERT(@in USING latin1) USING utf8) ) |
+-------------------------------------------------------------+
|                                                             |
+-------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> SHOW WARNINGS;
+---------+------+-----------------------------------------+
| Level   | Code | Message                                 |
+---------+------+-----------------------------------------+
| Warning | 1300 | Invalid utf8 character string: 'FC6265' |
+---------+------+-----------------------------------------+
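BINARY() removes any assumption about how the string is currently encoded. The outer CONVERT(... USING utf8) therefore takes the simplest approach and assumes the bytes are already utf8, and FC6265 is not a valid utf8 byte sequence.
Without the BINARY step, the conversion succeeds:
mysql> SELECT CONVERT(CONVERT(@in USING latin1) USING utf8);
+-----------------------------------------------+
| CONVERT(CONVERT(@in USING latin1) USING utf8) |
+-----------------------------------------------+
| übe                                           |
+-----------------------------------------------+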
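To see outside MySQL why the same three bytes are accepted under one interpretation and rejected under the other, here is a minimal Python sketch. It is only an illustration of the byte-level behaviour, not of MySQL internals; the variable names are made up for this example, and Python's latin-1 codec (ISO-8859-1) is assumed to be close enough to MySQL's latin1 for the byte FC.

```python
# The bytes reported in the warning message.
raw = bytes.fromhex("FC6265")

# latin-1 maps every byte to exactly one character, so decoding cannot fail.
as_latin1 = raw.decode("latin-1")
print(as_latin1 == "\u00fcbe")  # True: U+00FC (u with umlaut) followed by "be"
print(len(as_latin1))           # 3 characters

# In UTF-8, 0xFC is not a valid start byte, so strict decoding is rejected.
# This mirrors "Invalid utf8 character string: 'FC6265'".
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)  # False
```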
This may be the shortest approach:
mysql> CREATE TABLE ube ( c VARCHAR(8) CHARSET latin1 COLLATE latin1_german1_ci );
mysql> INSERT INTO ube (c) VALUES (UNHEX('FC6265'));
mysql> SELECT HEX(c) FROM ube;
+--------+
| HEX(c) |
+--------+
| FC6265 | -- Note the latin1 encoding
+--------+
mysql> ALTER TABLE ube CONVERT TO CHARACTER SET utf8mb4;
Query OK, 1 row affected (0.04 sec)
Records: 1 Duplicates: 0 Warnings: 0 -- Note: no errors
mysql> SELECT HEX(c) FROM ube;
+----------+
| HEX(c)   |
+----------+
| C3BC6265 | -- Now utf8mb4 encoding
+----------+
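The change from FC6265 to C3BC6265 is what one would expect from decoding the bytes as latin1 and re-encoding the same characters as UTF-8. The Python sketch below reproduces only that byte arithmetic; `latin1_to_utf8` is a hypothetical helper name, and Python's latin-1 codec is ISO-8859-1, which agrees with MySQL's latin1 for this byte but not for every byte in the 0x80-0x9F range (MySQL's latin1 follows cp1252 there).

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Decode bytes as latin-1, then encode the same characters as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


before = bytes.fromhex("FC6265")
after = latin1_to_utf8(before)

# The single byte FC becomes the two-byte sequence C3 BC; "be" is unchanged.
print(before.hex().upper(), "->", after.hex().upper())  # FC6265 -> C3BC6265
```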
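But... what is the column's character set? If it is latin1, the situation is worse. Please do not change anything until further tests have been made. Here are several cases, each with a fix. Do not rush into a fix until you have worked out which case you have; you could make things worse. See also: Trouble with UTF-8 characters; what I see is not what I stored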
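One rough way to gather evidence before changing anything is to look at the raw bytes of a sample of rows (for example, values obtained via HEX(message)) and check whether they are plain ASCII, valid UTF-8, or neither. The sketch below is an assumption-laden heuristic, not a definitive test: `classify` is a hypothetical helper, and a "utf8" result only means the bytes happen to form valid UTF-8, not that they were written as UTF-8.

```python
def classify(raw: bytes) -> str:
    """Roughly classify a byte string by how it can be interpreted."""
    if raw.isascii():
        # Pure ASCII reads the same in latin1 and utf8; it tells us nothing.
        return "ascii"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Cannot be UTF-8; a single-byte charset such as latin1 is plausible.
        return "not-utf8"
    # Valid UTF-8 with non-ASCII bytes; possibly UTF-8 data already.
    return "utf8"


samples = ["FC6265", "C3BC6265", "616263"]  # hex strings, e.g. from HEX(col)
for hex_value in samples:
    print(hex_value, classify(bytes.fromhex(hex_value)))
```

If a column yields a mix of "utf8" and "not-utf8" rows, that suggests more than one case is present and a single blanket conversion is risky.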
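Answer 1 (score: 0)
OK, this was my fault. The data itself was not corrupted; I was running PHP's substr function on it, which works on bytes and cut the data at an unfortunate place, in the middle of a multibyte character. I found a solution here: php substr() function with utf-8 leaves � marks at the end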
$var = "Бензин Офиси А.С. также производит все типы жира и смазок и их побочных продуктов в его смесительных установках нефти машинного масла в Деринце, Измите, Алиага и Измире. У Компании есть 3 885 станций технического обслуживания, включая сжиженный газ (ЛПГ) станции под фирменным знаком Петрогаз, приблизительно 5 000 дилеров, двух смазочных смесительных установок, 12 терминалов, и 26 единиц поставки аэропорта.";
$foo = mb_substr($var,0,142, "utf-8");
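The difference between cutting by bytes and cutting by characters can be reproduced in a few lines of Python. This is only an analogy to PHP's substr versus mb_substr, with made-up variable names: slicing a bytes object plays the role of the byte-based cut, and slicing a str plays the role of the character-based cut.

```python
text = "Бензин Офиси"          # each Cyrillic letter takes 2 bytes in UTF-8
data = text.encode("utf-8")


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Byte-based cut: 3 bytes is one and a half letters, so the tail is broken.
byte_cut = data[:3]
# Character-based cut: 3 whole letters, then encoded (6 bytes).
char_cut = text[:3].encode("utf-8")

print(is_valid_utf8(byte_cut))  # False
print(is_valid_utf8(char_cut))  # True
print(len(char_cut))            # 6
```

Storing a value like `byte_cut` in a utf8 column is the kind of input that produces "Invalid utf8 character string" warnings even though the original data was fine.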