Wikipedia dump file

Posted: 2013-10-07 13:02:32

Tags: mysql wikimedia

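I imported the 2007 Persian Wikipedia dump into my local MySQL 5.6. Usernames written in non-Latin scripts do not seem to have been saved correctly. Is there any way to fix this?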

select DISTINCT rev_user_text from revision where rev_user_text like '%?%';

+-------------------------------+
| rev_user_text                 |
+-------------------------------+
| 1?1?                          |
| ?                             |
| ? ?                           |
| ? ? ?                         |
| ? ????                        |
| ?. ?????????                  |
| ?.????                        |
| ?.???????                     |
| ?.????????                    |
| ??                            |
| ?? ??                         |
| ?? ?? ??                      |
| ?? ???                        |
| ?? ??? ???                    |
| ???                           |
| ??? 110                       |
| ??? ?                         |
| ??? ???                       |
| ??? ??? ( ?? ??? )            |
| ??? ??? ????? ???             |
| ??? ????                      |
| ??? ???? ???                  |
| ??? ???? ?????                |
| ??? ???? ???????              |
| ??? ?????                     |
| ??? ????? ???                 |
| ??? ????? ????                |
| ??? ????? ??????              |
| ??? ?????1984                 |
| ??? ??????                    |
| ??? ???????                   |
| ??? ??????? ???               |
| ??? ????????                  |
| ??? ??????????                |
| ???76                         |
| ????                          |
| ???? 32                       |
| ???? ?                        |
| ???? ??                       |
| ???? ?? ? ?????               |
| ???? ???                      |
| ???? ??? ? ????? ????         |
| ???? ??? ????                 |
| ???? ??? ?????                |
| ???? ??? ????? ?????          |
| ???? ????                     |
| ???? ???? ???                 |
| ???? ???? ??? (??????)        |
| ???? ???? ????                |
| ????.???                      |
| ????22                        |
| ????4183                      |
| ????777                       |
| ????808                       |
| ?????                         |
| ????? - ???? ???              |
| ????? 85 8                    |
| ????? ?                       |
| ????? ???                     |
| ????? ??? ???                 |
| ????? ??? ????                |
| ????? ????                    |
| ????? ???? (????? ????)       |
| ????? ???? --????? ????       |
| ????? ???? -????? ????        |
| ????? ???? ???                |
| ????? ???? ????               |
| ????? ???? ??????             |
| ????? ?????                   |
| ????? ????? ????              |
| ????? ????? ?????             |
| ????? ????? ????????          |
| ????? ??????                  |
| ????? ?????? ???              |
 …….

1 answer:

Answer 0 (score: 1)

You are probably not using a suitable character set, such as utf8. Try recreating the table with:

CREATE TABLE revision
(...)
CHARACTER SET utf8;
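Why `utf8` is enough here: MySQL's `utf8` is the 3-byte encoding (later also called `utf8mb3`), so it can hold any character whose UTF-8 form is at most three bytes. The Arabic-script letters used in Persian take two bytes each. The Python sketch below checks that condition for a string; the function name `fits_mysql_utf8` and the sample names are hypothetical illustrations, not taken from the dump or from MySQL itself.

```python
def fits_mysql_utf8(text: str) -> bool:
    """Return True if every character encodes to at most 3 UTF-8 bytes.

    Hypothetical helper: MySQL's 3-byte `utf8` (utf8mb3) cannot store
    4-byte characters such as emoji; those would need `utf8mb4`.
    """
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


# Hypothetical samples: Persian "کاربر" ("user") plus ASCII digits, and an emoji.
persian_name = "کاربر 110"
emoji_name = "user \U0001F600"

print(fits_mysql_utf8(persian_name))  # True: each Persian letter is 2 bytes
print(fits_mysql_utf8(emoji_name))    # False: U+1F600 needs 4 bytes
```

If a check like this returned False for some names, `utf8mb4` would be the character set to choose instead.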

Or change the character set of the existing table:

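-- CONVERT TO also converts existing text columns; plain CHARACTER SET only changes the table default
ALTER TABLE revision
CONVERT TO CHARACTER SET utf8;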
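Neither recreating nor altering the table can recover names that are already stored as literal `?` characters. The query in the question matches them with `LIKE '%?%'`, which suggests that is the case here. When MySQL converts text into a character set that cannot represent a character (for example latin1 for Persian letters), it substitutes `?`. If that happened during the import, the dump would need to be loaded again once the character set is fixed. The Python sketch below only imitates that lossy step with `str.encode(..., errors="replace")`. It is an analogy, not MySQL's code path, and `lossy_convert` and the sample names are hypothetical.

```python
def lossy_convert(text: str, charset: str = "latin-1") -> str:
    """Imitate a lossy charset conversion: unrepresentable characters become '?'.

    Hypothetical helper that mirrors the symptom in the question;
    it is not how MySQL implements conversion.
    """
    return text.encode(charset, errors="replace").decode(charset)


original = "کاربر 110"          # hypothetical Persian username
stored = lossy_convert(original)
print(stored)                   # ????? 110  (digits and space survive)

# Different five-letter names collapse to the same string, so the
# original cannot be reconstructed from what was stored.
print(lossy_convert("کاربر") == lossy_convert("تهران"))  # True

# A Unicode charset round-trips the name unchanged.
print(lossy_convert(original, "utf-8") == original)      # True
```

This matches the pattern in the output above, where entries such as `??? 110` keep their ASCII digits while each Persian letter became one `?`.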