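I want to store some data that comes from Google Translate. For example: http://translate.google.com/translate_a/t?client=t&hl=en&sl=auto&tl=fa&multires=1&prev=btn&ssel=0&tsel=3&uptl=fa&alttl=en&sc=1&text=hello
After receiving the result, I want to insert it into a MySQL table, so I wrote the following code: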
$link = "http://translate.google.com/translate_a/t?client=t&hl=en&sl=auto&tl=fa&multires=1&prev=btn&ssel=0&tsel=3&uptl=fa&alttl=en&sc=1&text=";
$server = "127.0.0.1";
$username = "AliAhmadi";
$password = "AliAhmadi";
$database = "AliAhmadi";
$conn = mysql_pconnect($server, $username, $password);
if (!$conn)
die("Bye Bye");
mysql_select_db($database, $conn);
mysql_set_charset('utf8',$conn);
$ch = curl_init();
$url = $link."hello";
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$WebContent = curl_exec($ch);
$update_query = 'update `en_db` SET `meaning`="'.mysql_real_escape_string($WebContent).'" where `id`=1';
mysql_query($update_query,$conn);
mysql_close($conn);
Google sends back the following text:
[[["سلام", "hello", "", ""]], [["interjection", ["سلام", "هالو", "الو"], [["سلام", ["hello", "hi", "aloha", "all hail"]], ["هالو", ["hallo", "hello", "halloo"]], ["الو", ["hello"]]]]], "en", , [["سلام", [5], 0, 0, 1000, 0, 1, 0]], [["hello", 4, , , ""], ["hello", 5, [["سلام", 1000, 0, 0], ["خوش", 0, 0, 0], ["میهمان گرامی", 0, 0, 0], ["خوش آمدید", 0, 0, 0], ["درود کاربر", 0, 0, 0]], [[0, 5]], "hello"]], , , [["en"]], 74]
But only the first part of the string is saved in the table:
[[["
I think the problem is related to Unicode, because when I comment out mysql_set_charset('utf8',$conn); something does get saved in the table, but it looks like this:
[[["Èå","to","",""]],[["preposition",["Èå","ÈÑÇ\u06CC","ÏÑ","ÏÑ ÈÑÇÈÑ","\u06CCÔ","Óæ\u06CC","äÒÏ","ØÑÝ","ÈÓæ\u06CC","ÊÇ äÓÈÊ Èå","ÈÑ ÍÓÈ","ÈØÑÝ","ÑæÈØÑÝ"],[["Èå",["to","into","in","on","at","against"]],["ÈÑÇ\u06CC",["for","to","on","for the sake","toward","in order that"]],["ÏÑ",["at","to","about","unto"]],["ÏÑ ÈÑÇÈÑ",["against","versus","to","for","unto"]],["\u06CCÔ",["before","to","with","unto"]],["Óæ\u06CC",["to","unto"]],["äÒÏ",["to","near","about"]],["ØÑÝ",["towards","to"]],["ÈÓæ\u06CC",["toward","to","into","off","unto","at"]],["ÊÇ äÓÈÊ Èå",["to","unto"]],["ÈÑ ÍÓÈ",["according to","in","at","to"]],["ÈØÑÝ",["toward","at","unto","to","in","into"]],["ÑæÈØÑÝ",["unto","to"]]]],["",["ÚáÇãÊ ãÕÏÑ Çäá\u06CCÓ\u06CC ÇÓÊ"],[["ÚáÇãÊ ãÕÏÑ Çäá\u06CCÓ\u06CC ÇÓÊ",["to"]]]]],"en",,[["Èå",[5],0,0,1000,0,1,0]],[["to",4,,,""],["to",5,[["Èå",1000,0,0],["ÈÑÇ\u06CC",0,0,0],["ÊÇ",0,0,0],["ÑÇ Èå",0,0,0],["Èå ãäÙæÑ",0,0,0]],[[0,2]],"to"]],,,,5]
Which Unicode encoding does Google Translate return? Where is the problem in my code? I switched the collation between utf8_unicode_ci, utf8_general_ci and utf8_persian_ci, but the problem still occurs.
Answer 0 (score: 2)
I think your en_db.meaning column is defined with the default collation, latin1_swedish_ci. That collation uses the ISO-8859-1 (Latin-1) encoding, which cannot store Arabic characters.
(When you remove the mysql_set_charset call, MySQL misinterprets your UTF-8 Arabic as Latin characters. Those fit in the column, but they look completely wrong.)
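Make sure you specify a collation that uses UTF-8 when you create the table, for example CREATE TABLE en_db (...) COLLATE utf8_general_ci, or more generally (...) CHARACTER SET utf8 (or utf8mb4 for astral-plane support, if it is available).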
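As a rough sketch of what such a table definition could look like: only the table name en_db and the columns id and meaning come from the question. The column types below are assumptions chosen for illustration.

```sql
-- Hypothetical schema: `id` and `meaning` are named in the question,
-- but their types here are assumed, not taken from the original table.
CREATE TABLE en_db (
  id INT UNSIGNED NOT NULL PRIMARY KEY,
  meaning TEXT
) CHARACTER SET utf8 COLLATE utf8_general_ci;
```

Setting the character set at table level makes it the default for every text column that does not declare its own.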
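You can use ALTER TABLE en_db CONVERT TO CHARACTER SET utf8 to change the collation of an existing table and all text columns in it, but if the table already contains non-ASCII characters, they are likely to end up mangled either way.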
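Before converting, it may help to confirm what the column and the connection are actually using. The statements below are a minimal sketch against the en_db table from the question. The explicit COLLATE clause is an optional addition, not something the answer requires.

```sql
-- Show the full table definition, including column and table default charsets
SHOW CREATE TABLE en_db;

-- List the collation of each column (look at the row for `meaning`)
SHOW FULL COLUMNS FROM en_db;

-- Show the character sets in effect for the current connection
SHOW VARIABLES LIKE 'character_set%';

-- Convert the table and all of its text columns to UTF-8
ALTER TABLE en_db CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
```

If rows with non-ASCII text are already stored, taking a backup before running the ALTER TABLE is a sensible precaution, since the answer warns that such data may be damaged.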
Answer 1 (score: -2)
<?php
// Send the page as UTF-8 (must run before any output):
header("Content-Type: text/html; charset=UTF-8");
// Create the connection first; the charset queries below need an open link
$CNN=mysql_connect("localhost","usr_urdu","123") or die('Unable to Connect');
$DB=mysql_select_db('db_urdu',$CNN) or die('Unable to select DB');
// Then tell MySQL that the client sends and expects UTF-8
mysql_query("SET NAMES 'utf8'", $CNN);
mysql_query('SET CHARACTER SET utf8', $CNN);