I'm trying to get short extracts from Wikipedia articles. In my browser I use the following URL: http://en.wikipedia.org//w/api.php?action=query&prop=extracts&format=txt&exsentences=2&exlimit=10&exintro=&explaintext=&iwurl=&titles=Greek%20language
In the browser I get the following result:
Array
(
[query] => Array
(
[pages] => Array
(
[11887] => Array
(
[pageid] => 11887
[ns] => 0
[title] => Greek language
[extract] => Greek (Modern Greek: ελληνικά [eliniˈka] "Greek" and ελληνική γλώσσα [eliniˈci ˈɣlosa] ( ) "Greek language") is an independent branch of the Indo-European family of languages. Native to the southern Balkans, western Asia Minor, Greece, the Aegean Islands, and Cyprus it has the longest documented history of any Indo-European language, spanning 34 centuries of written records.
)
)
)
)
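That's fine.
The problem is that when I use the same URL to fetch the page server-side with cURL in PHP, the foreign characters come out garbled. Here is how I'm doing it: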
$url = 'http://en.wikipedia.org//w/api.php?action=query&prop=extracts&format=txt&exsentences=2&exlimit=10&exintro=&explaintext=&iwurl=&titles=Greek%20language';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response body as a string instead of printing it
curl_setopt($ch, CURLOPT_USERAGENT, "TestScript"); // identify the client to the Wikipedia API
$c = curl_exec($ch); // raw response body, or false on failure
echo $c;
This gives me the following result:
Array ( [query] => Array ( [pages] => Array ( [11887] => Array ( [pageid] => 11887 [ns] => 0 [title] => Greek language [extract] => Greek (Modern Greek: ελληνικά [eliniˈka] "Greek" and ελληνική γλώσσα [eliniˈci ˈɣlosa] ( ) "Greek language") is an independent branch of the Indo-European family of languages. Native to the southern Balkans, western Asia Minor, Greece, the Aegean Islands, and Cyprus it has the longest documented history of any Indo-European language, spanning 34 centuries of written records. ) ) ) )
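But the foreign-language text is gibberish. I get the same result with other articles that contain foreign languages. How do I correctly receive and display the foreign characters?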
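To narrow down where the garbling happens, it can help to check whether the downloaded bytes themselves are intact. Here is a minimal sketch, assuming PHP 7 or later and using the hypothetical helper names `fetchWithContentType` and `isValidUtf8`. `curl_getinfo()` with `CURLINFO_CONTENT_TYPE` reports the Content-Type (including any charset) that the server declared. `preg_match()` with the `u` modifier returns false instead of 1 when the subject is not valid UTF-8.

```php
<?php
// Hypothetical helper: fetch a URL and also return the Content-Type the server declared.
function fetchWithContentType(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'TestScript');
    $body = curl_exec($ch); // string on success, false on failure
    // Raw Content-Type header value, or null if the server sent no valid one.
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    return [$body, $contentType];
}

// Hypothetical helper: true if $s is well-formed UTF-8.
// The empty pattern always matches valid input; with the /u modifier,
// preg_match() returns false on malformed UTF-8 instead of 1.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}
```

For example, `[$body, $type] = fetchWithContentType($url); var_dump($type, isValidUtf8((string) $body));` (this needs network access). If the body is valid UTF-8, the download is most likely fine, and the garbling happens when your own page is displayed.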
Answer (score: 1)
You need to set the header:
<?php
header('Content-Type: text/html; charset=utf-8'); // <--- Add this, before any output is sent
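This is because these characters are Unicode, delivered by the API as UTF-8, so you need to set the header explicitly to declare that character set. Without it, the browser has to guess your page's encoding and may decode the UTF-8 bytes with a different one.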
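Here is a minimal sketch that combines the answer with the code from the question. It assumes PHP 7 or later and uses a hypothetical helper, `renderExtract`. It only performs the fetch when the script is served to a browser (`PHP_SAPI !== 'cli'`), because the charset header matters only there. Wrapping the output in `<pre>` is an addition not in the answer. Browsers collapse whitespace in HTML, which is likely why the output in the question appears on a single line, and `<pre>` preserves the line breaks.

```php
<?php
// Hypothetical helper: wrap the raw API text for display in an HTML page.
// htmlspecialchars() escapes &, <, >, " and ' but passes valid UTF-8 (e.g. Greek) through unchanged;
// ENT_SUBSTITUTE replaces any malformed UTF-8 with U+FFFD instead of returning an empty string.
function renderExtract(string $raw): string
{
    return '<pre>' . htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . '</pre>';
}

// Only fetch and print when serving a browser; the charset header is irrelevant on the CLI.
if (PHP_SAPI !== 'cli') {
    // Must be sent before any output, otherwise PHP warns "headers already sent".
    header('Content-Type: text/html; charset=utf-8');

    $url = 'http://en.wikipedia.org//w/api.php?action=query&prop=extracts&format=txt&exsentences=2&exlimit=10&exintro=&explaintext=&iwurl=&titles=Greek%20language';
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'TestScript');
    $c = curl_exec($ch);

    echo $c === false
        ? 'cURL error: ' . htmlspecialchars(curl_error($ch), ENT_QUOTES, 'UTF-8')
        : renderExtract($c);
}
```

The key line is still the `header()` call from the answer. The rest only makes the output readable and safe to embed in HTML.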