Why doesn't the cURL charset change?

Asked: 2016-07-03 05:27:10

Tags: php http curl simple-html-dom

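I need to parse a German website with simple_html_dom. I have a problem with German umlauts, because (as far as I can tell) UTF-8 does not support umlauts. I know that the problem goes away if the text is converted from UTF-8 to UTF-16 or ISO-8859-1. I use cURL to fetch the page content. The page has charset ISO-8859-1. I tried setting CURLOPT_ENCODING to ISO-8859-1, but cURL always returns UTF-8 text. I don't know what to do. Here is the code of the method: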

public function testsec()
{
    require_once DIR_SYSTEM.'library'.DIRECTORY_SEPARATOR.'simpleHtml'.DIRECTORY_SEPARATOR.'simple_html_dom.php';
    $regexpSecond = "~Möglicherweise.*? Vielen Dank~su";        
    $headers = array(
        "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0",
        "Accept: text/plain",
        "Connection: keep-alive",
    );

    $fp  = fopen(DIR_ADMIN.'logCurl.txt','w+');
    $head = fopen(DIR_ADMIN.'headers.txt','w+');
    $curl = curl_init("http://test.site.com/bla-bla-bla");
    curl_setopt($curl, CURLOPT_RETURNTRANSFER,true);
    curl_setopt($curl, CURLOPT_ENCODING , "UTF-16");        
    curl_setopt($curl, CURLOPT_VERBOSE, 1);
    curl_setopt($curl, CURLOPT_STDERR, $fp);
    curl_setopt($curl, CURLOPT_HEADER ,$headers);
    curl_setopt($curl, CURLOPT_WRITEHEADER, $head);
    $result = curl_exec($curl);
    curl_close($curl);
    fclose($fp);
    fclose($head);
    $html = str_get_html($result);
    echo mb_detect_encoding($result); //utf-8

}

Response headers:

HTTP/1.1 200 OK
Date: Sun, 03 Jul 2016 05:22:34 GMT
Server: Apache
Set-Cookie: JTLSHOP=c1qv3vafghmf3ih43g5m96epi4; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: max-age=1, private, must-revalidate
Pragma: no-cache
Vary: Accept-Encoding
X-Powered-By: PleskLin
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1

1 answer:

Answer 0 (score: 0)

UTF-8 does support umlauts.

http://www.periodni.com/unicode_utf-8_encoding.html#german_special_characters

If you want to convert the character set, use the iconv function.
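
As a rough sketch of the iconv suggestion: CURLOPT_ENCODING only sets the Accept-Encoding request header (content codings such as gzip or deflate), so it does not transcode the body, and the charset has to be converted after the fetch. The helper name `bodyToUtf8`, the regular expression, and the ISO-8859-1 fallback are assumptions made for illustration and are not part of the original code.

```php
<?php
// Hypothetical helper (not from the original post): convert a response body
// to UTF-8 based on the charset named in a Content-Type header value.
function bodyToUtf8(string $body, string $contentType): string
{
    // Assumption: fall back to ISO-8859-1 when the header names no charset.
    $charset = 'ISO-8859-1';
    if (preg_match('/charset=\s*"?([\w.:-]+)"?/i', $contentType, $m)) {
        $charset = $m[1];
    }

    // Nothing to do if the body is already declared as UTF-8.
    if (strcasecmp($charset, 'UTF-8') === 0) {
        return $body;
    }

    // iconv(from, to, string) returns false on failure.
    $converted = iconv($charset, 'UTF-8', $body);
    if ($converted === false) {
        throw new RuntimeException("Could not convert from $charset to UTF-8");
    }
    return $converted;
}

// "M\xF6glicherweise" is "Möglicherweise" encoded as ISO-8859-1:
// the single byte 0xF6 stands for "ö".
$latin1 = "M\xF6glicherweise";

// After conversion, "ö" is the two-byte UTF-8 sequence 0xC3 0xB6.
$utf8 = bodyToUtf8($latin1, 'text/html; charset=iso-8859-1');
```

In the question's code, the header value could presumably be read with `curl_getinfo($curl, CURLINFO_CONTENT_TYPE)` before `curl_close()`, and the converted string passed to `str_get_html()`. A pattern with the `u` modifier, like the one in the question, expects valid UTF-8 input, so it should only be applied after the conversion.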