Question

I'm using Goutte in PHP to fetch a page's HTML. I call the PHP script with a jQuery AJAX request and then put the page into the document area (#doc).

I want to insert that page without special characters such as &nbsp; and others, but my clean() function doesn't work. How can I fix it?

PHP:

<?php
require_once 'goutte.phar';
use Goutte\Client;

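if (!isset($_GET['url'])) {
    // no URL supplied: nothing to fetch
    header('HTTP/1.0 400 Bad Request');
    exit('Missing url parameter');
}
$url = $_GET['url'];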
//client used to send requests to a website and returns a crawler object
$client = new Client();
$client->getClient()->setDefaultOption('config/curl/'.CURLOPT_SSL_VERIFYHOST, FALSE); // code to also accept https
$client->getClient()->setDefaultOption('config/curl/'.CURLOPT_SSL_VERIFYPEER, FALSE);
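$crawler = $client->request('GET', $url);
// $status_code was never defined; read it from the last response
$status_code = $client->getResponse()->getStatus();
if ($status_code == 200) {
    $result = $crawler->filterXPath('html/body')->html();
    $result = clean($result);
    echo $result;
}
else {
    // in case of error: send a real 400 status so the jQuery error callback runs
    header('HTTP/1.0 400 Bad Request');
    echo "HTTP/1.0 400 Bad Request";
}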

function clean($conv) {
    $string = htmlentities($conv, null, 'utf-8');
    $conv = str_replace("&nbsp;", "", $string);
    $conv = html_entity_decode($conv);
    return($conv);
}

?>

JAVASCRIPT:

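function visual(search) {
    $.ajax({
            type: "GET",
            // encode the target URL so characters like & and ? survive the query string
            url: "goutte.php?url=" + encodeURIComponent(search),
            success: function(data)
            {
                var content = $.parseHTML(data);
                $("#doc").html(content); // was `contenuto`, which is never defined
            },
            // action in case of error
            error: function()
            {
                alert("Error");
            }
        });
}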

Answer 1

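If you want to decode encoded HTML back to normal text, you need html_entity_decode; that is all you need to do. Calling htmlentities again on a string that is already HTML-encoded, and then running str_replace on it, is wrong.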
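To see why that round trip misbehaves on already-encoded HTML, here is a minimal sketch that reproduces the three steps of the question's clean() under a hypothetical name, original_clean. It assumes PHP 7+ (for the "\u{...}" escape) and passes ENT_QUOTES instead of the original null flags. A literal &nbsp; entity comes back unchanged, because htmlentities first turns its "&" into "&amp;"; only a real non-breaking-space character gets removed.

```php
<?php
// Hypothetical reproduction of the question's clean() steps.
function original_clean(string $conv): string {
    $string = htmlentities($conv, ENT_QUOTES, 'UTF-8');    // "&" becomes "&amp;"
    $conv = str_replace("&nbsp;", "", $string);            // looks for a literal "&nbsp;"
    return html_entity_decode($conv, ENT_QUOTES, 'UTF-8'); // undoes the htmlentities step
}

// Already-encoded input: "&nbsp;" became "&amp;nbsp;", so str_replace finds nothing.
var_dump(original_clean("a&nbsp;b"));   // string(8) "a&nbsp;b"

// A real U+00A0 character is encoded to "&nbsp;" first, so it does get removed.
var_dump(original_clean("a\u{00A0}b")); // string(2) "ab"
```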

So your clean() function should only decode the HTML-encoded string:

function clean($conv) {
    // To 'force' the UTF-8 charset (php.ini settings may differ, that's why!);
    // ENT_QUOTES also decodes &quot; and &#039;
    $conv = html_entity_decode($conv, ENT_QUOTES, "UTF-8");
    return $conv;
}

http://php.net/html_entity_decode
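Note that html_entity_decode does not delete &nbsp;: it turns it into the non-breaking-space character U+00A0 (bytes C2 A0 in UTF-8), which still renders as a space. If you also want to get rid of those, a minimal sketch, assuming PHP 7+ and using a hypothetical helper named decode_and_strip_nbsp, is to decode first and then replace that character:

```php
<?php
// Hypothetical helper: decode entities, then handle the U+00A0 characters
// that "&nbsp;" decodes to.
function decode_and_strip_nbsp(string $html, string $replacement = ' '): string {
    // Decode entities into UTF-8 text; "&nbsp;" becomes the bytes "\xC2\xA0".
    $text = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
    // Replace every non-breaking space with $replacement ("" removes them entirely).
    return str_replace("\u{00A0}", $replacement, $text);
}

var_dump(decode_and_strip_nbsp("x &amp; y&nbsp;z"));     // string(7) "x & y z"
var_dump(decode_and_strip_nbsp("x &amp; y&nbsp;z", "")); // string(6) "x & yz"
```

Replacing with a plain space keeps neighbouring words apart; passing an empty string mirrors the question's clean(), which dropped them outright.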

Removing special characters from HTML with PHP
