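I'm using Goutte in PHP to fetch a page's HTML. I call the PHP script with jQuery AJAX and then put the page into the document area (#doc).
I want the page to be inserted without special characters such as &nbsp; and the like, but my clean() function doesn't work. How can I fix it?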
PHP:
<?php
require_once 'goutte.phar';
use Goutte\Client;

if (isset($_GET['url'])) {
    $url = $_GET['url'];
} else {
    // no URL was passed: there is nothing to fetch
    header("HTTP/1.0 400 Bad Request");
    exit;
}

// client used to send requests to a website; request() returns a Crawler object
$client = new Client();
$client->getClient()->setDefaultOption('config/curl/'.CURLOPT_SSL_VERIFYHOST, FALSE); // also accept https (disables certificate checks)
$client->getClient()->setDefaultOption('config/curl/'.CURLOPT_SSL_VERIFYPEER, FALSE);
$crawler = $client->request('GET', $url);
// HTTP status of the last response (getStatusCode() on newer BrowserKit versions)
$status_code = $client->getResponse()->getStatus();

if ($status_code == 200) {
    $result = $crawler->filterXPath('html/body')->html();
    $result = clean($result);
    echo $result;
} else {
    // in case of error: send a real error status so the AJAX error callback fires
    header("HTTP/1.0 400 Bad Request");
}

function clean($conv) {
    $string = htmlentities($conv, null, 'utf-8');
    $conv = str_replace("&nbsp;", "", $string);
    $conv = html_entity_decode($conv);
    return($conv);
}
?>
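The script hands $_GET['url'] straight to the crawler. A minimal sketch of checking it first, assuming PHP 7.1+; requested_url() is a hypothetical helper, not part of Goutte, and accepting only http/https is an assumption about what the page should allow.

```php
<?php
// Hypothetical helper: return the 'url' entry of the query array only if it
// is a syntactically valid absolute http(s) URL; otherwise return null.
function requested_url(array $query): ?string
{
    $url = $query['url'] ?? null;

    // Reject missing, non-string or malformed values.
    if (!is_string($url) || filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }

    // FILTER_VALIDATE_URL also accepts ftp:, mailto: etc., so check the scheme.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true) ? $url : null;
}
```

In the script it would take the place of the isset() check: `$url = requested_url($_GET); if ($url === null) { header("HTTP/1.0 400 Bad Request"); exit; }`.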
JAVASCRIPT:
function visual(search) {
    $.ajax({
        type: "GET",
        // encode the target URL so its own ?, & and # survive as one parameter
        url: "goutte.php?url=" + encodeURIComponent(search),
        success: function(data)
        {
            var content = $.parseHTML(data);
            $("#doc").html(content);
        },
        // action in case of error
        error: function()
        {
            alert("Error");
        }
    });
}
Answer (score: 3)
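If you want to decode encoded HTML back to regular text, you need html_entity_decode. That's all you need to do. Calling htmlentities again on an already HTML-encoded string is wrong, and so is using str_replace.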
So your clean() function should only decode the HTML-encoded string:
function clean($conv) {
    $conv = html_entity_decode($conv, NULL, "UTF-8"); // to 'force' the UTF-8 charset (php.ini settings may differ, that's why!)
    return $conv;
}
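Decoding alone does not make &nbsp; disappear: html_entity_decode() turns it into the UTF-8 non-breaking space character (U+00A0), not an ordinary space. A minimal sketch of decoding once and then replacing that character, assuming PHP 7+ and UTF-8 input; clean_html() is a hypothetical name, and replacing with a space (rather than deleting) is a choice made here so adjacent words don't run together.

```php
<?php
// Hypothetical variant of clean(): decode HTML entities once, then replace
// the non-breaking spaces that &nbsp; decodes to with ordinary spaces.
function clean_html(string $html): string
{
    // ENT_QUOTES | ENT_HTML5 decodes quotes and the full HTML5 entity table;
    // the explicit 'UTF-8' avoids depending on php.ini's default_charset.
    $decoded = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // &nbsp; has become U+00A0 (bytes C2 A0), not an ASCII space.
    return str_replace("\u{00A0}", ' ', $decoded);
}

echo clean_html('Hello&nbsp;world &amp; friends'), PHP_EOL; // prints: Hello world & friends
```

Because this decodes every entity, escaped text such as &lt;b&gt; in the fetched page becomes real markup once it is inserted with .html(). If only &nbsp; needs to go, `str_replace(['&nbsp;', "\u{00A0}"], ' ', $html)` on the undecoded HTML is a narrower alternative.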