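I am scraping some URLs from a web page. They display fine on the page, but when I insert a URL into the database it gets stored with some weird characters, like this: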
http://westseattleblog.com/event/west-seattle-church-listings/?instance_id=567059
My code:
foreach ($html->find('div[class=ai1ec-btn-group ai1ec-actions] a') as $element)
{
    $url = $element->href;
    $url1 = mysql_real_escape_string($url);
    $sql = "insert into catlink(catlink) values('$url1')";
    //echo $sql."<br>";
    $query = mysql_query($sql);
    //newpage
}
When I then fetch the URLs from the database and scrape them one by one, nothing is displayed.
My code:
$sql1 = "select * from links limit 10";
$query1 = mysql_query($sql1);
while ($res = mysql_fetch_assoc($query1)) {
    $url = $res['url'];
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    // curl_setopt($ch, CURLOPT_COOKIEJAR, "cookies.txt");
    // curl_setopt($ch, CURLOPT_COOKIEFILE, "cookies.txt");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    $page = curl_exec($ch);
    curl_close($ch);
    $dom = new simple_html_dom();
    $html = $dom->load($page);
    foreach ($html->find("div") as $a) {
        echo $a->innertext;
    }
    //$separator = ' - ';
}
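Answer 0 (score: 0)
Your URL contains hex-encoded characters (HTML entities), so you need to decode it with html_entity_decode, either before inserting it into the database or before using it with cURL.
So: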
$url1=mysql_real_escape_string(html_entity_decode($url));
or
$url=html_entity_decode($res['url']);
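To illustrate what the decoding step does, here is a minimal sketch. It assumes the scraped href carries an entity-encoded ampersand such as `&#038;`, `&amp;` or `&#x26;`. The example URL and its query parameters are made up for illustration and are not taken from the question.

```php
<?php
// Hypothetical href as it might appear in the raw HTML source.
// The host and query parameters are assumptions for this sketch.
$scraped = 'http://example.com/event/?instance_id=1&#038;page=2&amp;sort=asc&#x26;x=1';

// Convert decimal, named and hexadecimal character references back to
// the characters they stand for.
$decoded = html_entity_decode($scraped, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $decoded, PHP_EOL;
// prints: http://example.com/event/?instance_id=1&page=2&sort=asc&x=1
```

Decoding once, at insert time, is probably the simpler choice, because the database then holds plain URLs that can be passed to cURL unchanged.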