I am trying to scrape data from the taobao website.
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title></title>
</head>
<body>
<?php
include_once('simple_html_dom.php');
$target_url = "http://item.taobao.com/item.htm?spm=a2106.m893.1000384.54.61Q4Fp&id=37676614376&_u=fm86qe4d813&scm=1029.newlist-0.1.50006843&ppath=&sku=&ug=#detail";
$html = new simple_html_dom();
$html->load_file($target_url);
foreach ($html->find('h3[class=tb-main-title]') as $post) {
echo html_entity_decode($post, ENT_QUOTES, "ISO-8859-1") . "<br />";
}
?>
</body>
</html>
But it displays the product title as garbled text:
2014��ЬŮʿ�������¿��ϸ��ƽ���ļ��¿����ϴ���ƽ����Ь��
Answer 0 (score: 0)
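To avoid this, you need to convert the text with the iconv function (here from GB2312 to UTF-8, matching the charset your page declares). Consider this example: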
include 'simple_html_dom.php';
$target_url = "http://item.taobao.com/item.htm?spm=a2106.m893.1000384.54.61Q4Fp&id=37676614376&_u=fm86qe4d813&scm=1029.newlist-0.1.50006843&ppath=&sku=&ug=#detail";
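// Fetch the raw page bytes; the conversion below assumes they are GB2312-encoded
$contents = file_get_contents($target_url);
$html = str_get_html($contents);
foreach($html->find('h3[class=tb-main-title]') as $post) {
    // Take the title's inner text and convert it from GB2312 to UTF-8 before printing
    $text = $post->innertext;
    $text = iconv('gb2312', 'utf-8', $text);
    echo $text;
    // Output: 2014拖鞋女士人字拖新款豹纹细带平底夏季新款凉拖大码平底拖鞋潮
}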
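If you would rather not hard-code the source charset, a rough sketch like the following could read it from the page's own charset declaration before calling iconv. Note that to_utf8 is a hypothetical helper written for this answer, not part of simple_html_dom, and the GBK fallback is only an assumption that suits pages like this one.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): convert raw page bytes to UTF-8.
// Assumption: the charset declaration, if present, is plain ASCII inside the page.
function to_utf8(string $bytes, string $fallback = 'GBK'): string
{
    // Look for a declaration such as charset=gbk or charset="gb2312".
    if (preg_match('/charset\s*=\s*["\']?([A-Za-z0-9_\-]+)/i', $bytes, $m)) {
        $charset = strtoupper($m[1]);
    } else {
        $charset = $fallback; // nothing declared: fall back to the assumed charset
    }

    // GBK is a superset of GB2312, so decoding as GBK also covers GB2312 pages.
    if ($charset === 'GB2312') {
        $charset = 'GBK';
    }

    // Already UTF-8: nothing to convert.
    if ($charset === 'UTF-8' || $charset === 'UTF8') {
        return $bytes;
    }

    // iconv returns false on failure; in that case hand back the input unchanged.
    $converted = @iconv($charset, 'UTF-8', $bytes);
    return $converted === false ? $bytes : $converted;
}
```

You could then call `$html = str_get_html(to_utf8($contents));` once and skip the per-element iconv call inside the loop.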