innertext in simple_html_dom

Time: 2017-10-16 15:43:20

Tags: php html web-crawler

Why does innertext not work?

Here is the HTML code:

<ul class="product">
<li class="product col-md-4 col-sm-4 col-xs-6 "><div class="product-header">
<a href="/so-mi-octopus-xanh-soc-trang-p5163098.html">
<img src="//cdn.nhanh.vn/cdn/store/17863/ps/20170925/0ctopus_thumb_450x600.jpg" class="attachment-shop_catalog size-shop_catalog wp-post-image">
</a><div class="buttons">
<a href="/so-mi-octopus-xanh-soc-trang-p5163098.html" rel="nofollow" class="button add_to_cart_button">
<i class="fa fa-shopping-bag" aria-hidden="true"></i>
<span class="screen-reader-text">Thêm vào giỏ</span></a>
<a data-product_id="5163098" class="button btnFav" rel="nofollow">
<i class="fa fa-heart-o" aria-hidden="true"></i>
<span class="screen-reader-text">Yêu thích</span>
</a></div></div><h3><a href="/so-mi-octopus-xanh-soc-trang-p5163098.html">Sơ mi Octopus xanh sọc trắng</a></h3><span class="price">
<span class="woocommerce-Price-amount amount">
400,000 ₫                            </span>
</span></li>
</ul>

Here is my code:

<?php
require "simple_html_dom.php";
$html=file_get_html("http://zuhaus.vn/zu-design-pc150502.html?page=1");
$ds=$html->find("ul.products li");
foreach ($ds as $sp) {
    # code...
    $price=$sp->find("span.price span",0);
    echo $price;
    $name=$sp->find("h3 a",1)->innertext;
    echo $name;
}
?>

I have tried many test cases, but it does not work properly :"< Thanks. P.S. I am using the simple_html_dom library.

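2 answers:

Answer 0: (score: 1)

The problem is in your selector. I changed the selector for the product name and it works. I also used cURL to download the page, which speeds up the crawl.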

<?php
require "simple_html_dom.php";

// Download the page with cURL and keep the body as a string
$ch = curl_init("http://zuhaus.vn/zu-design-pc150502.html?page=1");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);

$html = new simple_html_dom();
$html->load($curl_scraped_page);

$ds = $html->find("ul.products li");
foreach ($ds as $sp) {
    $price = $sp->find("span.price span", 0);
    echo $price;
    // The product name is the fourth <a> in each item (index 3); this selector was the problem in your code
    $name = $sp->find("a", 3)->innertext;
    echo $name;
    echo "<br>";
}
?>
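
One thing the code above does not check is whether the download succeeded: curl_exec() returns false on failure, and that value would then be passed to load(). A minimal sketch of a guarded fetch follows. The helper name fetch_page() and the 10-second timeout are assumptions made for illustration and are not part of the answer or of simple_html_dom.

```php
<?php
// fetch_page() is a hypothetical helper name, not part of any library.
// It returns the response body as a string, or null if the request failed.
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // assumed timeout, in seconds
    $body = curl_exec($ch);
    // With CURLOPT_RETURNTRANSFER set, curl_exec() gives a string on success and false on failure
    return is_string($body) ? $body : null;
}
```

With this in place, $html->load() would only be called when fetch_page() returns a string, and the script can report the failed download otherwise.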

Answer 1: (score: 0)

If you want to get the content of the <a> tag inside the h3, then the error is in this line of your code:

$name=$sp->find("h3 a",1)->innertext;

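I suggest checking the following syntax. The index passed to find() is zero-based, and each li contains only one "h3 a", so the index has to be 0:

$name = $sp->find("h3 a", 0)->innertext;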
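
Since find() takes a zero-based index and returns null when nothing matches at that position, the behaviour can be checked on a small fragment without downloading anything. This is only a sketch: it assumes simple_html_dom.php sits in the same directory, the markup is a shortened version of the item from the question, and the helper name product_name() is made up for illustration.

```php
<?php
require "simple_html_dom.php";

// Shortened version of one product item from the question.
$fragment = '<ul class="products"><li class="product">'
    . '<h3><a href="/so-mi-octopus-xanh-soc-trang-p5163098.html">Sơ mi Octopus xanh sọc trắng</a></h3>'
    . '<span class="price"><span class="amount">400,000 ₫</span></span>'
    . '</li></ul>';

// product_name() is a hypothetical helper, not part of the library.
// It returns the link text, or null when the selector matches nothing.
function product_name($item)
{
    $link = $item->find("h3 a", 0); // first (and only) link inside the h3
    return $link === null ? null : trim($link->innertext);
}

$html = str_get_html($fragment);
foreach ($html->find("ul.products li") as $sp) {
    // There is no second "h3 a" in the item, so index 1 yields null
    var_dump($sp->find("h3 a", 1) === null);
    echo product_name($sp), "\n";
}
```

Checking for null before reading innertext also keeps the loop running when an item on the page has no name link.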