与使用PHP的实际HTML相比,Scraped HTML放错了标签?

时间:2016-10-05 11:44:16

标签: javascript php html curl

我在抓this webpage。这是我的代码:

$parent_tag = "//div[contains(@class, 'post-data')]";

$all_html  = $xpath->query($parent_tag)->item(0)->ownerDocument->saveHTML($xpath->query($parent_tag)->item(0));

echo $all_html;

问题是最终返回的代码具有乱码格式的脚本标记。这是网页的原始代码:

<div id="sponsored-category-header" class="page-header sponsored-category-header clear"> <script type="text/javascript">jQuery(document).ready(function($) {
        var cat_head_params = {"sponsor":"SEO PowerSuite","sponsor_logo":"https:\/\/www.searchenginejournal.com\/wp-content\/plugins\/abm-sej\/includes\/category-images\/SPS_128.png","sponsor_text":"<div class=\"taxonomy-description\">Dominate Google local search results with ease! Get your copy of SEO PowerSuite and keep <a rel=\"nofollow\" href=\"http:\/\/sejr.nl\/PowerSuite-2016-5\" onClick=\"__gaTracker('send', 'event', 'Sponsored Category Click Var 1', 'Local Search', 'SEO PowerSuite');\" target=\"_blank\">your local SEO strategy<\/a> up to par.<\/div>","logo_url":"http:\/\/sejr.nl\/PowerSuite-2016-5","ga_labels":["Local Search","SEO PowerSuite"]}            
        $('#sponsored-category-header').append('<div class="sponsored-category-logo"></div>');
                     $('#sponsored-category-header .sponsored-category-logo').append(' <a rel="nofollow" href="'+cat_head_params.logo_url+'" onClick="__gaTracker(\'send\', \'event\', \'Sponsored Category Click Var 1\', \''+cat_head_params.ga_labels[0]+'\', \''+cat_head_params.ga_labels[0]+'\');" target="_blank"><img class="nopin" src="'+cat_head_params.sponsor_logo+'" width="96" height="96" /></a>');
                                   $('#sponsored-category-header').append('<div class="sponsored-category-details"></div>');
         $('#sponsored-category-header .sponsored-category-details').append('<h3 class="page-title sponsored-category-title">'+cat_head_params.sponsor+'</h3>');
         $('#sponsored-category-header .sponsored-category-details').append(cat_head_params.sponsor_text);


    });</script> </div>

这就是我得到的:

<script type="text/javascript">jQuery(document).ready(function($) {
        var cat_head_params = {"sponsor":"SEO PowerSuite","sponsor_logo":"https:\/\/www.searchenginejournal.com\/wp-content\/plugins\/abm-sej\/includes\/category-images\/SPS_128.png","sponsor_text":"<div class=\"taxonomy-description\">Dominate Google local search results with ease! Get your copy of SEO PowerSuite and keep <a rel=\"nofollow\" href=\"http:\/\/sejr.nl\/PowerSuite-2016-5\" onClick=\"__gaTracker('send', 'event', 'Sponsored Category Click Var 1', 'Local Search', 'SEO PowerSuite');\" target=\"_blank\">your local SEO strategy<\/a> up to par.<\/div>","logo_url":"http:\/\/sejr.nl\/PowerSuite-2016-5","ga_labels":["Local Search","SEO PowerSuite"]}            
        $('#sponsored-category-header').append('<div class="sponsored-category-logo"></script>

</div>');
                     $('#sponsored-category-header .sponsored-category-logo').append(' <a rel="nofollow" href="'+cat_head_params.logo_url+'" onclick="__gaTracker(\'send\', \'event\', \'Sponsored Category Click Var 1\', \''+cat_head_params.ga_labels[0]+'\', \''+cat_head_params.ga_labels[0]+'\');" target="_blank"><img class="nopin" src="'+cat_head_params.sponsor_logo+'" width="96" height="96"></a>');
                                   $('#sponsored-category-header').append('<div class="sponsored-category-details"></div>');
         $('#sponsored-category-header .sponsored-category-details').append('<h3 class="page-title sponsored-category-title">'+cat_head_params.sponsor+'</h3>');
         $('#sponsored-category-header .sponsored-category-details').append(cat_head_params.sponsor_text);


    }); </div>

结果script标记从原始位置移动到脚本实际结束之前的新位置。我不知道为什么会这样,它只发生在这个网站上(也许是因为脚本标签内的代码,但我不确定)。任何帮助将不胜感激。

0 个答案:

没有答案