Question

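I'm researching website scraping. For a page such as link, the description always comes back blank. The reason is that JS populates it using the code below. How should I handle these scenarios?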

    // Frontend JS
    P.when('DynamicIframe').execute(function(DynamicIframe){
        var BookDescriptionIframe = null,
            bookDescEncodedData = "book desc data",
            bookDescriptionAvailableHeight,
            minBookDescriptionInitialHeight = 112,
            options = {},
            iframeId = "bookDesc_iframe";

I'm using PHP DOMXPath as follows:

    $file = 'sample.html';
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;
    // I am saving the returned html to a file and reading the file.
    @$dom->loadHTMLFile($file);
    $xpath = new DOMXPath($dom);

    // This xpath works on chrome console, but not here
    // because the content is dynamically created via js
    $desc  = $xpath->query('//*[@id="bookDesc_iframe"]');

Answer 1

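Whenever you see this kind of JavaScript-generated content, especially from big players such as Amazon or Google, you should immediately assume there is a graceful-degradation implementation behind it.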

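That means the same content is also served in a form that works where JavaScript doesn't run, such as the Links browser, for better browser coverage.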

Look for a <noscript> element; you'll probably find one. That should let you solve the problem.
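
As a rough sketch of that idea with DOMXPath: the sample markup, the container id `bookDescription_feature_div`, and the fallback text below are all made up for illustration, so check the real page source for where (and whether) the <noscript> fallback actually appears.

```php
<?php
// Hypothetical sample markup; the real page's structure is an assumption.
$html = <<<'HTML'
<html><body>
<div id="bookDescription_feature_div">
  <script>/* fills bookDesc_iframe at runtime */</script>
  <noscript><div>Fallback book description.</div></noscript>
</div>
</body></html>
HTML;

$dom = new DOMDocument();
// Collect parser warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Target the <noscript> fallback rather than the JS-created iframe.
$nodes = $xpath->query('//div[@id="bookDescription_feature_div"]//noscript');

$desc = '';
if ($nodes !== false && $nodes->length > 0) {
    // textContent flattens whatever the parser kept inside <noscript>.
    $desc = trim($nodes->item(0)->textContent);
}
echo $desc, PHP_EOL;
```

If the query comes back empty on the real page, a plain `//noscript` query listing every fallback block is a reasonable next step before narrowing the XPath down.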
