Retrieving the content of a div from an external site

Date: 2018-02-12 23:54:55

Tags: php wordpress xpath

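I am trying to retrieve the content of a div from an external site using PHP and XPath.

This is an excerpt from the page, showing the relevant code. Note: I tried everything I could think of, including adding an @ on the class and at the end of my query; after that, I use saveHTML() to get the result. See my test: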

By the way:

This is my XPath: //*[@id="post-15991"]/div[4]/div[1]
This is the URL: https://wordpress.org/plugins/wp-job-manager/

See the code below:

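<?php
$url = 'https://wordpress.org/plugins/wp-job-manager/';
$dom = new DOMDocument();
// @ hides the parser warnings raised while loading the page.
@$dom->loadHTMLFile($url);
$xpath = new DOMXPath($dom);
// XPath copied from Chrome's inspector.
$elements = $xpath->query('//*[@id="post-15991"]/div[4]/div[1]');
// Serialize the first matching node back to HTML.
$link = $dom->saveHTML($elements->item(0));
echo $link;
?>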

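The script runs, but the output is empty...

Background:

This is how I obtained the XPath, using Google Chrome. I have a set of web pages from which I want to extract some data: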

https://wordpress.org/plugins/wp-job-manager/
https://wordpress.org/plugins/participants-database/
https://wordpress.org/plugins/amazon-link/
https://wordpress.org/plugins/simple-membership/
https://wordpress.org/plugins/scrapeazon/

Goal: I need the following data:

Version:
Last updated:
Active installations:
Tested up to:

See the following example, taken from view-source:https://wordpress.org/plugins/wp-job-manager/

  • Version: 1.29.3
  • Last updated: 5 days ago
  • Active installations: 100,000+
  <li>Requires WordPress Version: <strong>4.3.1</strong></li>
  <li>Tested up to: <strong>4.9.2</strong></li>
    

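Background: I need this data for all of my favorite plugins, and I want to use it in a database or a Calc spreadsheet. So there are about 70 pages to scrape.

See the example list here, with the full XPath: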

    //*[@id="post-15991"]/div[4]/div[1]
    

And for Job Board Manager:

    //*[@id="post-519"]/div[4]/div[1]/ul/li[1]
    //*[@id="post-519"]/div[4]/div[1]/ul/li[2]
    //*[@id="post-519"]/div[4]/div[1]/ul/li[3]
    //*[@id="post-519"]/div[4]/div[1]/ul/li[7]
    

I used the method described in: Is there a way to get the xpath in google chrome?

    Right click "inspect" on the item you are trying to find the xpath
    Right click on the highlighted area on the console.
    Go to Copy xpath
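
An XPath copied this way is tied to one page: the two examples above differ in their id (post-15991 versus post-519), and Chrome reports the live DOM, which may not match the HTML the server sends to PHP. The following is a minimal sketch of a query that does not depend on the exact id. The two page skeletons and the second version number are made-up sample data, not the real wordpress.org markup.

```php
<?php
// Hypothetical page skeletons whose post ids differ, as in the question.
$pages = [
    '<div id="post-15991"><ul><li>Version: <strong>1.29.3</strong></li></ul></div>',
    '<div id="post-519"><ul><li>Version: <strong>2.0.0</strong></li></ul></div>',
];

// Match any element whose id starts with "post-" instead of one fixed id.
$query = '//*[starts-with(@id, "post-")]//li[1]/strong';

$versions = [];
foreach ($pages as $html) {
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    $node = $xpath->query($query)->item(0);
    // item(0) is null when nothing matched.
    $versions[] = $node ? trim($node->textContent) : null;
}

print_r($versions);
```

The same query string can then be reused for every plugin page, provided the real pages share this structure, which would need to be checked against the served HTML.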
    

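1 Answer:

Answer 0 (score: 1)

You are calling loadHTMLFile(), which expects a file path. If warnings are enabled (that is, without the @ suppression), you will see the following warnings: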

      

    E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Attribute class redefined in https://wordpress.org/plugins/wp-job-manager/, line: 190 -- at line 5

    E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Tag header invalid in https://wordpress.org/plugins/wp-job-manager/, line: 201 -- at line 5

    E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Tag nav invalid in https://wordpress.org/plugins/wp-job-manager/, line: 205 -- at line 5

    E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Tag main invalid in https://wordpress.org/plugins/wp-job-manager/, line: 224 -- at line 5
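
These messages come from the HTML parser meeting elements such as header, nav and main, which suggests the page itself was fetched and parsed. Rather than hiding them with @, they can be collected and inspected. The following is a minimal sketch using libxml_use_internal_errors(). The inline markup is a hypothetical stand-in for the fetched page, and whether any messages are reported for it depends on the libxml2 version in use.

```php
<?php
// Hypothetical sample markup; it stands in for the fetched plugin page.
$html = '<html><body><header><nav>menu</nav></header>'
      . '<main id="main"><p>content</p></main></body></html>';

// Collect parser messages in memory instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Each entry is a LibXMLError object with ->message and ->line.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}

// Unknown tags do not stop the parse; the element is still in the tree.
$mainCount = $dom->getElementsByTagName('main')->length;
echo "main elements found: $mainCount\n";
```

If the document still parses, as here, an empty result is more likely to come from an XPath that matches nothing than from the load step.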

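Use loadHTML() rather than loadHTMLFile(). Note that loadHTML() takes a string of markup, not a URL, so fetch the page first:

    $url = 'https://wordpress.org/plugins/wp-job-manager/';
    // loadHTML() parses a string of HTML, so download the page first.
    $html = file_get_contents($url);
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    $elements = $xpath->query('//*[@id="post-15991"]/div[4]/div[1]');
    // item(0) is null when the query matches nothing.
    $link = $elements->length > 0 ? $dom->saveHTML($elements->item(0)) : '';
    echo $link;

If the URL itself is passed to loadHTML(), the parser treats that string as the whole document, and the result will be little more than the URL text:

    https://wordpress.org/plugins/wp-job-manager/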
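
For the fields the question asks for, matching each list item by its label may be more robust than a positional path such as div[4]/div[1]. The following is a minimal sketch, assuming the markup looks like the excerpt quoted in the question. pluginMetaValue() is a hypothetical helper name, and the inline HTML is sample data rather than the live page.

```php
<?php
/**
 * Return the <strong> value that follows a label such as "Tested up to:".
 * Hypothetical helper; returns null when the label is not found.
 * The label must not contain a double quote, since it is embedded in the query.
 */
function pluginMetaValue(DOMXPath $xpath, string $label): ?string
{
    // Match the <li> by its text rather than by its position in the page.
    $query = sprintf('//li[contains(normalize-space(.), "%s")]/strong', $label);
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Sample markup modelled on the excerpt in the question.
$html = '<ul>'
      . '<li> Requires WordPress Version:<strong>4.3.1</strong> </li>'
      . '<li>Tested up to: <strong>4.9.2</strong></li>'
      . '</ul>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$tested   = pluginMetaValue($xpath, 'Tested up to:');
$requires = pluginMetaValue($xpath, 'Requires WordPress Version:');
$missing  = pluginMetaValue($xpath, 'Active installations:');

var_dump($tested, $requires, $missing);
```

A null return makes a missing field easy to spot when looping over many plugin pages, instead of silently printing nothing.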