I am trying to get the content of a node in a web page I am parsing. Here is my code:
include('simplehtmldom_1_5/simple_html_dom.php');
// get DOM from URL or file
$feedUrl = "http://www.yellowpages.com/md/cpa-tax?menu_search=false&page=1&refinements%5Bfacet_clicked%5D=HeadingText&refinements%5Bheadingtext%5D%5B%5D=Accountants-Certified+Public&refinements%5Bheadingtext%5D%5B%5D=Tax+Return+Preparation&refinements%5Bheadingtext%5D%5B%5D=Tax+Return+Preparation-Business";
$html = file_get_html($feedUrl);
$xpath = "/html/body/div[5]/div[1]/div[1]/div[1]/div[5]/div[3]/div[1]/div[1]/div[1]/div[1]/a[1]/div[1]/div[1]/div[3]/div[1]/div[2]/h3[1]/div[1]/a[1]";
foreach($html->find($xpath) as $e)
echo $e->title . '<br>';
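In this example I am trying to get the name "Tax Experience CPA, Inc" from the page. The problem is that find($xpath) always returns an empty array. When I open Google Chrome and search for the node with that XPath, I find the node I want, but the same path does not work in my code. There must be something wrong with the path I am using, but I cannot figure out what it is. I have searched and searched and cannot find what I am doing wrong. Please help.
Answer 0 (score: 1)
The site has plenty of nodes with ids and classes. Use them to build a much simpler XPath expression that retrieves what you want!
Here is working code for you: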
// includes Simple HTML DOM Parser
include "simple_html_dom.php";
$feedUrl = "http://www.yellowpages.com/md/cpa-tax?menu_search=false&page=1&refinements%5Bfacet_clicked%5D=HeadingText&refinements%5Bheadingtext%5D%5B%5D=Accountants-Certified+Public&refinements%5Bheadingtext%5D%5B%5D=Tax+Return+Preparation&refinements%5Bheadingtext%5D%5B%5D=Tax+Return+Preparation-Business";
//Create a DOM object
$html = new simple_html_dom();
// Load HTML from a URL or file
$html->load_file($feedUrl);
// Find all anchors
$anchors = $html->find("//div[@class='srp-business-name']/a");
// Display all titles
foreach($anchors as $a)
echo $a->title . '<br>';
Output
Tax Experience CPA Inc
Bernice Hassan CPA Accounting & Tax Services
Begosh Tax Service CPA
At-Home CPA Tax Service
CPA Financial & Tax Service
My Tax CPA
...
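As far as I know, Simple HTML DOM's find() takes CSS-style selectors rather than full XPath, which is probably why the absolute path copied from Chrome, with positional steps such as div[5], matches nothing. If a real XPath query is wanted, PHP's built-in DOM extension is one option. This is a minimal sketch using DOMDocument and DOMXPath instead of Simple HTML DOM. The $sampleHtml markup is made up for illustration and only borrows the class name from the selector above; the real page may be structured differently.

```php
<?php
// Hypothetical sample markup; the class name mirrors the selector used above,
// but the structure of the real page is an assumption.
$sampleHtml = <<<HTML
<html><body>
  <div class="srp-business-name"><a href="/a" title="Tax Experience CPA Inc">Tax Experience CPA Inc</a></div>
  <div class="srp-business-name"><a href="/b" title="My Tax CPA">My Tax CPA</a></div>
  <div class="other"><a href="/c" title="Not a listing">Ignore me</a></div>
</body></html>
HTML;

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($sampleHtml);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$titles = [];
// Select by class instead of by position in the tree.
foreach ($xpath->query("//div[@class='srp-business-name']/a") as $a) {
    $titles[] = $a->getAttribute('title');
}
echo implode("\n", $titles), "\n";
```

A class-based expression like this keeps working when the page gains or loses a wrapper div, whereas a positional path breaks as soon as the nesting changes.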
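Here is a modified version that gets the title and phone number from each "element / div".
Note that find("...", $index) returns the single element at position $index (the Nth element, counting from 0), and returns an array of elements if $index is not set.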
$feedUrl = "http://www.yellowpages.com/md/cpa-tax?menu_search=false&page=1&refinements%5Bfacet_clicked%5D=HeadingText&refinements%5Bheadingtext%5D%5B%5D=Accountants-Certified+Public&refinements%5Bheadingtext%5D%5B%5D=Tax+Return+Preparation&refinements%5Bheadingtext%5D%5B%5D=Tax+Return+Preparation-Business";
//Create a DOM object
$html = new simple_html_dom();
// Load HTML from a URL or file
$html->load_file($feedUrl);
// Find all elements
$divs = $html->find('div.business-container-inner');
// loop through all elements and display the useful parts
foreach($divs as $div) {
$title = $div->find('div.srp-business-name a', 0)->title;
$phone = $div->find('span.business-phone', 0)->plaintext;
echo $title ." - ". $phone . "<br>";
}
// Clear DOM object
$html->clear();
unset($html);
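One caveat with the loop above: if a container has no name link or no phone span, find(..., 0) returns null as far as I can tell, and reading ->title or ->plaintext on it raises a warning. The sketch below shows the same per-container extraction with explicit null checks. It uses PHP's built-in DOMXPath rather than Simple HTML DOM so that it runs without extra files. The markup, the phone number, and the "(no name)" / "(no phone)" fallbacks are placeholders invented for the example.

```php
<?php
// Hypothetical sample markup: the second listing deliberately has no phone number.
$sampleHtml = <<<HTML
<html><body>
  <div class="business-container-inner">
    <div class="srp-business-name"><a title="Tax Experience CPA Inc">Tax Experience CPA Inc</a></div>
    <span class="business-phone">(555) 010-0001</span>
  </div>
  <div class="business-container-inner">
    <div class="srp-business-name"><a title="My Tax CPA">My Tax CPA</a></div>
  </div>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($sampleHtml);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query("//div[@class='business-container-inner']") as $container) {
    // The leading "." keeps each query relative to the current container.
    $link  = $xpath->query(".//div[@class='srp-business-name']/a", $container)->item(0);
    $phone = $xpath->query(".//span[@class='business-phone']", $container)->item(0);

    // item(0) returns null when nothing matched, so check before dereferencing.
    $title  = $link  !== null ? $link->getAttribute('title') : '(no name)';
    $number = $phone !== null ? trim($phone->textContent) : '(no phone)';
    $rows[] = $title . ' - ' . $number;
}
echo implode("\n", $rows), "\n";
```

The same idea carries over to the Simple HTML DOM version: store the result of find(..., 0) in a variable and test it before reading its properties.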
Answer 1 (score: 0)
I think you should try this.
include('simplehtmldom_1_5/simple_html_dom.php');
// get DOM from URL or file
$feedUrl = "http://www.yellowpages.com/md/cpa-tax?menu_search=false&page=1&refinements%5Bfacet_clicked%5D=HeadingText&refinements%5Bheadingtext%5D%5B%5D=Accountants-Certified+Public&refinements%5Bheadingtext%5D%5B%5D=Tax+Return+Preparation&refinements%5Bheadingtext%5D%5B%5D=Tax+Return+Preparation-Business";
$html = new simple_html_dom();
$html->load_file($feedUrl);
$xpath = ".srp-business-name a";
foreach($html->find($xpath) as $e)
echo $e->title . '<br>';