HTML page:
<div class="title-download">
<div id="ctl " class="title">
<h3>
<a id="ct2" href="http://url1.com">title</a>
<span id="ct3" class="citation">(<a id="ct4 " href=" ">Citations</a>)</span></h3>
</div>
<div id="ct4" class="download">
<a id="ct5 " title=" " href="http://url.pdf"><img id="ct6" class="small-icon" src=" " /></a>
</div>
</div>
<div class="content">
<a class="author " href="author.com">author</a><span class="span-break" >, </span><a class="author2.com " href="http://author2.com">author2</a>
</div>
I want to get http://url1.com, title, http://url.pdf, author.com and author, but only when the div with class download contains a PDF URL.
Here is the code:
foreach($html->find('span[class=citation]') as $link1){
foreach($link1->parent()->parent()->parent()->find('.download a') as $link2){
foreach ($link1->parent()->find('div[class=content] a') as $a ){
if(strtolower(substr($link2->title, strrpos($link2->href, '.'))) === '.pdf') {
$link1 = $link1->prev_sibling();
$a = $link1->next_sibling();
$title = strip_tags($link1->plaintext);
$linkWeb = strip_tags($link1->href);
$author= strip_tags($a->plaintext);
$linkAuthor= strip_tags($a->href);
$pdfLink = strip_tags($link2->title);
}
}
}
}
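I get blank results. Please help me and tell me what the mistake is. Thanks in advance :)
Answer 0 (score: 1)
Since the page is made up of divs with the class title-download, you should be able to rewrite your loop as follows: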
foreach ($html->find('div[class=title-download]') as $div) {
    // First link inside the download div, or null if there is none.
    $downloadlink = $div->find('div[class=download] a', 0);
    if ($downloadlink != null) {
        // Only continue when the href ends in ".pdf".
        if (strtolower(substr($downloadlink->href, strrpos($downloadlink->href, '.'))) === '.pdf') {
            // In the HTML above, the title link sits in the h3 of the title div.
            $content = $div->find('div[class=title] h3 a', 0);
            $title = strip_tags($content->plaintext);
            $linkWeb = strip_tags($content->href);
            // The authors are in the content div that follows title-download as a sibling.
            $authorlink = $div->next_sibling()->find('a', 0);
            $author = strip_tags($authorlink->plaintext);
            $linkAuthor = strip_tags($authorlink->href);
            $pdfLink = strip_tags($downloadlink->href);
        }
    }
}
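For comparison, here is a minimal sketch of the same extraction using PHP's built-in DOMDocument and DOMXPath instead of the simple_html_dom library used above. This is a swapped-in technique, not the asker's setup. The sample markup is a tidied copy of the question's HTML, and the `$results` array and its keys are hypothetical names chosen for illustration.

```php
<?php
// Tidied copy of the question's markup (assumption: the <img> belongs inside the download link).
$html = <<<HTML
<div class="title-download">
  <div id="ctl" class="title">
    <h3><a id="ct2" href="http://url1.com">title</a>
    <span id="ct3" class="citation">(<a href="#">Citations</a>)</span></h3>
  </div>
  <div class="download">
    <a id="ct5" href="http://url.pdf"><img class="small-icon" src="" /></a>
  </div>
</div>
<div class="content">
  <a class="author" href="author.com">author</a><span class="span-break">, </span><a class="author2" href="http://author2.com">author2</a>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$results = [];                      // hypothetical container for the extracted rows
foreach ($xpath->query('//div[@class="title-download"]') as $div) {
    // First link inside the download div, if any.
    $download = $xpath->query('.//div[@class="download"]//a', $div)->item(0);
    if ($download === null) {
        continue;
    }
    $pdfLink = trim($download->getAttribute('href'));
    // strrchr() returns the part of the string from the last "." onward, or false.
    if (strtolower((string) strrchr($pdfLink, '.')) !== '.pdf') {
        continue;
    }
    // The title link is a direct child of the h3; the Citations link is nested in a span.
    $titleLink = $xpath->query('.//div[@class="title"]//h3/a', $div)->item(0);
    // The authors live in the content div that follows title-download as a sibling.
    $authorLink = $xpath->query('following-sibling::div[@class="content"][1]//a', $div)->item(0);
    if ($titleLink === null || $authorLink === null) {
        continue;
    }
    $results[] = [
        'title'      => trim($titleLink->textContent),
        'linkWeb'    => $titleLink->getAttribute('href'),
        'author'     => trim($authorLink->textContent),
        'linkAuthor' => $authorLink->getAttribute('href'),
        'pdfLink'    => $pdfLink,
    ];
}

print_r($results);
```

The null checks before each use are the main design point: a missing link skips the block instead of triggering a property access on null.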
Answer 1 (score: 0)
Have you tried adding print statements to debug this? A quick look suggests that the third loop, where you have:
foreach ($link1->parent()->parent()->find('div[class=content] a') as $a) {
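won't match anything, because you are not going back far enough (it looks like it would be searching inside the #ctl div?). Once you have gone up three levels, don't you actually want to look for a sibling element?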
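The print-statement approach suggested here could look roughly like the sketch below. It uses PHP's built-in DOMDocument and DOMXPath rather than simple_html_dom, and a trimmed copy of the question's markup, so treat it as an illustration under those assumptions. The `$report` array is a hypothetical name. It walks up from the citation span one parent at a time, prints how many div.content links each ancestor contains, and then checks the following sibling.

```php
<?php
// Trimmed copy of the question's markup (assumption: only the parts needed for the walk).
$html = <<<HTML
<div class="title-download">
  <div id="ctl" class="title">
    <h3><a href="http://url1.com">title</a>
    <span class="citation">(<a href="#">Citations</a>)</span></h3>
  </div>
</div>
<div class="content"><a class="author" href="author.com">author</a></div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$node = $xpath->query('//span[@class="citation"]')->item(0);

$report = [];                       // hypothetical: [label, matches] per ancestor level
for ($level = 1; $level <= 3; $level++) {
    $node = $node->parentNode;
    $class = $node->getAttribute('class');
    $label = $node->nodeName . ($class !== '' ? ".$class" : '');
    // Count author links found *inside* this ancestor.
    $inside = $xpath->query('.//div[@class="content"]//a', $node)->length;
    $report[] = [$label, $inside];
    echo "level $level: <$label> contains $inside matching link(s)\n";
}

// After three levels $node is the title-download div; look beside it, not inside it.
$siblingLinks = $xpath->query('following-sibling::div[@class="content"]//a', $node)->length;
echo "following sibling div.content contains $siblingLinks link(s)\n";
```

Printing the element reached at each level makes it easy to see which ancestor a chain of parent() calls actually lands on before searching from it.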