Question

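I am scraping links from a website (this one), but the structure of the site produces unwanted extra output. Basically, each <a> tag contains the article's name plus additional information (an image and the image's source). I would like to get rid of that additional information. I found the :not selector for this, but I think I am implementing it incorrectly, because every combination I have tried produces no output at all.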

Here is the output

Here is the code I need to modify:

$posts = $html->find('ul[class=river] a[data-omni-click=inherit] :not[figure]');

(I also tried figure:not and a few other combinations.)

Does anyone know where I am going wrong, and what I need to do to exclude the <figure> tag?

Here is my full code, in case it helps:

<div class='rcorners1'>
 <?php
include_once('simple_html_dom.php');

$target_url = "http://www.theatlantic.com/most-popular/";

$html = new simple_html_dom();

$html->load_file($target_url);

$posts = $html->find('ul[class=river] a[data-omni-click=inherit] :not[figure]');
$limit = 10;
$limit = count($posts) < $limit ? count($posts) : $limit;
for($i=0; $i < $limit; $i++){
  $post = $posts[$i];
  $post->href = 'http://www.theatlantic.com'.$post->href;
  echo strip_tags($post, '<p><a>'); //echo ($post); 

}
?>
</div>
</div>
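
For reference, standard CSS writes the negation pseudo-class with parentheses, :not(figure), and it selects elements that are not figures; it does not strip figure descendants out of a link that has already matched. Whether simple_html_dom accepts :not at all may depend on the library version. One possible way around the selector is sketched below using PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom (a swapped-in technique, not the library used above). The sample markup and the $sample and $titles names are hypothetical; the real page's HTML may differ.

```php
<?php
// Hypothetical markup imitating the structure described in the question.
$sample = <<<HTML
<ul class="river">
<li><a data-omni-click="inherit" href="/a/first/"><figure><img src="x.jpg" alt=""><figcaption>Photo credit</figcaption></figure><span>First headline</span></a></li>
<li><a data-omni-click="inherit" href="/a/second/"><span>Second headline</span></a></li>
</ul>
HTML;

$doc = new DOMDocument();
// Collect parser warnings (e.g. about HTML5 tags such as <figure>) instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($sample);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$titles = [];
foreach ($xpath->query('//ul[@class="river"]//a[@data-omni-click="inherit"]') as $a) {
    // Copy the node list to an array first, then remove each <figure> inside this link.
    foreach (iterator_to_array($xpath->query('.//figure', $a)) as $figure) {
        $figure->parentNode->removeChild($figure);
    }
    $titles[] = [
        'href'  => 'http://www.theatlantic.com' . $a->getAttribute('href'),
        'title' => trim($a->textContent),
    ];
}

// Print each remaining link as a paragraph.
foreach ($titles as $t) {
    echo '<p><a href="' . htmlspecialchars($t['href']) . '">'
        . htmlspecialchars($t['title']) . "</a></p>\n";
}
```

The idea is to match the links first and then delete the unwanted child nodes, rather than trying to express the exclusion in the selector itself.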

:not CSS selector implementation issue

0 answers: