Question

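I am using simplehtmldom to fetch HTML from a website. I then search the page for all divs and display the innertext of those containing more than 300 words. To do this, I iterate with foreach.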

$findDivs = $html->find('div');

foreach($findDivs as $findDiv) {
  $wordCount = explode(' ', $findDiv->outertext);
  $wordCount = count($wordCount);
  if($wordCount <= 300) {
    $findDiv->outertext = '';
  }
  else {
    echo $findDiv->outertext . '<br />';
  }
}

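The problem I'm having is that the results are repeated 6 times. I can only assume this is because every iteration loops over all the divs. However, I'm not sure what technique I can use to make sure each div is evaluated only once.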

Answer 1

You want innertext, but your code uses outertext. I think that is the cause of the duplication.

foreach($html->find('div') as $findDiv) {
  $wordCount = explode(' ', $findDiv->innertext);
  $wordCount = count($wordCount);
  if($wordCount > 300) {
    echo $findDiv->outertext . '<br />';
  }
}
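
As a side note on the counting itself: explode(' ', ...) on innertext or outertext treats pieces of markup (tags, attributes) as words and produces empty pieces wherever two spaces follow each other. Below is a minimal sketch of counting only the visible text, using plain PHP string functions. `countWords` is a hypothetical helper name, not part of simplehtmldom.

```php
<?php
// Hypothetical helper (not part of simplehtmldom): count the words in the
// visible text of an HTML fragment instead of counting pieces of markup.
function countWords(string $html): int
{
    // Remove the tags, then decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY discards empty pieces.
    // preg_split() returns false on failure, so fall back to an empty list.
    $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    return count($words);
}

// Sample fragment made up for this illustration.
$fragment = '<div class="post">  <p>Hello   world</p>' . "\n" . '<p>again</p></div>';

echo countWords($fragment), "\n"; // 3
```

With simplehtmldom, $findDiv->innertext could be passed to such a helper in place of the explode/count pair.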

Answer 2

I don't know why, but this solved my problem.

I added a '1' argument to the call: $html->find('div', 1);

So the working code looks like this:

$findDivs = $html->find('div',1);  //add a 1 to the divs. this works as the script now only loops once.

foreach($findDivs as $findDiv) {
  $wordCount = explode(' ', $findDiv->outertext);
  $wordCount = count($wordCount);
  if($wordCount <= 300) {
    $findDiv->outertext = '';
  }
  else {
    echo $findDiv->outertext . '<br />';
  }
}
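
A possible explanation for the repeats: find('div') returns every div on the page, including divs nested inside other divs, so the outertext of an outer div already contains the text of all its inner divs. According to the simplehtmldom manual, passing an index as in find('div', 1) returns a single element (zero-based) rather than a list. The sketch below shows the nesting effect with PHP's bundled DOM extension instead of simplehtmldom, only so that it runs without the library. The sample markup and variable names are made up for illustration.

```php
<?php
// Sketch with PHP's built-in DOM extension (NOT simplehtmldom), showing why
// selecting every div yields the same text several times when divs are nested.
$markup = '<div id="outer"><div id="middle"><div id="inner">some text</div></div></div>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// Every div in the document, nested ones included.
$allDivs = $xpath->query('//div');

// Collect the text of each div: all three contain "some text".
$texts = [];
foreach ($allDivs as $div) {
    $texts[] = trim($div->textContent);
}

// Only the divs that do not contain another div.
$leafDivs = $xpath->query('//div[not(.//div)]');

$leafIds = [];
foreach ($leafDivs as $div) {
    $leafIds[] = $div->getAttribute('id');
}

echo count($texts), "\n";         // 3
echo implode(',', $leafIds), "\n"; // inner
```

Keeping only the innermost divs is one design choice; keeping only the outermost ones would work as well, depending on which block of text you want. simplehtmldom elements also have a find() method, so a comparable check there might be to skip any div for which $findDiv->find('div') returns a non-empty array. Treat that as an untested assumption.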

Looping through divs and extracting text with simplehtmldom
