Code to find a string in the source code of many URLs: timeout problem

Time: 2011-04-13 04:08:21

Tags: php arrays curl timeout web-scraping

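I want to enter a long list of URLs, search each page's source code for a specific string, and output the list of URLs that contain it. Sounds simple, right? I came up with the code below, which you can try at pelican-cement.com/findfrog.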

The problem is that it times out whenever I search more than 10 URLs.

<html>
<body>

<form action="search.php" method="post"> 
  URLs: <br/>
  <textarea rows="20" cols="50" name="url"></textarea><br/>

  Search Term: <br/>
  <textarea rows="20" cols="50" name="proxy"></textarea><br/>

  <input type="submit" /> 
</form>

<?php
set_time_limit (0);
  if(isset($_POST['url'])) {


    $urls = explode("\n", $_POST['url']);
    $term = $_POST['proxy'];
    $options = array( CURLOPT_FOLLOWLOCATION => 1,
                      CURLOPT_RETURNTRANSFER => 1,
                      CURLOPT_CUSTOMREQUEST  => 'GET',
                      CURLOPT_HEADER         => 1,
                      );
    $ch = curl_init();
    curl_setopt_array($ch, $options);

    foreach ($urls as $url) {
      curl_setopt($ch, CURLOPT_URL, trim($url));
      $html = curl_exec($ch);

      if ($html !== FALSE && stristr($html, $term) !== FALSE) { // Found!
        echo $url;
        echo "<br>";
      }
    }

    curl_close($ch);
    echo "space";
  }
?>
</body>
</html>

1 Answer:

Answer 0: (score: 0)

Try modifying the time limit.

foreach ($urls as $url) {
  set_time_limit(120); // restart the execution timer for each URL
  curl_setopt($ch, CURLOPT_URL, trim($url));
  $html = curl_exec($ch);
  // ... rest of the loop body unchanged ...
}

This gives each URL up to 2 minutes, because every call to set_time_limit() restarts the timer from zero.
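Resetting PHP's timer does not stop a single slow host from stalling the whole run, so it may also help to cap each request on the cURL side. The following is only a sketch under assumptions: the helper names `parse_url_list`, `page_contains`, and `find_urls_containing` are hypothetical (they are not in the question), and the 5-second connect timeout and 15-second transfer timeout are arbitrary illustrative values.

```php
<?php
// Hypothetical helpers for illustration; names and timeout values are assumptions.

/** Split the textarea input into a clean list of URLs. */
function parse_url_list(string $raw): array
{
    // \R matches \n, \r\n and \r, so form input from any platform is handled.
    $lines = preg_split('/\R/', $raw);
    $urls  = array_map('trim', $lines);
    // Drop blank lines so cURL is never asked to fetch an empty URL.
    return array_values(array_filter($urls, function ($u) {
        return $u !== '';
    }));
}

/** Case-insensitive substring check, like stristr() in the question. */
function page_contains(string $html, string $term): bool
{
    return $term !== '' && stripos($html, $term) !== false;
}

/** Fetch each URL with a per-request timeout and return those containing $term. */
function find_urls_containing(array $urls, string $term, int $timeout = 15): array
{
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 5,        // give up connecting after 5 s (assumed value)
        CURLOPT_TIMEOUT        => $timeout, // cap the whole transfer for one URL
    ));

    $matches = array();
    foreach ($urls as $url) {
        // Restart PHP's own timer, leaving headroom above the cURL timeout.
        set_time_limit($timeout + 30);
        curl_setopt($ch, CURLOPT_URL, $url);
        $html = curl_exec($ch);
        // curl_exec() returns false on failure, including a timeout; skip those URLs.
        if ($html !== false && page_contains($html, $term)) {
            $matches[] = $url;
        }
    }
    // The handle is released when $ch goes out of scope.
    return $matches;
}
```

With the form from the question, this could be called as `find_urls_containing(parse_url_list($_POST['url']), $_POST['proxy'])`. Note that a web server or proxy in front of PHP may enforce its own request timeout, which neither `set_time_limit()` nor the cURL options change.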