HTML DOM parse of 5000+ items

Time: 2013-04-18 01:19:13

Tags: php mysql parsing

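Is there a more efficient way to run or write the code below?

When I run it (through the Chrome browser) it always times out at around the 500th item and redirects me back to my home page.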

<?php

include_once('config.php');
include_once('simple_html_dom.php');

for ($i = 0; $i <= 5000; ++$i) {

    // Retrieve the DOM from a given URL
    $html = file_get_html($url);

    // Loop through the page contents and retrieve all required fields
    foreach ($html->find('div.product-details-contents') as $content) {
        $detail['productid'] = $i;
        $detail['title'] = $content->find('span.title', 0)->plaintext;
        $detail['unit'] = $content->find('span.unit-size', 0)->plaintext;

        $sqlstring = implode("','", $detail);

        $sql = "INSERT INTO `cdidlist` (`productid`, `title`, `unit`) VALUES ('$sqlstring')";

        if (!mysqli_query($connect, $sql)) {
            // mysqli_error() requires the connection handle
            echo "Error: " . mysqli_error($connect);
        } else {
            echo $i . " " . $detail['title'] . " Item Added SUCCESSFULLY! <br>";
        }
    }
}
?>

2 answers:

Answer 0: (score: 2)

First, removing sleep(10); should save you about 50,000 seconds (5000 iterations at 10 seconds each).

Answer 1: (score: 1)

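You are opening 5000 web pages and parsing them. That cannot be done efficiently. To keep your script from dying, though, you can call set_time_limit(600) inside the for loop; each call restarts the timeout counter. Make sure you also have an appropriately high timeout in php.ini.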
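A minimal sketch of the set_time_limit() idea, assuming the per-page work lives in a helper. process_item() here is a hypothetical stand-in for the fetch, parse and insert steps, and the loop is shortened to 6 iterations so it runs quickly:

```php
<?php
// Hypothetical stand-in for "fetch one page, parse it, insert it".
function process_item(int $i): string
{
    return "item $i";
}

$results = [];
for ($i = 0; $i <= 5; ++$i) {
    // Each call restarts the timeout counter from zero, so the limit
    // applies to a single iteration rather than to the whole run.
    set_time_limit(600);
    $results[] = process_item($i);
}

echo count($results), " items processed\n";
```

This only helps if the host allows the limit to be changed, and it does nothing about any timeout enforced by the web server or a proxy in front of PHP.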

Edit: You don't own the server. That means you will have to push the loop to the client side. It would look something like this:

PHP:

if(isset($_REQUEST['i'])) {
   $i = (int) $_REQUEST['i']; // sanitize the input
   $error_message = false;
   /*
     load the page, parse the page and input it into the DB.
     If there is an error, save it to $error_message
   */
   if(!$error_message) {
       die(json_encode('ok')); // just die'ing is usually bad, but this is a one-off script
   } else {
       die(json_encode($error_message));
   }
}

In your HTML:

<p id="status">Status</p>
<script type="text/javascript" src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<script type="text/javascript">
  $(function () {
     'use strict';
     var get = function (i) {
         if (i > 5000) {
             $('#status').html('complete');
         } else {
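            // $.ajax defaults to GET; jQuery 1.9.1 does not accept a settings object in $.get
            $.ajax({
                url: window.location.href,
                data: {i: i},
                dataType: 'json', // the PHP side replies with json_encode(), so parse it as JSON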
                success: function (data) {
                   if(data === 'ok'){
                      $('#status').html('fetched ' + i);
                      get(i + 1);
                   } else {
                      $('#status').html('error fetching ' + i + ': ' + data);
                   }
                }  
            });
         }
     };
     get(0);
  });
</script>

Edit 2: As others have mentioned, this is vulnerable to SQL injection. For prepared statements, see PDO and PDOStatement.
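A minimal sketch of what the insert could look like with a PDO prepared statement. The table and column names come from the question; the in-memory SQLite database and the sample row are assumptions made only so the snippet runs without a server, and a MySQL DSN would take their place in the real script:

```php
<?php
// Assumption: in-memory SQLite so the sketch is self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE cdidlist (productid INTEGER, title TEXT, unit TEXT)');

// Prepare once, outside the loop; values are bound, never spliced into the SQL.
$stmt = $pdo->prepare(
    'INSERT INTO cdidlist (productid, title, unit) VALUES (:productid, :title, :unit)'
);

// Made-up sample row; the apostrophe would have broken the string-built query.
$detail = [
    ':productid' => 1,
    ':title'     => "Baker's Flour",
    ':unit'      => '1kg',
];
$stmt->execute($detail);

$count = (int) $pdo->query('SELECT COUNT(*) FROM cdidlist')->fetchColumn();
$title = $pdo->query('SELECT title FROM cdidlist WHERE productid = 1')->fetchColumn();
echo $count, " row(s), title: ", $title, "\n";
```

Because the statement is prepared once and executed per item, the same $stmt can be reused on every iteration of the loop.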