PHP script takes more and more time after each iteration

Date: 2015-04-10 05:56:01

Tags: php dom curl phpexcel

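I am using PHPExcel to create Excel files with PHP. Please look at the code below first. The problem is that after some of the data has been saved to Excel, the script just keeps running without saving anything more. I think the script keeps all the data as cache and temporary values, so each iteration has more and more to load. Explanation: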

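First, the code reads integer values one by one from an Excel file, sample.xls (this file only has values in column A). Say the first value, from cell A1, is 1212: the code sets $target = 1212, the curl code fetches the data for 1212, and the result is saved as HTML in the results folder as 1212.html. Then the DOM library takes over. 1212.html contains a table with three columns and many rows, so the DOM code reads the tr and td data, writes the corresponding values into Excel cells, and finally saves the workbook into the excelresult folder as 1212.xlsx. The same steps are then repeated for cell A2 of sample.xls, which yields the next value, e.g. 1213, and scraping starts again, and so on.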
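The table-scraping step described above can be sketched with PHP's built-in DOM extension (DOMDocument and DOMXPath) in place of the dom.php library used below, whose file_get_html()/find()/innertext calls look like simple_html_dom; that swap is the editor's, not the asker's. This is a sketch under assumptions: the table has id GridView1, each data row has at least three td cells, and the function name parseGridRows is hypothetical. Note that textContent returns plain text, whereas simple_html_dom's innertext returns the inner HTML.

```php
<?php
// Hypothetical helper: extract the three-column rows of table#GridView1
// using PHP's built-in DOM extension instead of simple_html_dom.
// Rows with fewer than three <td> cells (e.g. a <th> header row) are skipped.
function parseGridRows(string $html, string $tableId = 'GridView1'): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string (e.g. a failed download)
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    // $tableId is assumed to be a trusted literal; it is not escaped.
    foreach ($xpath->query("//table[@id='$tableId']//tr") as $tr) {
        $cells = $xpath->query('./td', $tr);
        if ($cells->length < 3) {
            continue; // header or malformed row
        }
        $rows[] = [
            trim($cells->item(0)->textContent), // distno
            trim($cells->item(1)->textContent), // acno
            trim($cells->item(2)->textContent), // partno
        ];
    }
    return $rows;
}

$sample = '<table id="GridView1">'
    . '<tr><th>Dist</th><th>AC</th><th>Part</th></tr>'
    . '<tr><td> 12 </td><td>34</td><td>56</td></tr>'
    . '<tr><td>7</td><td>8</td><td>9</td></tr>'
    . '</table>';
print_r(parseGridRows($sample));
```

Because the function returns plain arrays, nothing from the parser outlives the call, which keeps memory flat across targets.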

Problem:

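Getting the first value, like 1212, takes very little time, the second value, 1213, takes longer, and after only four or five values each one takes a very long time (many minutes). Please help me reduce this time and make the process faster. Thanks.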
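A useful first step is to measure where the time goes. The sketch below times each iteration with microtime(true) and records memory_get_usage(); if memory climbs with every target, objects are probably leaking between iterations, and if only the seconds climb, the network requests are the likelier suspect. The function profileTargets and the dummy closure are hypothetical stand-ins for the real per-target curl, parse and save work.

```php
<?php
// Hypothetical profiler: runs $work once per target and records
// elapsed wall-clock time plus current and peak memory after each run.
function profileTargets(array $targets, callable $work): array
{
    $report = [];
    foreach ($targets as $target) {
        $start = microtime(true);
        $work($target);
        $report[] = [
            'target'  => $target,
            'seconds' => microtime(true) - $start,
            'memory'  => memory_get_usage(),      // bytes currently allocated
            'peak'    => memory_get_peak_usage(), // highest allocation so far
        ];
    }
    return $report;
}

// Usage with a dummy workload; replace the closure with the real step.
$report = profileTargets([1212, 1213, 1214], function ($target) {
    usleep(1000); // simulate about 1 ms of work
});
foreach ($report as $row) {
    printf("%s: %.3f s, %d bytes\n", $row['target'], $row['seconds'], $row['memory']);
}
```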

Code:

<?php
......
// Append PHPExcel's Classes directory; PATH_SEPARATOR is ';' on Windows and ':' elsewhere
ini_set('include_path', ini_get('include_path').PATH_SEPARATOR.'../Classes/');
include_once 'PHPExcel.php';
include_once 'Excel2007.php';

// One workbook object, created once and reused for every target
$objPHPExcel = new PHPExcel();

$objPHPExcel->getProperties()->....//set some properties//
$excel->read('sample.xls'); // added excel reader from which we need to take some values
$x=1;
while($x<=$excel->sheets[0]['numRows']) { // reading row by row
    $y=1;
    while($y<=$excel->sheets[0]['numCols']) { // reading column by column
        $cell = isset($excel->sheets[0]['cells'][$x][$y]) ? $excel->sheets[0]['cells'][$x][$y] : '';
        $target = $cell;

        // $objWorksheet = $objPHPExcel->getActiveSheet();
        // $highestRow = $objWorksheet->getHighestRow();
        // for($row=1; $row < $highestRow; ++$row){
        //     $objPHPExcel->getActiveSheet()->removeRow($row,$row);
        // }

        /* some lines of code using curl to fetch data for $target value
        ........... */
        // below is the code which retrieves data from the HTML table and saves it into an Excel file
        $url='results/'.$target.'.html';
        include_once('dom.php');

        $html=file_get_html($url);

        $record_find='first';

        foreach($html->find('table#GridView1') as $e){
            if($record_find=='first')
                $i=1; // $record_find and $i are never used again
            $j=1; // Excel rows start at 1; there is no row 0

            foreach($e->find('tr') as $e1){
                if (!$e1->find('td', 0)) {
                    continue; // skip rows without <td> cells (e.g. a <th> header row)
                }
                $distno=trim($e1->find('td', 0)->innertext);
                $acno=trim($e1->find('td', 1)->innertext);
                $partno=trim($e1->find('td', 2)->innertext);
                $objPHPExcel->setActiveSheetIndex(0);
                $objPHPExcel->getActiveSheet()->SetCellValue('A'.$j, $distno);
                $objPHPExcel->getActiveSheet()->SetCellValue('B'.$j, $acno);
                $objPHPExcel->getActiveSheet()->SetCellValue('C'.$j, $partno);

                $j++;
            }
        }

        $objPHPExcel->getActiveSheet()->setTitle($target);

        $objWriter = new PHPExcel_Writer_Excel2007($objPHPExcel);
        $objWriter->save('excelresult/'.$target.'.xlsx');

        $y++;
    }
    $x++;
}
?>

cURL:

$debug = 1;
$url = "url";
$f = fopen('log.txt', 'w'); // verbose curl log; mode 'w' truncates it each time this code runs
$cookies = 'cookies.txt';
touch($cookies);
$useragent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/36.0.1985.125 Chrome/36.0.1985.125 Safari/537.36';

// First request: load the form page to get the ASP.NET hidden fields and the session cookie
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookies);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);

$html = curl_exec($ch);

curl_close($ch);

preg_match('~<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="(.*?)" />~', $html, $viewstate);
preg_match('~<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="(.*?)" />~', $html, $eventValidation);

// Fall back to '' when a field is not found (e.g. the first request failed)
$viewstate = isset($viewstate[1]) ? $viewstate[1] : '';
$eventValidation = isset($eventValidation[1]) ? $eventValidation[1] : '';

// Second request: post the search form for $target
$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, $url);
//curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // disables TLS certificate checks (insecure)
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookies);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookies);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_STDERR, $f);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0); // 0: no explicit connect timeout is set
curl_setopt($ch, CURLOPT_TIMEOUT, 985000); // 985000 s is about 11 days, effectively no limit
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);

// Collecting all POST fields
$postfields = array();
$postfields['__EVENTTARGET'] = "";
$postfields['__EVENTARGUMENT'] = "";
$postfields['__LASTFOCUS'] = "";
$postfields['__VIEWSTATE'] = $viewstate;
$postfields['__EVENTVALIDATION'] = $eventValidation;
$postfields['cns_fer'] = 2;
$postfields['xttPd'] = $target;
$postfields['tsfDes'] = "Search";

curl_setopt($ch, CURLOPT_POST, 1);
// An array is sent as multipart/form-data; http_build_query($postfields) would send
// application/x-www-form-urlencoded, as a browser form does
curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);
$ret = curl_exec($ch);
curl_close($ch);
fclose($f); // close the log handle opened above, otherwise one handle leaks per target
file_put_contents('results/'.$target.'.html', $ret);
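The two preg_match() calls only succeed if the input tags use exactly this attribute order and spacing. The sketch below looks the fields up by name with DOMXPath instead; the helper hiddenField is hypothetical, and the sample markup (with a different attribute order) is invented for illustration. It returns '' when a field is missing, so a failed first request can be detected before the POST is sent.

```php
<?php
// Hypothetical helper: return the value attribute of the first
// <input name="$name">, or '' if the page is empty or has no such input.
function hiddenField(string $html, string $name): string
{
    if ($html === '') {
        return ''; // curl_exec() failed or the page was empty
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // $name is assumed to be a trusted literal such as '__VIEWSTATE'.
    $nodes = $xpath->query('//input[@name="' . $name . '"]');
    if ($nodes->length === 0) {
        return '';
    }
    return $nodes->item(0)->getAttribute('value');
}

$page = '<form><input type="hidden" value="dDwtMTA=" id="__VIEWSTATE" name="__VIEWSTATE">'
      . '<input name="__EVENTVALIDATION" type="hidden" value="/wEW"></form>';
echo hiddenField($page, '__VIEWSTATE'), "\n";
echo hiddenField($page, '__EVENTVALIDATION'), "\n";
```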

1 Answer:

Answer 0 (score: 0)

Code like this:

for($row=1; $row < $highestRow; ++$row){
    $objPHPExcel->getActiveSheet()->removeRow($row,$row);
}

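takes a lot of time on every iteration, and most of that work is redundant. Note that the second argument of removeRow() is the number of rows to delete, not the last row, so each pass deletes a bigger block ($row rows starting at row $row)...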

If you need to remove all the data from an existing worksheet, use

    $objPHPExcel->getActiveSheet()->removeRow(1,$highestRow);

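instead. No loop is needed (least of all a bogus loop that keeps deleting what has already been deleted several times over); a single call removes everything.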
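To see why the loop is both wasteful and unreliable, here is a plain-array model of removeRow($pRow, $pNumRows), assuming PHPExcel's semantics of deleting $pNumRows rows starting at row $pRow and shifting the remaining rows up. It does not call PHPExcel: removeRowModel is a hypothetical helper, and each array element stores its original row number so you can see which rows survive.

```php
<?php
// Simplified model (plain arrays, not PHPExcel) of removeRow($pRow, $pNumRows):
// delete $pNumRows rows starting at row $pRow; rows below shift up.
function removeRowModel(array &$rows, int $pRow, int $pNumRows = 1): void
{
    array_splice($rows, $pRow - 1, $pNumRows); // row 1 is index 0
}

$highestRow = 10;

// The loop from the question: $row is passed as the count as well.
$looped = range(1, $highestRow);
$calls = 0;
for ($row = 1; $row < $highestRow; ++$row) {
    removeRowModel($looped, $row, $row);
    $calls++;
}

// The single call from the answer.
$single = range(1, $highestRow);
removeRowModel($single, 1, $highestRow);

echo "loop: $calls calls, rows left: " . implode(',', $looped) . "\n";
echo "single call: rows left: " . count($single) . "\n";
```

With ten rows the loop makes 9 calls and still leaves original rows 2, 5 and 9 behind, because every deletion shifts the rows below it up and the later passes start beyond the rows that remain. The single call empties the sheet.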

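I'm also puzzled why you use one library to read the file (Excel Reader) and another to write (PHPExcel) when a single library (PHPExcel) can do both... although you don't seem to do much with Excel Reader anyway: you iterate over every cell of that spreadsheet and then don't appear to do anything with it at all.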
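Whichever library reads sample.xls, it helps to collect the targets first and skip blank cells: in the question's loop a blank cell yields $target = '', which would post an empty search and write results/.html and excelresult/.xlsx. The sketch below assumes the 1-based $cells[$row][$col] layout that the question reads from $excel->sheets[0]['cells']; collectTargets and the hand-built sample array are hypothetical.

```php
<?php
// Hypothetical helper: return the trimmed, non-empty values of one column
// from a 1-based [$row][$col] cell array, in row order.
function collectTargets(array $cells, int $column = 1): array
{
    $targets = [];
    foreach ($cells as $row) {
        $value = isset($row[$column]) ? trim((string) $row[$column]) : '';
        if ($value !== '') {
            $targets[] = $value; // e.g. "1212"
        }
    }
    return $targets;
}

$cells = [
    1 => [1 => 1212],
    2 => [1 => 1213],
    3 => [],            // empty row: no cell entry
    4 => [1 => ' 1214 '],
];
print_r(collectTargets($cells));
```

Reading only the column that holds targets also avoids running the whole fetch-and-save step once per cell when the sheet has more than one column.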

Edit

What I meant by my last comment:

<?php
......
ini_set('include_path', ini_get('include_path').PATH_SEPARATOR.'../Classes/');
include_once 'PHPExcel.php';
include_once 'Excel2007.php';

$excel->read('sample.xls'); // added excel reader from which we need to take some values

$x=1;
while($x<=$excel->sheets[0]['numRows']) { // reading row by row
    $y=1;
    while($y<=$excel->sheets[0]['numCols']) { // reading column by column
        $cell = isset($excel->sheets[0]['cells'][$x][$y]) ? $excel->sheets[0]['cells'][$x][$y] : '';
        $target = $cell;

        /* some lines of code using curl to fetch data for $target value
            ........... */
        // below is the code which retrieves data from the HTML table and saves it into an Excel file
        $url='results/'.$target.'.html';
        include_once('dom.php');

        $html=file_get_html($url);

        // A new workbook for each target instead of one object shared by all targets
        $objPHPExcel = new PHPExcel();
        $objPHPExcel->getProperties()->....//set some properties//

        $record_find='first';

        foreach($html->find('table#GridView1') as $e){
            if($record_find=='first')
                $i=1;
            $j=1; // Excel rows start at 1; there is no row 0

            foreach($e->find('tr') as $e1){
                if (!$e1->find('td', 0)) {
                    continue; // skip rows without <td> cells (e.g. a <th> header row)
                }
                $distno=trim($e1->find('td', 0)->innertext);
                $acno=trim($e1->find('td', 1)->innertext);
                $partno=trim($e1->find('td', 2)->innertext);
                $objPHPExcel->setActiveSheetIndex(0);
                $objPHPExcel->getActiveSheet()->SetCellValue('A'.$j, $distno);
                $objPHPExcel->getActiveSheet()->SetCellValue('B'.$j, $acno);
                $objPHPExcel->getActiveSheet()->SetCellValue('C'.$j, $partno);

                $j++;
            }
        }

        $objPHPExcel->getActiveSheet()->setTitle($target);

        $objWriter = new PHPExcel_Writer_Excel2007($objPHPExcel);
        $objWriter->save('excelresult/'.$target.'.xlsx');

        // Break the workbook/worksheet circular references so the memory can be freed
        $objPHPExcel->disconnectWorksheets();
        unset($objWriter, $objPHPExcel);

        // Assuming dom.php is simple_html_dom: clear() releases its circular references too
        $html->clear();
        unset($html);

        $y++;
    }
    $x++;
}
?>
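A plain-array model (no PHPExcel) of one more reason the edit creates a new PHPExcel object for each target: besides the memory that disconnectWorksheets() and unset() release, a workbook reused across targets keeps the rows written for the previous target. The sheet is modeled as a coordinate-to-value array; writeRows and the scraped rows are hypothetical.

```php
<?php
// Hypothetical model of one worksheet: cell coordinate => value.
// Writes three-column rows starting at row 1 and returns the updated sheet.
function writeRows(array $sheet, array $rows): array
{
    $r = 1; // Excel rows are 1-based
    foreach ($rows as [$distno, $acno, $partno]) {
        $sheet['A' . $r] = $distno;
        $sheet['B' . $r] = $acno;
        $sheet['C' . $r] = $partno;
        $r++;
    }
    return $sheet;
}

// Invented scraped rows for two targets; 1213 has fewer rows than 1212.
$scraped = [
    1212 => [['1', '10', '100'], ['1', '11', '101'], ['1', '12', '102']],
    1213 => [['2', '20', '200']],
];

$reused = [];   // one workbook for every target, as in the question
$fresh  = [];   // a new workbook per target, as in the edited answer
foreach ($scraped as $target => $rows) {
    $reused = writeRows($reused, $rows);
    $fresh[$target] = writeRows([], $rows);
}

echo count($reused), " cells in the reused sheet\n"; // rows 2-3 still hold 1212's data
echo count($fresh[1213]), " cells for 1213 in a fresh sheet\n";
```

Here the reused sheet ends up with 9 cells, rows 2 and 3 still holding 1212's data when 1213.xlsx would be saved, while the fresh sheet for 1213 holds only its own 3 cells.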