WordPress PHP template page keeps looping

Time: 2016-09-02 23:45:45

Tags: php mysql wordpress curl

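I have a problem. I created a page to scrape some data from a public entity's site. Their terms of use do not prohibit this, and it is public data anyway. I know I wrote a quick-and-dirty page for this, but I can't figure out why it keeps looping. The problem is that the template page I created to run the actual scrape code keeps running continuously: it starts over again and again. Here is the code: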

<?php
/*
Template Name: Scraping template
*/

$strFile = $_GET['scrape'];
$intNumOfRec = 0;
$intNumOfErr = 0;
$intHeaderLine = '';

function fnLogger ($strLine) {
    $hdlLogFile = fopen("ScrapingLogFile","a") or die("Unable to open file!");
    fwrite($hdlLogFile,$strLine."\r\n");
    fclose($hdlLogFile);
    return;
}
function fnProcessMcr() {
    global $wpdb,$intNumOfRec,$intNumOfErr;

    $intRecChunk = '50';
    $strQuery = 'SELECT * FROM frg_subdivision_index WHERE authority is null limit '.$intRecChunk.';';
    $objQuery = $wpdb->get_results($strQuery);
    echo $strQuery.'</br>';
    fnLogger($strQuery);
    foreach($objQuery as $index=>$row) 
    {
        fnLogger($row->id.' ');
        if(strlen($row->book) !== 0 && strlen($row->map) !== 0 && strlen($row->begin) !== 0) 
        {
            $url = '[url withheld]?q='.$row->book.'-'.$row->map.'-'.$row->begin;
            $ch = curl_init();
            $timeout = 5;
            curl_setopt($ch, CURLOPT_URL, $url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
            curl_setopt($ch, CURLOPT_ENCODING, "gzip");
            $data = curl_exec($ch);
            curl_close($ch);
            $intCheckIndex = stripos($data,'<td class="right aligned"><h3 class="ui huge basic header">');
            if(!$intCheckIndex) 
            {
                //echo $row->id.': Could not find type prefix.';
                $strType = 'Unknown';
                $strJurisdiction = 'Unknown';
                $intNumOfErr++;
            }
            else 
            {
                $data = substr($data,$intCheckIndex+59);
                $intCheckIndex = stripos($data,'<strong>[CANCELLED]</strong>');
                $strType = "";
                if($intCheckIndex !== false) {
                    $data = substr($data,$intCheckIndex+28);
                    $strType = 'Cancelled ';
                }
                $strType .= trim(substr($data,0,stripos($data,'Parcel')-1));
                $data = substr($data,stripos($data,'Local Jurisdiction</td>')+23);
                $data = substr($data,stripos($data,'<td>')+4);
                $strJurisdiction = ucwords(strtolower(trim(substr($data,0,stripos($data,'<')))));
                //echo ($index+$intBegRec).': Type is: '.$strType.' in '.$strJurisdiction;
                $intNumOfRec++;
            }       
        }    
        else 
        {
            echo $row->id.': Missing book, map or begin.</br>';
            $strType = 'Unknown';
            $strJurisdiction = 'Unknown';
            $intNumOfErr++;
        }
        $strUpdateResults = $wpdb->update('frg_subdivision_index',array(
                                    'type' => $strType,
                                    'authority' => $strJurisdiction),
                                    array(
                                        'id' => $row->id));
        echo $row->id.':  Type: '.$strType.'    Authority:'.$strJurisdiction;
        if($strUpdateResults === false) 
        {
            echo ': ERROR update database.</br>';
            $intNumOfErr++;
        }
        else 
        {
            echo '</br>';
        }
    }

    echo "</br></br>Number of records updated was: ".$intNumOfRec.'</br>';
    echo "Number of errors was: ".$intNumOfErr.'</br>';
    return;
}

switch ($strFile) {
    case 'mcr':
        fnLogger('Entered Switch Case mcr');
        fnProcessMcr();
        break;
    case 'mcrunknown':
        fnProcessMcrUnknown();
        break;
    default:
        fnChangeTo404();
}


?>

Here is the output of the log file, so you can see what it is doing.

Entered Switch Case mcr
SELECT * FROM frg_subdivision_index WHERE authority is null limit 50;
30729 
30730 
30731 
30732 
30733 
30734 
30735 
30736 
30737 
30738 
30739 
30740 
30741 
30742 
30743 
30744 
30745 
30746 
30747 
30748 
30749 
30750 
30751 
30752 
30753 
30754 
30755 
30756 
30757 
30758 
30759 
30760 
30761 
30762 
30763 
30764 
30765 
30766 
30767 
30768 
Entered Switch Case mcr
SELECT * FROM frg_subdivision_index WHERE authority is null limit 50;
30768 
30769 
30769 
30770 
30770 
30771 
30771 
30772 
30772 
30773 
30773 
30774 
30774 
30775 
30775 
30776 
30776 
30777 
30777 
30778 
30778 
30779 
30780 
30781 
30782 
30783 
30784 
30785 
30786 
30787 
30788 
30789 
30790 
30791 
30792 
30793 
30794 
30795 
30796 
30797 
30798 
30799 
30800 
30801 
30802 
30803 
30804 
30805 
30806 
30807 
30808 
Entered Switch Case mcr
SELECT * FROM frg_subdivision_index WHERE authority is null limit 50;
30808 
30809 
30809 
30810 
30810 
30811 
30811 
30812 
30812 
30813 
30813 
30814 
30814 
30815 
30815 
30816 
30816 
30817 
30817 
30818 
30819 
30820 
30821 
30822 
30823 
30824 
30825 
30826 
30827 
30828 
30829 
30830 
Entered Switch Case mcr
SELECT * FROM frg_subdivision_index WHERE authority is null limit 50;
30830 
30831 
30831 
30832 
30832 
30833 
30833 
30834 
30834 
30835 
30835 
30836 
30836 
30837 
30837 
30838 
30838 
30839 
30839 
30840 
30840 
30841 
30841 
Entered Switch Case mcr
SELECT * FROM frg_subdivision_index WHERE authority is null limit 50;
30841 
30842 

Does anyone know why it keeps looping back?

1 Answer:

Answer 0 (score: 0)

OK, there was nothing wrong with the code. The problem was a timeout in WordPress.
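The answer does not say which timeout was involved, so the following is only a sketch under assumptions. One reading that fits the log is that a 50-record run outlasts a request timeout, the page is requested again while the first run is still working, and the two runs then log the same IDs side by side. Note also that `CURLOPT_CONNECTTIMEOUT` only limits the connection phase; `CURLOPT_TIMEOUT` is the option that caps the whole transfer. A possible way to keep each request short is to stop after a fixed time budget and let the next request pick up the remaining rows. The helper name `fnProcessWithinBudget` and the budget value are hypothetical and not part of the original post.

```php
<?php
/**
 * Hypothetical helper (not from the original post): process rows until a
 * time budget is used up, so the request can finish before a server or
 * proxy timeout is reached.
 *
 * Returns the number of rows that were handled.
 */
function fnProcessWithinBudget(array $arrRows, callable $fnHandler, float $fltBudgetSeconds): int
{
    $fltStart = microtime(true);
    $intDone = 0;
    foreach ($arrRows as $row) {
        // Do not start another row once the budget is spent.
        if (microtime(true) - $fltStart >= $fltBudgetSeconds) {
            break;
        }
        $fnHandler($row);
        $intDone++;
    }
    return $intDone;
}

// Usage: each simulated row takes about 20 ms and the budget is 50 ms,
// so only the first few of the 50 rows are handled in this call.
$intProcessed = fnProcessWithinBudget(
    range(1, 50),
    function ($row) { usleep(20000); }, // stand-in for the cURL call and the update
    0.05
);
echo "Processed ".$intProcessed." of 50 rows\n";
```

Because the query in the question selects rows where authority is null, rows skipped in one request would simply be returned again by the next one, so no extra bookkeeping should be needed for this approach.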