Batch Processing Loop

Date: 2016-04-07 10:54:08

Tags: php cron crontab

I have a script that parses a CSV file with a million rows into an array.

I want to batch this with a cron job. For example, after every 100,000 rows I want to pause the script and continue later, to avoid memory problems.

My script currently looks like the code below. What it does is not relevant; the question is how I can loop through the data in batches from a cron job.

Can I just create a cron job that calls this script every 5 minutes and remembers where the loop was paused?

$csv = file_get_contents(CSV);
$array = array_map("str_getcsv", explode("\n", $csv));

$headers = $array[0];
$number_of_records = count($array);
$params = ['body' => []];

for ($i = 1; $i < $number_of_records; $i++) {
  $params['body'][] = [
    'index' => [
      '_index' => INDEX,
      '_type' => TYPE,
      '_id' => $i
    ]
  ];

  // Set the right keys
  foreach ($array[$i] as $key => $value) {
    $array[$i][$headers[$key]] = $value;
    unset($array[$i][$key]);
  }

  // Loop fields
  $params['body'][] = [
    'Inrijdtijd' => $array[$i]['Inrijdtijd'],
    'Uitrijdtijd' => $array[$i]['Uitrijdtijd'],
    'Parkeerduur' => $array[$i]['Parkeerduur'],
    'Betaald' => $array[$i]['Betaald'],
    'bedrag' => $array[$i]['bedrag']
  ];

  // Every 100,000 documents, stop and send the bulk request
  if ($i % 100000 == 0) {
    $responses = $client->bulk($params);

    // Erase the old bulk request
    $params = ['body' => []];

    // Unset the bulk response when you are done to save memory
    unset($responses);
  }
}

// After the loop, send the last batch if it exists
if (!empty($params['body'])) {
  $responses = $client->bulk($params);
}

1 answer:

Answer 0 (score: 1)

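With the given code, the script will always start processing from the beginning, because it does not keep any kind of pointer between runs.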
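To illustrate what such a pointer could look like, here is a minimal sketch that saves the byte offset from ftell() in a small state file and uses fseek() to return to it on the next run. The function name processChunk(), the state file and the chunk size are hypothetical and not part of the original script.

```php
<?php
// Sketch only: processChunk(), the state file and the chunk size are
// hypothetical names chosen for this example.

/**
 * Reads up to $maxRows data rows from $csvPath, starting at the byte offset
 * saved in $statePath, and stores the new offset for the next run.
 *
 * @return array[] Rows as associative arrays keyed by the CSV header.
 */
function processChunk(string $csvPath, string $statePath, int $maxRows): array
{
    $fp = fopen($csvPath, 'r');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $csvPath");
    }

    // The header is always read from the start of the file.
    $headers   = fgetcsv($fp, 0, ',', '"', '\\');
    $headerEnd = ftell($fp);

    // Jump to the saved offset, or stay right after the header on the first run.
    $offset = is_file($statePath) ? (int) file_get_contents($statePath) : $headerEnd;
    fseek($fp, max($offset, $headerEnd));

    $rows = [];
    while (count($rows) < $maxRows && ($fields = fgetcsv($fp, 0, ',', '"', '\\')) !== false) {
        // Skip blank lines and rows whose column count does not match the header.
        if (count($fields) !== count($headers)) {
            continue;
        }
        $rows[] = array_combine($headers, $fields);
    }

    // Remember where this run stopped.
    file_put_contents($statePath, (string) ftell($fp));
    fclose($fp);

    return $rows;
}
```

Each cron run would call processChunk() once and index the returned rows; an empty result means the whole file has been read.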

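My suggestion is to split the CSV file into pieces and let another script parse the pieces one at a time (i.e. one every 5 minutes), deleting each file once it has been processed.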

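$fp = fopen(CSV, 'r');

$head  = fgets($fp);  // header line, including its trailing newline
$count = 0;           // number used in the next batch file name

$output = [$head];
while (($line = fgets($fp)) !== false) {
    $output[] = $line;

    // Each batch file holds the header plus 9,999 data rows
    if (count($output) == 10000) {
        // fgets() keeps the line endings, so join without a separator
        file_put_contents('batches/batch-' . $count . '.csv', implode('', $output));
        $count++;

        $output = [$head];
    }
}
fclose($fp);

// Write the remaining rows, if any
if (count($output) > 1) {
    file_put_contents('batches/batch-' . $count . '.csv', implode('', $output));
}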

Now the original script can process one file on each run:

// array_diff() preserves keys, so reindex to make $files[0] the first batch file
$files = array_values(array_diff(scandir('batches/'), ['.', '..']));

if (count($files) > 0) {
    $file = 'batches/' . $files[0];

    // PROCESS FILE

    unlink($file);
}
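
The "PROCESS FILE" step could be filled in roughly as follows. This is only a sketch: buildBulkBody() is a hypothetical helper, and it assumes each batch file starts with the header line, as produced by the splitting script. It omits `_id`, because row numbers restart in every batch file; Elasticsearch generates an id when none is given.

```php
<?php
// Sketch only: buildBulkBody() is a hypothetical helper name.
// A crontab entry such as "*/5 * * * * php /path/to/process_batch.php"
// (the path is a placeholder) would run the calling script every 5 minutes.

/**
 * Turns one batch CSV file into the body of a bulk request: an "index"
 * action entry followed by the document, for every data row.
 */
function buildBulkBody(string $file, string $index, string $type): array
{
    $fp = fopen($file, 'r');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $file");
    }

    // The first line of every batch file is the CSV header.
    $headers = fgetcsv($fp, 0, ',', '"', '\\');

    $body = [];
    while (($fields = fgetcsv($fp, 0, ',', '"', '\\')) !== false) {
        // Skip blank lines and rows whose column count does not match the header.
        if (count($fields) !== count($headers)) {
            continue;
        }
        $body[] = ['index' => ['_index' => $index, '_type' => $type]];
        $body[] = array_combine($headers, $fields);
    }
    fclose($fp);

    return $body;
}
```

Inside the `if` block above, and assuming `$client` is the same client object used in the question, the call would look like `$client->bulk(['body' => buildBulkBody($file, INDEX, TYPE)]);` placed before `unlink($file);`.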