I have a script that parses a CSV file with a million rows into an array.
I want to process it in batches with a cron job. For example, after every 100,000 rows I want to pause the script and resume it later, to avoid memory problems and the like.
My script currently looks like the code below. What it does is not relevant; the question is how I can loop through the file in batches from a cron job.
Can I just create a cron job that calls this script every 5 minutes and remembers where the loop was paused?
$csv = file_get_contents(CSV);
$array = array_map("str_getcsv", explode("\n", $csv));
$headers = $array[0];
$number_of_records = count($array);
$params = ['body' => []];

for ($i = 1; $i < $number_of_records; $i++) {
    $params['body'][] = [
        'index' => [
            '_index' => INDEX,
            '_type' => TYPE,
            '_id' => $i
        ]
    ];

    // Set the right keys
    foreach ($array[$i] as $key => $value) {
        $array[$i][$headers[$key]] = $value;
        unset($array[$i][$key]);
    }

    // Add the document fields
    $params['body'][] = [
        'Inrijdtijd' => $array[$i]['Inrijdtijd'],
        'Uitrijdtijd' => $array[$i]['Uitrijdtijd'],
        'Parkeerduur' => $array[$i]['Parkeerduur'],
        'Betaald' => $array[$i]['Betaald'],
        'bedrag' => $array[$i]['bedrag']
    ];

    // Every 100,000 documents, stop and send the bulk request
    if ($i % 100000 == 0) {
        $responses = $client->bulk($params);
        // erase the old bulk request
        $params = ['body' => []];
        // unset the bulk response when you are done to save memory
        unset($responses);
    }
}

// Send the last batch if it exists
if (!empty($params['body'])) {
    $responses = $client->bulk($params);
}
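Answer 0 (score: 1)
In the given code, the script will always start processing from the beginning, because no pointer of any kind is kept between runs.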
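To illustrate what keeping such a pointer could look like, here is a minimal sketch. It assumes the byte offset is stored in a small state file between cron runs; the function name `readNextBatch` and its parameters are hypothetical and not part of the original script. It is an alternative to the splitting approach suggested below, not a replacement for it.

```php
<?php
// Hypothetical helper: reads up to $batchSize data rows, starting at the byte
// offset saved in $stateFile, then saves the new offset for the next run.
// Assumes every row has the same number of columns as the header.
function readNextBatch(string $csvFile, string $stateFile, int $batchSize): array
{
    $fp = fopen($csvFile, 'r');
    $headers = fgetcsv($fp, 0, ',', '"', '\\'); // first line holds the column names

    // Resume where the previous run stopped, if an offset was saved.
    if (is_file($stateFile)) {
        fseek($fp, (int) file_get_contents($stateFile));
    }

    $rows = [];
    while (count($rows) < $batchSize
        && ($row = fgetcsv($fp, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        $rows[] = array_combine($headers, $row);
    }

    // Remember the position so the next cron run continues from here.
    file_put_contents($stateFile, (string) ftell($fp));
    fclose($fp);

    return $rows; // an empty array means the whole file has been read
}
```

Each cron run would call this once, send the returned rows in a bulk request, and exit. Because the file is read line by line, the full million rows never have to sit in memory at the same time.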
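My suggestion would be to split the CSV file into chunks and have another script parse the chunks one at a time (for example, one every 5 minutes), deleting each file once it has been processed.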
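$fp = fopen(CSV, 'r');
$head = fgets($fp); // header line, including its trailing newline
$output = [$head];
$count = 0;
while (($line = fgets($fp)) !== false) {
    $output[] = $line;
    // Header plus 9,999 data rows per batch file
    if (count($output) == 10000) {
        // fgets() keeps the line endings, so join without a separator
        file_put_contents('batches/batch-' . $count . '.csv', implode('', $output));
        $count++;
        $output = [$head];
    }
}
fclose($fp);
// Write the remaining rows, if any
if (count($output) > 1) {
    file_put_contents('batches/batch-' . $count . '.csv', implode('', $output));
}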
Now the original script can process one file on each run:
// array_diff() preserves keys, so reindex before taking the first element
$files = array_values(array_diff(scandir('batches/'), ['.', '..']));
if (count($files) > 0) {
    $file = 'batches/' . $files[0];
    // PROCESS FILE
    unlink($file);
}
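The `// PROCESS FILE` step could be filled in roughly as follows. This is only a sketch: `buildBulkBody` and its `$firstId` parameter are hypothetical names, introduced here so that document ids stay unique across batch files, and the sketch keeps every CSV column rather than the five fields picked in the question.

```php
<?php
// Hypothetical helper: turns one batch file into the body of a bulk request.
// $firstId is an assumed parameter; pass a different starting id per batch
// so ids from different batch files do not overwrite each other.
function buildBulkBody(string $file, string $index, string $type, int $firstId): array
{
    $fp = fopen($file, 'r');
    $headers = fgetcsv($fp, 0, ',', '"', '\\'); // each batch starts with the header
    $body = [];
    $id = $firstId;

    while (($row = fgetcsv($fp, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // skip blank lines
        }
        // Action line first, then the document itself keyed by column name.
        $body[] = ['index' => ['_index' => $index, '_type' => $type, '_id' => $id++]];
        $body[] = array_combine($headers, $row);
    }
    fclose($fp);

    return $body;
}

// Possible use inside the cron script, with $client as in the question:
// $client->bulk(['body' => buildBulkBody($file, INDEX, TYPE, $firstId)]);
```

One way to choose `$firstId` would be to derive it from the batch number in the file name, but that depends on how the batches were written and is left open here.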