I'm looking for advice on the fastest and most reliable way to process large CSV/XML data files, without the script timing out or failing to load the file on the first attempt.
I have the script below, which demonstrates (via CodeIgniter, although the framework isn't really relevant) how I load the file and run the inserts/updates and so on.
However, I've noticed that when I run the script manually, it times out on the first attempt about half the time, even though I've included set_time_limit(0). Likewise, I sometimes get an error saying the script couldn't fully load the XML file; yet if I rerun it, it usually works on the second or third attempt.
That's no good if I'm going to use a cron job to keep the site's content up to date automatically. The file can contain anywhere from 11,000 to 30,000 rows at a time, which I know is a lot, but there must be a more reliable way of doing this...
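One thing I suspect, though I can't prove it, is memory: as far as I know, simplexml_load_file() builds the entire decompressed document in memory at once, so a 30,000-row feed may be brushing up against the limit and failing intermittently. The only stopgap I've come up with is raising the limit at the top of the method (the value here is just an example):

    ini_set('memory_limit', '512M'); // example value only

That feels like a band-aid rather than a fix, though.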
Any suggestions?
public function home() {
    // Stop PHP's max_execution_time from killing a long import.
    set_time_limit(0);

    $this->load->model('Update_model');
    $this->load->model('Insert_model'); // load once, rather than once per merchant
    $this->Update_model->update_in_stock();

    // Read the gzipped feed through the compress.zlib:// stream wrapper.
    $xml = simplexml_load_file('compress.zlib://path/to/xml/compression/gzip/');

    foreach ($xml->merchant as $merchant) {
        $merchant_id   = (string) $merchant['id'];
        $merchant_name = (string) $merchant['name'];

        $data = array(
            'merchant_id'   => $merchant_id,
            'merchant_name' => $merchant_name
        );
        $this->Insert_model->insert_merchants($data);

        foreach ($merchant->prod as $prod) {
            $aw_product_id = (string) $prod['id'];
            $in_stock      = (string) $prod['in_stock'];

            // Each of these loops simply keeps the values from the last
            // matching child element.
            foreach ($prod->text as $text) {
                $product_name = (string) $text->name;
            }
            foreach ($prod->uri as $uri) {
                $aw_deep_link = (string) $uri->awTrack;
                $aw_image_url = (string) $uri->awImage;
            }
            foreach ($prod->price as $price) {
                $search_price = (string) $price->buynow;
                $rrp_price    = (string) $price->rrp;
            }
            foreach ($prod->cat as $cat) {
                $category_id   = (string) $cat->awCatId;
                $category_name = (string) $cat->awCat;
            }
            foreach ($prod->brand as $brand) {
                $brand_id   = (string) $brand->awBrandId;
                $brand_name = (string) $brand->brandName;
            }

            $data = array(
                'merchant_id'   => $merchant_id,
                'merchant_name' => $merchant_name,
                'aw_product_id' => $aw_product_id,
                'in_stock'      => $in_stock,
                'product_name'  => $product_name,
                'aw_deep_link'  => $aw_deep_link,
                'aw_image_url'  => $aw_image_url,
                'search_price'  => $search_price,
                'rrp_price'     => $rrp_price,
                'category_id'   => $category_id,
                'category_name' => $category_name,
                'brand_id'      => $brand_id,
                'brand_name'    => $brand_name
            );
            $this->Insert_model->insert_products($data);
        }
    }

    echo 'Update complete!';
}
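For what it's worth, I've been wondering whether a streaming parser would be more robust than pulling the whole document in with simplexml_load_file(). Below is a rough, untested sketch of what I have in mind using PHP's XMLReader: it walks the file one &lt;merchant&gt; at a time and expands only that subtree into SimpleXML, so the field-reading code could stay mostly the same. The placeholder path and element names are just copied from my script above.

    public function home_streamed() {
        set_time_limit(0);
        $this->load->model('Update_model');
        $this->load->model('Insert_model');
        $this->Update_model->update_in_stock();

        $reader = new XMLReader();
        // Same compress.zlib:// wrapper and placeholder path as above.
        $reader->open('compress.zlib://path/to/xml/compression/gzip/');

        // Skip ahead to the first <merchant> element.
        while ($reader->read() && $reader->name !== 'merchant');

        $doc = new DOMDocument();
        while ($reader->name === 'merchant') {
            // Expand only this <merchant> subtree into SimpleXML; the rest
            // of the file is never held in memory at once.
            $merchant = simplexml_import_dom($doc->importNode($reader->expand(), true));

            $data = array(
                'merchant_id'   => (string) $merchant['id'],
                'merchant_name' => (string) $merchant['name']
            );
            $this->Insert_model->insert_merchants($data);

            foreach ($merchant->prod as $prod) {
                // ... same per-product field reading and insert_products()
                // call as in the original script ...
            }

            // Jump straight to the next <merchant> sibling, skipping the
            // subtree we have already processed.
            $reader->next('merchant');
        }
        $reader->close();

        echo 'Update complete!';
    }

I've also read that collecting product rows into an array and inserting them in chunks with CodeIgniter's $this->db->insert_batch() cuts down the number of round-trips compared with one INSERT per product, but I haven't tried it here, and I'm not sure it would address the timeouts by itself.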