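I fetch an XML feed from a company every night, and the import needs some serious optimization because it takes forever.
The code below shows how I'm doing it, but there must be a better way. Basically, I take each product and then the associated retailers that supply that product.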
//db connect
include '../php/lib/dbconnect.inc';
$categories = array(1, 2, 4, 8, 9);
foreach ($categories as $cat_id) {
$url = "http://*********.com/feed?f=PRSP_UK_xx&categories=$cat_id&limit=100&startproducts=0&price_min=0.01&sortproducts=score&show=properties"; // first request is only used to read product-count
$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($c, CURLOPT_HEADER, 0);
curl_setopt($c, CURLOPT_USERPWD, "****:****");
$xml = simplexml_load_string(curl_exec($c));
curl_close($c);
$num_items = (int) $xml->{'product-count'}; // cast the SimpleXMLElement to an integer
$ii = 0;
while ($ii < $num_items) { // $ii is the offset of the current page from the start of the xml feed
$url = "http://********.com/feed?f=PRSP_UK_xx&categories=$cat_id&limit=100&startproducts=$ii&price_min=0.01&sortproducts=score&show=properties";
$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($c, CURLOPT_HEADER, 0);
curl_setopt($c, CURLOPT_USERPWD, "****:****");
$xml = simplexml_load_string(curl_exec($c));
curl_close($c);
// load each product first
foreach ($xml->product as $products) {
$title = $products->name;
$title = preg_replace('/[^a-z0-9\s]/i', '', $title);
$PRid = $products->id;
$author = $products->properties->group->property[2]->value;
$author = preg_replace('/[^a-z0-9\s]/i', '', $author);
$genre = $products->properties->group->property[4]->value;
$genre = preg_replace('/[^a-z0-9\s]/i', '', $genre);
$prodcat = $products->{'category'};
$prodcat = preg_replace('/[^a-z0-9\s]/i', '', $prodcat);
$prodcatID = $products->{'category-id'};
$lowprice = $products->{'lowest-price'};
$highprice = $products->{'highest-price'};
$imageURL = $products->{'image-url'};
$userrating = $products->rating[0]->average;
$userrating = str_replace(",",".",$userrating);
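$profrating = $products->rating[0]->average; // reads the same node as $userrating; adjust the index if the feed has a separate professional rating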
$profrating = str_replace(",",".",$profrating);
$addline = mysql_query("
insert into PRprodINFO (
PRid,
main_category,
title,
author,
genre,
prodcat,
prodcatID,
userrating,
profrating,
lowprice,
highprice,
imageURL
)
VALUES (
'$PRid',
'Books',
'$title',
'$author',
'$genre',
'$prodcat',
'$prodcatID',
'$userrating',
'$profrating',
'$lowprice',
'$highprice',
'$imageURL'
) ON DUPLICATE KEY UPDATE lowprice='$lowprice', highprice='$highprice'",$db);
if(!$addline) { echo "cannot add to table here".mysql_error(); exit; } // debug
// now each retailer associated with the product
foreach ($products->retailer as $retailer) {
$id = $retailer->{'id'};
$name = $retailer->{'name'};
$name = addslashes($name);
$link = $retailer->{'link'};
$logoURL = $retailer->{'logo'};
$stockinfo = $retailer->{'stock-info'};
$price = $retailer->{'price'};
$priceshipmin = $retailer->{'price-with-shipping-min'};
$priceshipmax = $retailer->{'price-with-shipping-max'};
$dummyid = $PRid.$id;
$id = preg_replace('/[^a-z0-9\s]/i', '', $id);
$stockinfo = preg_replace('/[^a-z0-9\s]/i', '', $stockinfo);
$dummyid = preg_replace('/[^a-z0-9\s]/i', '', $dummyid);
$addretail = mysql_query("
insert into PRretailerinfo (
PRid,
id,
dummyid,
category_id,
name,
link,
logoURL,
stockinfo,
price,
priceshipmin,
priceshipmax
)
VALUES (
'$PRid',
'$id',
'$dummyid',
'$cat_id',
'$name',
'$link',
'$logoURL',
'$stockinfo',
'$price',
'$priceshipmin',
'$priceshipmax'
) ON DUPLICATE KEY UPDATE price='$price', priceshipmin='$priceshipmin', priceshipmax='$priceshipmax'",$db);
if(!$addretail) { echo "cannot add to table - price is".$price.mysql_error(); exit; } // debug
} // close
} // close
// add 100 to url to get next 100 items
$ii = ($ii+100);
}
} // whole thing
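I assume there is a better way to do this than row by row, since the whole thing is about 800,000 products with an average of 4 retailers per product.
I believe it would be faster to build one long multi-row query first and then insert, but I can't find a way to do that.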
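One way to build the long query described above is to collect rows in an array and emit a single multi-row INSERT. The sketch below is a minimal illustration, not a drop-in replacement: the function name `buildProductUpsert`, the reduced column list, and the row layout are assumptions, and it relies on MySQL's documented `VALUES(col)` form inside `ON DUPLICATE KEY UPDATE`. It returns SQL with `?` placeholders plus a flat parameter list, so it assumes a driver with prepared statements (PDO or mysqli) rather than the `mysql_*` functions used in the question.

```php
<?php
// Hypothetical helper: builds one multi-row upsert for the PRprodINFO table.
// Table and column names come from the question; everything else is assumed.
function buildProductUpsert(array $rows): array
{
    $columns = ['PRid', 'title', 'lowprice', 'highprice'];
    // One "(?, ?, ?, ?)" group per row.
    $tuple = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';

    // Flatten the rows into one parameter list, in column order.
    $params = [];
    foreach ($rows as $row) {
        foreach ($columns as $col) {
            $params[] = $row[$col];
        }
    }

    $sql = 'INSERT INTO PRprodINFO (' . implode(', ', $columns) . ') VALUES '
         . implode(', ', array_fill(0, count($rows), $tuple))
         . ' ON DUPLICATE KEY UPDATE lowprice = VALUES(lowprice), highprice = VALUES(highprice)';

    return [$sql, $params];
}

[$sql, $params] = buildProductUpsert([
    ['PRid' => 1, 'title' => 'A', 'lowprice' => '1.00', 'highprice' => '2.00'],
    ['PRid' => 2, 'title' => 'B', 'lowprice' => '3.00', 'highprice' => '4.00'],
]);
// $sql now contains two placeholder groups and $params holds eight values,
// ready for something like $pdo->prepare($sql)->execute($params).
```

With 800,000 products it is probably safer to flush a batch every few hundred rows than to build one giant statement, so that each query stays below the server's `max_allowed_packet` limit.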
Answer 0: (score: 1)
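The slowness is most likely not caused by issuing many insert queries, but by the large number of HTTP requests you make to fetch the data. Is there a way to get more data from the server per request?
...perhaps by changing your product fetch parameter from limit=100 to limit=$num_items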
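Given your comment, I guess you could try to speed things up by using separate threads: one to download the content and one to insert into the tables. That way you are always fetching new data, rather than fetching, waiting for the inserts to finish, and then fetching more. The coding could get fairly complicated, though.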
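Short of real threads, a simpler way to overlap the downloads is the `curl_multi_*` family in PHP's cURL extension, which fetches several pages concurrently from one script. Note that this is a swapped-in technique, not the two-thread design described above. In this sketch the function names `pageOffsets` and `fetchAll` are hypothetical, and whether the feed tolerates parallel requests is an assumption to check with the provider.

```php
<?php
// Hypothetical helper: start offsets for each page of the feed.
function pageOffsets(int $numItems, int $pageSize = 100): array
{
    $offsets = [];
    for ($start = 0; $start < $numItems; $start += $pageSize) {
        $offsets[] = $start;
    }
    return $offsets;
}

// Hypothetical helper: download several URLs concurrently.
// Returns the response bodies keyed the same way as $urls.
function fetchAll(array $urls, string $userpwd): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $c = curl_init($url);
        curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($c, CURLOPT_HEADER, false);
        curl_setopt($c, CURLOPT_USERPWD, $userpwd);
        curl_multi_add_handle($multi, $c);
        $handles[$key] = $c;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            curl_multi_select($multi); // wait for activity instead of spinning
        }
    } while ($running && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $key => $c) {
        $bodies[$key] = curl_multi_getcontent($c);
        curl_multi_remove_handle($multi, $c);
        curl_close($c);
    }
    curl_multi_close($multi);
    return $bodies;
}
```

A caller would build one URL per offset from `pageOffsets($num_items)`, pass a handful of them at a time to `fetchAll()`, and parse each body with `simplexml_load_string()` as in the question.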
Answer 1: (score: 0)
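You can still use MySQL's ON DUPLICATE KEY UPDATE with a multi-record insert: inside the update clause, VALUES(col) refers to the value the current row would have inserted.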
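Try disabling the indexes on the tables you are inserting into. Otherwise MySQL has to update the indexes after every insert, when you really only need that done once at the end. (The unique key that ON DUPLICATE KEY UPDATE relies on has to stay in place.) Prepared statements should also speed things up.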
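A prepared statement can be sketched roughly as follows. This assumes a switch from the `mysql_*` functions in the question to PDO, with `$pdo` connected to the MySQL database and `PDO::ATTR_ERRMODE` set to `PDO::ERRMODE_EXCEPTION`. The function name `upsertRetailers` and the reduced column list are hypothetical. The statement is parsed once and executed per row, and the loop is wrapped in a transaction, which only helps on a transactional engine such as InnoDB.

```php
<?php
// Hypothetical helper; assumes $pdo is a PDO connection to MySQL
// configured to throw exceptions on errors.
function upsertRetailers(PDO $pdo, array $retailers): int
{
    // Prepared once, executed many times.
    $stmt = $pdo->prepare(
        'INSERT INTO PRretailerinfo (PRid, id, dummyid, price, priceshipmin, priceshipmax)
         VALUES (:PRid, :id, :dummyid, :price, :priceshipmin, :priceshipmax)
         ON DUPLICATE KEY UPDATE
             price = VALUES(price),
             priceshipmin = VALUES(priceshipmin),
             priceshipmax = VALUES(priceshipmax)'
    );

    $pdo->beginTransaction();
    try {
        foreach ($retailers as $row) {
            // Keys of $row must match the placeholder names above.
            $stmt->execute($row);
        }
        $pdo->commit();
    } catch (Throwable $e) {
        $pdo->rollBack();
        throw $e;
    }
    return count($retailers);
}
```

Because the values are bound as parameters, the `addslashes()` and `preg_replace()` calls are no longer needed for SQL safety, although they may still be wanted for cleaning up the stored text.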
Use microtime() to find the slow parts of the script. Database performance may not matter much compared with network I/O (the downloads).
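A minimal way to take those measurements is to accumulate wall-clock time per phase with `microtime(true)`. The helper name `timed` and the labels are hypothetical. In the import script the callables would wrap `curl_exec()` and the insert queries rather than the dummy computation used here.

```php
<?php
// Hypothetical helper: runs $fn and adds its wall-clock time to $timings[$label].
function timed(string $label, callable $fn, array &$timings)
{
    $start = microtime(true);
    $result = $fn();
    $timings[$label] = ($timings[$label] ?? 0.0) + (microtime(true) - $start);
    return $result;
}

$timings = [];
// Stand-in workload; replace with the download and insert steps.
$sum = timed('compute', function () {
    return array_sum(range(1, 1000));
}, $timings);

// Report the accumulated time per phase.
foreach ($timings as $label => $seconds) {
    printf("%s: %.4f s\n", $label, $seconds);
}
```

Comparing the totals for a "download" label and an "insert" label over a few hundred products should show which side is worth optimizing first.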