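Optimising inserts from a data feed into MySQL

Asked: 2011-12-03 16:31:45

Tags: mysql sql

I get an XML feed from a company every night, and the import needs some serious optimisation because it takes forever.

The code below shows how I do it at the moment, but there must be a better way. Basically, I take each product and then each of the retailers that supply that product.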

//db connect
include '../php/lib/dbconnect.inc';

$categories = array(1, 2, 4, 8, 9);

foreach ($categories as $cat_id) {

// first request for this category, only needed to read the total product count
$ii = 0;

$url = "http://*********.com/feed?f=PRSP_UK_xx&categories=$cat_id&limit=100&startproducts=$ii&price_min=0.01&sortproducts=score&show=properties";
$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($c, CURLOPT_HEADER, 0);
curl_setopt($c, CURLOPT_USERPWD, "****:****");
$xml = simplexml_load_string(curl_exec($c));
curl_close($c);

$num_items = (int) $xml->{'product-count'};

while ($ii <= $num_items) { // this sets the number of items from start of xml feed

    $url = "http://********.com/feed?f=PRSP_UK_xx&categories=$cat_id&limit=100&startproducts=$ii&price_min=0.01&sortproducts=score&show=properties";

    $c = curl_init($url); 
    curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($c, CURLOPT_HEADER, 0);
    curl_setopt($c, CURLOPT_USERPWD, "****:****");
    $xml = simplexml_load_string(curl_exec($c));
    curl_close($c);

// load each product first

    foreach ($xml->product as $products) {

$title = $products->name;

$title = preg_replace('/[^a-z0-9\s]/i', '', $title);

$PRid = $products->id;

$author = $products->properties->group->property[2]->value;

$author = preg_replace('/[^a-z0-9\s]/i', '', $author);

$genre = $products->properties->group->property[4]->value;

$genre = preg_replace('/[^a-z0-9\s]/i', '', $genre);

$prodcat = $products->{'category'};

$prodcat = preg_replace('/[^a-z0-9\s]/i', '', $prodcat);

$prodcatID = $products->{'category-id'};

$lowprice = $products->{'lowest-price'};

$highprice = $products->{'highest-price'};

$imageURL = $products->{'image-url'};

$userrating = $products->rating[0]->average;

$userrating = str_replace(",",".",$userrating);

// NOTE: this reads the same node as $userrating above
$profrating = $products->rating[0]->average;

$profrating = str_replace(",",".",$profrating);

    $addline = mysql_query("
    insert into PRprodINFO (
    PRid,
    main_category,
    title,
    author,
    genre,
    prodcat,
    prodcatID,
    userrating,
    profrating,
    lowprice,
    highprice,
    imageURL
    )
        VALUES (
    '$PRid',
    'Books',
    '$title',
    '$author',
    '$genre',
    '$prodcat',
    '$prodcatID',
    '$userrating',
    '$profrating',
    '$lowprice',
    '$highprice',
    '$imageURL'
    ) ON DUPLICATE KEY UPDATE lowprice='$lowprice', highprice='$highprice'",$db);

    if(!$addline) { echo "cannot add to table here".mysql_error(); exit; } // debug

    // now each retailer associated with the product

    foreach ($products->retailer as $retailer) {

    $id = $retailer->{'id'};

    $name = $retailer->{'name'};

    $name = addslashes($name);

    $link = $retailer->{'link'};

    $logoURL = $retailer->{'logo'};

    $stockinfo = $retailer->{'stock-info'};

    $price = $retailer->{'price'};

    $priceshipmin = $retailer->{'price-with-shipping-min'};

    $priceshipmax = $retailer->{'price-with-shipping-max'};

    $dummyid = $PRid.$id;

    $id = preg_replace('/[^a-z0-9\s]/i', '', $id);

    $stockinfo = preg_replace('/[^a-z0-9\s]/i', '', $stockinfo);

    $dummyid = preg_replace('/[^a-z0-9\s]/i', '', $dummyid);

    $addretail = mysql_query("
    insert into PRretailerinfo (
    PRid,
    id,
    dummyid,
    category_id,
    name,
    link,
    logoURL,
    stockinfo,
    price,
    priceshipmin,
    priceshipmax
    )
        VALUES (
    '$PRid',
    '$id',
    '$dummyid',
    '$cat_id',
    '$name',
    '$link',
    '$logoURL',
    '$stockinfo',
    '$price',
    '$priceshipmin',
    '$priceshipmax'
    ) ON DUPLICATE KEY UPDATE price='$price', priceshipmin='$priceshipmin', priceshipmax='$priceshipmax'",$db);

    if(!$addretail) { echo "cannot add to table - price is".$price.mysql_error(); exit; } // debug

} // end foreach retailer
} // end foreach product

    // advance the offset by 100 to request the next 100 items
$ii = ($ii+100);

} // end while (pages of this category)

} // end foreach category

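I assume there is a better way to do this than going row by row, since the whole thing is about 800,000 products with an average of 4 retailers each.

I believe it would be faster to build one long multi-row query first and then insert it, but I can't find a way to do that.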

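2 Answers:

Answer 0: (score: 1)

The slowness is most likely not caused by issuing many insert queries but by the large number of HTTP requests you make to fetch the data. Is there a way to get more data from the server in one go?

....perhaps by changing your product fetch parameter from limit=100 to limit=$num_items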
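To put a rough number on that suggestion, here is a minimal sketch that counts the requests needed for a given page size. The helper name `pageOffsets` is hypothetical, it assumes `startproducts` is a zero-based offset, and whether the feed accepts a `limit` above 100 is not known from the question.

```php
<?php
/**
 * Hypothetical helper: list the start offsets needed to page through
 * $total items when each request returns at most $limit items.
 * Assumes the feed's startproducts parameter is a zero-based offset.
 */
function pageOffsets(int $total, int $limit): array
{
    if ($limit < 1) {
        throw new InvalidArgumentException('limit must be at least 1');
    }
    $offsets = [];
    for ($start = 0; $start < $total; $start += $limit) {
        $offsets[] = $start;
    }
    return $offsets;
}

// One HTTP request per offset, so the page size drives the request count.
echo count(pageOffsets(800000, 100)), " requests at limit=100\n";
echo count(pageOffsets(800000, 1000)), " requests at limit=1000\n";
```

If the feed does allow larger pages, every tenfold increase in `limit` removes roughly nine out of ten round trips.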

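Given your comment, I guess you could try to speed things up with separate threads: one to download the content and one to insert into the table. That way you are always fetching new data, rather than fetching, waiting for the inserts to finish, and then fetching more. The coding could get fairly complicated, though.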

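Answer 1: (score: 0)

Note that a multi-record insert still works with MySQL's ON DUPLICATE KEY UPDATE: in the update clause, VALUES(col) refers to the value that the incoming row would have inserted for that column.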
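For reference, MySQL accepts a multi-row VALUES list together with ON DUPLICATE KEY UPDATE when the update clause uses `VALUES(col)`. The sketch below builds such a statement with `?` placeholders. The function name `buildBulkUpsert` and the sample rows are made up for illustration, and only three of the question's columns are shown.

```php
<?php
/**
 * Hypothetical helper: build one multi-row
 * INSERT ... ON DUPLICATE KEY UPDATE statement with ? placeholders.
 * Returns [sql, params]. Table and column names are interpolated as-is,
 * so they must come from trusted code, never from the feed.
 */
function buildBulkUpsert(string $table, array $columns, array $updateColumns, array $rows): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('no rows to insert');
    }
    // One "(?, ?, ...)" group per row.
    $rowPlaceholder = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $placeholders = implode(', ', array_fill(0, count($rows), $rowPlaceholder));

    // VALUES(col) refers to the value the incoming row tried to insert.
    $updates = [];
    foreach ($updateColumns as $col) {
        $updates[] = "$col = VALUES($col)";
    }

    $sql = "INSERT INTO $table (" . implode(', ', $columns) . ") VALUES $placeholders"
         . ' ON DUPLICATE KEY UPDATE ' . implode(', ', $updates);

    // Flatten the rows into one parameter list, in column order.
    $params = [];
    foreach ($rows as $row) {
        foreach ($columns as $col) {
            $params[] = $row[$col];
        }
    }
    return [$sql, $params];
}

// Two retailer rows with made-up values.
$rows = [
    ['PRid' => '101', 'id' => '7', 'price' => '9.99'],
    ['PRid' => '101', 'id' => '8', 'price' => '10.49'],
];
[$sql, $params] = buildBulkUpsert('PRretailerinfo', ['PRid', 'id', 'price'], ['price'], $rows);
echo $sql, "\n";
```

The returned pair is meant for a prepared statement (for example via PDO or mysqli) rather than string interpolation. Sending rows in batches of a few hundred keeps each statement below the server's `max_allowed_packet` limit.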

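Try disabling the indexes on the table you are inserting into. Otherwise MySQL has to modify the index after every insert, when you really only need to do that once at the end. Keep in mind that the unique key that ON DUPLICATE KEY UPDATE relies on has to stay in place, so this only helps for the other, non-unique indexes. Prepared statements should also speed things up.

Use microtime() to find the slow parts of the script. Compared with network I/O (the downloads), database performance may not matter much.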
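As a minimal sketch of that measurement, the snippet below wraps a step in a timer and adds the elapsed time to a per-phase total. The helper name `timed` is hypothetical, and the two closures are stand-ins for the real curl download and insert loop.

```php
<?php
/**
 * Hypothetical helper: run $fn and return [result, elapsed seconds].
 */
function timed(callable $fn): array
{
    $start = microtime(true);
    $result = $fn();
    return [$result, microtime(true) - $start];
}

$totals = ['download' => 0.0, 'insert' => 0.0];

// Stand-in for the curl download of one page of the feed.
[$xml, $elapsed] = timed(function () { usleep(2000); return '<products/>'; });
$totals['download'] += $elapsed;

// Stand-in for inserting that page's products and retailers.
[$unused, $elapsed] = timed(function () { usleep(1000); return null; });
$totals['insert'] += $elapsed;

printf("download: %.4fs, insert: %.4fs\n", $totals['download'], $totals['insert']);
```

Accumulating the two totals over a full run shows directly whether the time goes to the HTTP requests or to the database.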