Rolling sales average algorithm implemented in MySQL + PHP

Time: 2012-12-18 01:55:45

Tags: php mysql algorithm

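I'm building a sales ranking widget for our website that will display products ordered by their current position in a "top charts" or "best sellers" list, so to speak.

After some reading, it looks like a good way to implement this is a rolling sales average algorithm, in which more recent sales carry a higher weight.

Example: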

$rolling_avg = ((4*$d1)+(3*$d2)+(2*$d3)+$d4+$d5+$d6+$d7)/13;

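Where:

  • $d1 = number of sales in the past 24 hours.
  • $d2 = number of sales between 24 and 48 hours ago.
  • $d3 = number of sales between 48 and 72 hours ago.
  • $d4 = number of sales between 72 and 96 hours ago.

And so on, up to $d7 (144 to 168 hours ago).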
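To make the weighting concrete, here is a minimal sketch of the formula as a standalone PHP function. The function name rolling_average and the sample counts are made up for illustration; only the weights (4, 3, 2, 1, 1, 1, 1) and the divisor 13 come from the example above.

```php
<?php
// Hypothetical helper: applies the example weights to seven daily counts.
// $daily_sales must be a list ordered from newest ($d1) to oldest ($d7).
function rolling_average(array $daily_sales): float
{
    $weights = [4, 3, 2, 1, 1, 1, 1];
    if (count($daily_sales) !== count($weights)) {
        throw new InvalidArgumentException('Expected exactly 7 daily counts.');
    }

    $weighted_sum = 0;
    foreach ($weights as $i => $weight) {
        $weighted_sum += $weight * $daily_sales[$i];
    }

    // array_sum($weights) is 13, the divisor used in the formula.
    return $weighted_sum / array_sum($weights);
}

// Made-up counts for $d1..$d7: (4*10 + 3*5 + 2*3 + 2 + 0 + 1 + 4) / 13 = 68 / 13.
echo rolling_average([10, 5, 3, 2, 0, 1, 4]), "\n";
```

A product that sold 13 units today and nothing earlier scores 4, while one that sold 13 units seven days ago scores 1, which is the recency bias the ranking relies on.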

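Currently, I'm trying to run this over a product dataset of roughly 500k records, writing the computed rank back into the products table so it can be queried later. If possible, I'd like to end up with a script that recalculates the rankings and runs from cron every 12 or 24 hours.

Current implementation:

My current implementation takes far too long to execute. I feel that more of the processing needs to happen at the SQL level (with far fewer SELECT queries), but I'm not sure how to get started with that.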

$products = mysql_query("SELECT * FROM products ORDER BY id DESC"); // <-- Est 450-500k rows.

while($product = mysql_fetch_array($products)) {
    $product_id = $product['id'];

    $d1 = mysql_query("SELECT COUNT(*) FROM orders WHERE (product_id = '$product_id') AND (sale_completed BETWEEN (NOW()-INTERVAL 24 HOUR) AND NOW())") or die(mysql_error());
    $d1 = mysql_fetch_array($d1);

    $d2 = mysql_query("SELECT COUNT(*) FROM orders WHERE (product_id = '$product_id') AND (sale_completed BETWEEN (NOW()-INTERVAL 48 HOUR) AND (NOW()-INTERVAL 24 HOUR))");
    $d2 = mysql_fetch_array($d2);

    $d3 = mysql_query("SELECT COUNT(*) FROM orders WHERE (product_id = '$product_id') AND (sale_completed BETWEEN (NOW()-INTERVAL 72 HOUR) AND (NOW()-INTERVAL 48 HOUR))");
    $d3 = mysql_fetch_array($d3);

    $d4 = mysql_query("SELECT COUNT(*) FROM orders WHERE (product_id = '$product_id') AND (sale_completed BETWEEN (NOW()-INTERVAL 96 HOUR) AND (NOW()-INTERVAL 72 HOUR))");
    $d4 = mysql_fetch_array($d4);

    $d5 = mysql_query("SELECT COUNT(*) FROM orders WHERE (product_id = '$product_id') AND (sale_completed BETWEEN (NOW()-INTERVAL 120 HOUR) AND (NOW()-INTERVAL 96 HOUR))");
    $d5 = mysql_fetch_array($d5);

    $d6 = mysql_query("SELECT COUNT(*) FROM orders WHERE (product_id = '$product_id') AND (sale_completed BETWEEN (NOW()-INTERVAL 144 HOUR) AND (NOW()-INTERVAL 120 HOUR))");
    $d6 = mysql_fetch_array($d6);

    $d7 = mysql_query("SELECT COUNT(*) FROM orders WHERE (product_id = '$product_id') AND (sale_completed BETWEEN (NOW()-INTERVAL 168 HOUR) AND (NOW()-INTERVAL 144 HOUR))");
    $d7 = mysql_fetch_array($d7);

    $d1 = $d1[0];
    $d2 = $d2[0];
    $d3 = $d3[0];
    $d4 = $d4[0];
    $d5 = $d5[0];
    $d6 = $d6[0];
    $d7 = $d7[0];       

    $rolling_avg = ((4*$d1)+(3*$d2)+(2*$d3)+$d4+$d5+$d6+$d7)/13;

    mysql_query("UPDATE products SET rolling_sales = '$rolling_avg' WHERE id = '$product_id'");
}

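I'm not sure how to optimize or move forward from here, but it definitely needs a lot of work.

Before anyone mentions it: I understand that the mysql_* functions are deprecated, and I will move this to PDO before it goes to production.

1 Answer:

Answer 0: (score: 0)

Here is a function that calculates the rolling sales for one product with a single query.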

function get_rolling_sales($product_id) {
    // Cast to int so the value is safe to interpolate into the SQL below.
    $product_id = (int) $product_id;

    // One scalar subquery per 24-hour window, weighted 4, 3, 2, 1, 1, 1, 1.
    $query = <<<EOF
SELECT (
    4 * (
        SELECT COUNT(*) FROM orders WHERE (product_id = $product_id)
        AND (sale_completed BETWEEN (NOW()-INTERVAL 24 HOUR) AND NOW())
    ) + 3 * (
        SELECT COUNT(*) FROM orders WHERE (product_id = $product_id)
        AND (sale_completed BETWEEN (NOW()-INTERVAL 48 HOUR) AND (NOW()-INTERVAL 24 HOUR))
    ) + 2 * (
        SELECT COUNT(*) FROM orders WHERE (product_id = $product_id)
        AND (sale_completed BETWEEN (NOW()-INTERVAL 72 HOUR) AND (NOW()-INTERVAL 48 HOUR))
    ) + (
        SELECT COUNT(*) FROM orders WHERE (product_id = $product_id)
        AND (sale_completed BETWEEN (NOW()-INTERVAL 96 HOUR) AND (NOW()-INTERVAL 72 HOUR))
    ) + (
        SELECT COUNT(*) FROM orders WHERE (product_id = $product_id)
        AND (sale_completed BETWEEN (NOW()-INTERVAL 120 HOUR) AND (NOW()-INTERVAL 96 HOUR))
    ) + (
        SELECT COUNT(*) FROM orders WHERE (product_id = $product_id)
        AND (sale_completed BETWEEN (NOW()-INTERVAL 144 HOUR) AND (NOW()-INTERVAL 120 HOUR))
    ) + (
        SELECT COUNT(*) FROM orders WHERE (product_id = $product_id)
        AND (sale_completed BETWEEN (NOW()-INTERVAL 168 HOUR) AND (NOW()-INTERVAL 144 HOUR))
    )
) / 13 AS rolling_sales
EOF;

    $result = mysql_query($query);
    $row = mysql_fetch_assoc($result);
    return $row['rolling_sales'];
}

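However, iterating over all 500,000 product records will still take a long time. Do you really need all of this information at once (for example, for a calculation), or do you plan to display it in a paginated table view? If you only want to display the data, you can calculate rolling_sales on demand.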
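If on-demand calculation for one page of products is enough, one possible approach is to count all seven windows for the whole batch in a single grouped query instead of calling a function once per product. The sketch below is an illustration under assumptions, not a tested drop-in: it assumes a PDO connection to MySQL and the orders(product_id, sale_completed) columns from the question, and the names rolling_sales_sql and fetch_rolling_sales are hypothetical. It also uses half-open windows (>= older edge, < newer edge) rather than BETWEEN, so a sale that falls exactly on a 24-hour boundary is counted only once.

```php
<?php
// Hypothetical helper: builds one grouped query for a batch of product IDs.
// Assumes the orders(product_id, sale_completed) columns from the question.
function rolling_sales_sql(int $id_count): string
{
    $weights = [4, 3, 2, 1, 1, 1, 1];
    $terms = [];
    foreach ($weights as $day => $weight) {
        $from = ($day + 1) * 24; // older edge of the window, in hours ago
        $to = $day * 24;         // newer edge of the window, in hours ago
        // In MySQL a comparison evaluates to 1 or 0, so SUM(...) counts the matching rows.
        $terms[] = "$weight * SUM(sale_completed >= NOW() - INTERVAL $from HOUR"
                 . " AND sale_completed < NOW() - INTERVAL $to HOUR)";
    }

    $placeholders = implode(', ', array_fill(0, $id_count, '?'));

    return "SELECT product_id, (" . implode(' + ', $terms) . ") / 13 AS rolling_sales"
         . " FROM orders"
         . " WHERE product_id IN ($placeholders)"
         . " AND sale_completed >= NOW() - INTERVAL 168 HOUR"
         . " GROUP BY product_id";
}

// Hypothetical helper: returns [product_id => rolling average] for the given IDs.
function fetch_rolling_sales(PDO $pdo, array $product_ids): array
{
    if (count($product_ids) === 0) {
        return [];
    }

    $stmt = $pdo->prepare(rolling_sales_sql(count($product_ids)));
    $stmt->execute(array_values($product_ids));

    // Products with no sales in the last 7 days return no row, so default them to 0.
    $averages = array_fill_keys($product_ids, 0.0);
    foreach ($stmt->fetchAll(PDO::FETCH_KEY_PAIR) as $id => $average) {
        $averages[$id] = (float) $average;
    }
    return $averages;
}
```

Calling fetch_rolling_sales($pdo, [101, 102, 103]) with the IDs shown on the current page (made-up values here) would issue one query for the whole page. For this to be fast on a large orders table, a composite index covering product_id and sale_completed would probably be needed; whether one exists is not stated in the question.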