I currently have more than 400k records in a MySQL table. The structure is as follows:
The function I am using:
function cron_hour_counts()
{
$subids = get_subids();
array_push($subids, '');
$from = '2011-10-20';//last_updated_date('tb_hour_counts');
$to = '2011-10-20';//last_date();
$days = days_interval($from, $to);
$result_array = array();
foreach ($subids as $subid)
{
for ($i = 0; $i < $days; $i++)
{
$hour = '00:00';
for ($t = 0; $t <= 23; $t++)
{
if ($t == 0)
{
$chour = date('H:i', strtotime($hour . '+' . $t . 'hour'));
$phour = date('H:i', strtotime('23:59'));
$date = date('Y-m-d', strtotime($from . '+' . $i . 'day'));
$date_prev = date('Y-m-d', strtotime($date . '- 1 day'));
}
else
{
$chour = date('H:i', strtotime($hour . '+' . $t . 'hour'));
$phour = date('H:i', strtotime($chour . '-1 hour'));
$date = date('Y-m-d', strtotime($from . '+' . $i . 'day'));
$date_prev = $date;
}
$unique_id_query = mysql_query("SELECT (SELECT COUNT(DISTINCT `id`,`subid`) FROM `tb_stats` WHERE (`date` < '" . mysql_real_escape_string($date) . "') OR (`date` = '" . mysql_real_escape_string($date) . "' AND `time` <= '" . mysql_real_escape_string($chour) . "')" . (!empty($subid) && is_numeric($subid) ? " AND `subid` = '" . mysql_real_escape_string($subid) . "'" : "") . ") - (SELECT COUNT(DISTINCT `id`,`subid`) FROM `tb_stats` WHERE (`date` < '" . mysql_real_escape_string($date_prev) . "') OR (`date` = '" . mysql_real_escape_string($date_prev) . "' AND `time` <= '" . mysql_real_escape_string($phour) . "')" . (!empty($subid) && is_numeric($subid) ? " AND `subid` = '" . mysql_real_escape_string($subid) . "'" : "") . ") AS `unique_ids`");
$unique_id_result = mysql_fetch_assoc($unique_id_query);
$total_id_query = mysql_query("SELECT COUNT(DISTINCT `id`,`subid`) AS `total_ids` FROM `tb_stats` WHERE `date` = '" . mysql_real_escape_string($date) . "' AND `time` <= '" . mysql_real_escape_string($chour) . "'" . (!empty($subid) && is_numeric($subid) ? " AND `subid` = '" . mysql_real_escape_string($subid) . "'" : ""));
$total_id_result = mysql_fetch_assoc($total_id_query);
$unique_ip_query = mysql_query("SELECT (SELECT COUNT(DISTINCT `ip`,`subid`) FROM `tb_stats` WHERE (`date` < '" . mysql_real_escape_string($date) . "') OR (`date` = '" . mysql_real_escape_string($date) . "' AND `time` <= '" . mysql_real_escape_string($chour) . "')" . (!empty($subid) && is_numeric($subid) ? " AND `subid` = '" . mysql_real_escape_string($subid) . "'" : "") . ") - (SELECT COUNT(DISTINCT `ip`,`subid`) FROM `tb_stats` WHERE `date` <= '" . mysql_real_escape_string($date_prev) . "' AND `time` <= '" . mysql_real_escape_string($phour) . "'" . (!empty($subid) && is_numeric($subid) ? " AND `subid` = '" . mysql_real_escape_string($subid) . "'" : "") . ") AS `unique_ips`");
$unique_ip_result = mysql_fetch_assoc($unique_ip_query);
$total_ip_query = mysql_query("SELECT COUNT(DISTINCT `ip`,`subid`) AS `total_ips` FROM `tb_stats` WHERE `date` = '" . mysql_real_escape_string($date) . "' AND `time` <= '" . mysql_real_escape_string($chour) . "'" . (!empty($subid) && is_numeric($subid) ? " AND `subid` = '" . mysql_real_escape_string($subid) . "'" : ""));
$total_ip_result = mysql_fetch_assoc($total_ip_query);
$global_query = mysql_query("SELECT COUNT(`id`) AS `global` FROM `tb_stats` WHERE `date` = '" . mysql_real_escape_string($date) . "' AND `time` <= '" . mysql_real_escape_string($chour) . "'" . (!empty($subid) && is_numeric($subid) ? " AND `subid` = '" . mysql_real_escape_string($subid) . "'" : ""));
$global_result = mysql_fetch_assoc($global_query);
$result = array();
$result['date'] = $date;
$result['hour'] = $chour;
$result['subid'] = $subid;
$result['unique_ids'] = $unique_id_result['unique_ids'];
$result['total_ids'] = $total_id_result['total_ids'];
$result['unique_ips'] = $unique_ip_result['unique_ips'];
$result['total_ips'] = $total_ip_result['total_ips'];
$result['global'] = $global_result['global'];
$result_array[] = $result;
}
}
}
//db insert
print_r($result_array);
}
There are 20 subids, and processing a single day takes 40 minutes to run. Any tips on speeding this up?
Answer 0 (score: 1)
Here is my solution. It runs about 20 times faster.
function cron_hour_counts()
{
$subids = get_subids();
//array_push($subids, '');
$from = '2011-10-20';//last_updated_date('tb_hour_counts');
$to = '2011-10-20';//last_date();
$days = days_interval($from, $to);
$result_array = array();
for ($i = 0; $i < $days; $i++)
{
$hour = '00:00';
for ($t = 0; $t <= 23; $t++)
{
$date = date('Y-m-d', strtotime($from . '+' . $i . 'day'));
$currentHour = date('H:i', strtotime($hour . '+' . $t . 'hour'));
$nextHour = date('H:i', strtotime($currentHour . '+59 minutes'));
$unique_ids_query = mysql_query("
SELECT COUNT(id) AS unique_ids,subid
FROM
(
SELECT id,subid,date,time
FROM tb_stats
WHERE date <= '" . mysql_real_escape_string($date) . "'
GROUP BY id,subid
) AS id_inner
WHERE date = '" . mysql_real_escape_string($date) . "'
AND time BETWEEN '" . mysql_real_escape_string($currentHour) . "' AND '" . mysql_real_escape_string($nextHour) . "'
GROUP BY subid;
");
pull_data('unique_ids', $date, $currentHour, $unique_ids_query, $subids, $result_array);
$unique_ips_query = mysql_query("
SELECT COUNT(ip) AS unique_ips,subid
FROM
(
SELECT ip,subid,date,time
FROM tb_stats
WHERE date <= '" . mysql_real_escape_string($date) . "'
GROUP BY ip,subid
) AS ip_inner
WHERE date = '" . mysql_real_escape_string($date) . "'
AND time BETWEEN '" . mysql_real_escape_string($currentHour) . "' AND '" . mysql_real_escape_string($nextHour) . "'
GROUP BY subid;
");
pull_data('unique_ips', $date, $currentHour, $unique_ips_query, $subids, $result_array);
$total_ids_query = mysql_query("
SELECT COUNT(DISTINCT id,subid) AS total_ids,subid
FROM tb_stats
WHERE date = '" . mysql_real_escape_string($date) . "'
AND `time` <= '" . mysql_real_escape_string($nextHour) . "'
GROUP BY subid
");
pull_data('total_ids', $date, $currentHour, $total_ids_query, $subids, $result_array);
$total_ips_query = mysql_query("
SELECT COUNT(DISTINCT ip,subid) AS total_ips,subid
FROM tb_stats
WHERE date = '" . mysql_real_escape_string($date) . "'
AND `time` <= '" . mysql_real_escape_string($nextHour) . "'
GROUP BY subid;
");
pull_data('total_ips', $date, $currentHour, $total_ips_query, $subids, $result_array);
$global_query = mysql_query("
SELECT COUNT(id) AS global,subid
FROM tb_stats
WHERE date = '" . mysql_real_escape_string($date) . "'
AND time <= '" . mysql_real_escape_string($nextHour) . "'
GROUP BY subid;
");
pull_data('global', $date, $currentHour, $global_query, $subids, $result_array);
}
}
print_r($result_array);
}
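The function above calls pull_data(), which the answer does not show. As a rough idea of what such a helper might do, here is a minimal sketch that merges one grouped result set into $result_array. The name merge_counts, its signature, and the row format (plain associative arrays rather than a mysql result resource) are all assumptions made for illustration, not the author's actual code.

```php
<?php
// Hypothetical stand-in for pull_data(), which the answer calls but does not show.
// Assumption: each row is an associative array holding 'subid' plus one count column.
function merge_counts(string $column, string $date, string $hour, array $rows, array $subids, array &$resultArray): void
{
    // Index the grouped rows by subid so each lookup below is a single array access.
    $bySubid = [];
    foreach ($rows as $row) {
        $bySubid[(string) $row['subid']] = (int) $row[$column];
    }

    foreach ($subids as $subid) {
        // One result entry per date, hour and subid; created on first use.
        $key = $date . ' ' . $hour . ' ' . $subid;
        if (!isset($resultArray[$key])) {
            $resultArray[$key] = ['date' => $date, 'hour' => $hour, 'subid' => $subid];
        }
        // A subid missing from the grouped result had no matching rows, so store 0.
        $resultArray[$key][$column] = $bySubid[(string) $subid] ?? 0;
    }
}

// Usage with made-up data: subid 1 has 7 rows, subid 2 has none.
$result = [];
merge_counts('global', '2011-10-20', '00:00', [['subid' => 1, 'global' => 7]], [1, 2], $result);
```

Filling in zeros for missing subids matters because GROUP BY subid returns no row at all for a subid without traffic in that hour.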
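Answer 1 (score: 0)
Optimize your queries.
Here is an example of a query where an optimization can be identified:
In the comments above, you posted the following query as an example of unique_ids_query: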
SELECT (SELECT COUNT(DISTINCT id,subid) FROM tb_stats WHERE subid = '1' AND date <= '2011-10-20') - (SELECT COUNT(DISTINCT id,subid) FROM tb_stats WHERE subid = '1' AND date <= '2011-10-19') AS unique_ids;
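In essence, the query computes the change in the number of distinct id,subid combinations between 2011-10-19 and 2011-10-20 where subid is '1'. You do this by first counting all records dated on or before 2011-10-20, then counting all records dated on or before 2011-10-19. That query also contains three SELECT statements.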
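Unless I am misunderstanding, that is the same as counting the distinct combinations dated after 2011-10-19 and up to 2011-10-20, provided none of them already appears on or before 2011-10-19. You can compute that with the following:
SELECT COUNT(DISTINCT id,subid) AS unique_ids FROM tb_stats WHERE subid = '1' AND date <= '2011-10-20' AND date > '2011-10-19';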
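Whether the subtraction and a single ranged count agree depends on the data and on how the lower bound is written. A small sketch in plain PHP, using made-up rows rather than the real tb_stats table, can help check this before changing the query. The helper count_distinct_pairs is hypothetical and only mimics COUNT(DISTINCT id,subid) on an in-memory array.

```php
<?php
// Made-up rows for illustration: [id, subid, date].
$rows = [
    [1, 1, '2011-10-18'],
    [2, 1, '2011-10-19'],
    [1, 1, '2011-10-20'], // id 1 was already seen on 2011-10-18
    [3, 1, '2011-10-20'], // id 3 appears for the first time
];

// Count distinct (id, subid) pairs among the rows whose date is accepted by $keep.
function count_distinct_pairs(array $rows, callable $keep): int
{
    $seen = [];
    foreach ($rows as [$id, $subid, $date]) {
        if ($keep($date)) {
            $seen[$id . '|' . $subid] = true;
        }
    }
    return count($seen);
}

// Subtraction form: pairs up to the 20th minus pairs up to the 19th.
$difference = count_distinct_pairs($rows, fn($d) => strcmp($d, '2011-10-20') <= 0)
            - count_distinct_pairs($rows, fn($d) => strcmp($d, '2011-10-19') <= 0);

// Single ranged count with an inclusive lower bound (covers both days).
$inclusive = count_distinct_pairs(
    $rows,
    fn($d) => strcmp($d, '2011-10-19') >= 0 && strcmp($d, '2011-10-20') <= 0
);

// Single ranged count with an exclusive lower bound (covers the 20th only).
$exclusive = count_distinct_pairs(
    $rows,
    fn($d) => strcmp($d, '2011-10-19') > 0 && strcmp($d, '2011-10-20') <= 0
);

printf("%d %d %d\n", $difference, $inclusive, $exclusive); // prints "1 3 2"
```

With this sample the subtraction yields 1 (only id 3 is new), while the ranged counts yield 3 and 2, because a ranged count also includes pairs that were first seen earlier. The forms coincide only when every pair in the window is new.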
If possible, you should also switch to MySQLi or PDO in PHP and use stored procedures, which may also improve performance.
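For reference, the mysql_* functions used in the question were removed in PHP 7.0, so moving to MySQLi or PDO is needed on current PHP versions anyway. Below is a minimal sketch of the PDO route using a prepared statement rather than a stored procedure. The function name count_pairs_in_range and its parameters are assumptions for illustration, $pdo is assumed to be an already opened connection, and the table and column names are taken from the question.

```php
<?php
// Sketch only: $pdo is any open PDO connection; tb_stats and its columns follow the question.
// Both $from and $to are inclusive dates in 'Y-m-d' format; pass null as $subid to skip that filter.
function count_pairs_in_range(PDO $pdo, string $from, string $to, ?int $subid = null): int
{
    // COUNT(*) over a DISTINCT subquery gives the same number as
    // COUNT(DISTINCT id, subid) as long as neither column is NULL.
    $sql = 'SELECT COUNT(*) FROM (SELECT DISTINCT id, subid FROM tb_stats'
         . ' WHERE date BETWEEN :from AND :to';
    $params = [':from' => $from, ':to' => $to];

    if ($subid !== null) {
        $sql .= ' AND subid = :subid';
        $params[':subid'] = $subid;
    }
    $sql .= ') AS pairs';

    // Values are bound as parameters, so no manual escaping is needed.
    $stmt = $pdo->prepare($sql);
    $stmt->execute($params);

    return (int) $stmt->fetchColumn();
}
```

A call such as count_pairs_in_range($pdo, '2011-10-20', '2011-10-20', 1) would then replace the string concatenation and mysql_real_escape_string() calls from the question.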
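In addition, run as many queries as possible over a single connection to cut connection latency (it adds up!).
A final potential gain is to write a MySQL function. The query above could be run through a MySQL function without using COUNT or DISTINCT, which would be a performance boost on top of the gain from running it as a function.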