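I am building a report for a radio station. The station logs its online listeners, recording the IP address, date, time, total listener count, and so on.
listeners table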
client_ip        date        time      date_time            listeners
---------------  ----------  --------  -------------------  ---------
166.147.81.179   2012-04-30  00:19:46  2012-04-30 00:19:46          1
64.12.243.203    2012-04-30  04:38:37  2012-04-30 04:38:37          1
198.228.211.195  2012-04-30  05:36:33  2012-04-30 05:36:33          1
198.228.211.195  2012-04-30  05:36:34  2012-04-30 05:36:34          2
198.228.211.195  2012-04-30  05:36:35  2012-04-30 05:36:35          2
198.228.211.195  2012-04-30  05:36:35  2012-04-30 05:36:35          3
166.147.81.179   2012-04-30  05:47:13  2012-04-30 05:47:13          2
76.170.251.97    2012-04-30  06:01:37  2012-04-30 06:01:37          2
76.170.251.97    2012-04-30  06:01:39  2012-04-30 06:01:39          2
76.170.251.97    2012-04-30  06:01:39  2012-04-30 06:01:39          2
It also keeps a record of song details (title, artist, album, length, date, time, etc.).
playlists table
title                       artist                           length_in_secs  played_date  played_time  start_date_time      end_date_time
--------------------------  -------------------------------  --------------  -----------  -----------  -------------------  -------------------
We Found Love               Rihanna                                     184  2012-04-30   00:00:21     2012-04-30 00:00:21  2012-04-30 00:03:25
Photograph                  Nickelback                                  216  2012-04-30   00:03:31     2012-04-30 00:03:31  2012-04-30 00:07:07
Not Over You                Gavin DeGraw                                214  2012-04-30   00:07:18     2012-04-30 00:07:18  2012-04-30 00:10:52
Stereo Hearts               Gym Class Heroes Ft Adam Levine             210  2012-04-30   00:10:55     2012-04-30 00:10:55  2012-04-30 00:14:25
I Gotta Feeling             Black Eyed Peas                             243  2012-04-30   00:15:03     2012-04-30 00:15:03  2012-04-30 00:19:06
One Thing Leads To Another  Fixx                                        182  2012-04-30   00:19:14     2012-04-30 00:19:14  2012-04-30 00:22:16
Raise Your Glass            Pink                                        202  2012-04-30   00:22:29     2012-04-30 00:22:29  2012-04-30 00:25:51
Better In Time              Leona Lewis                                 216  2012-04-30   00:30:13     2012-04-30 00:30:13  2012-04-30 00:33:49
Tainted Love                Soft Cell                                   153  2012-04-30   00:33:56     2012-04-30 00:33:56  2012-04-30 00:36:29
Haven't Met You Yet         Michael Buble'                              242  2012-04-30   00:37:14     2012-04-30 00:37:14  2012-04-30 00:41:16
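The report requirement is "how many users listened to each song on a given date or within a date range", so I wrote the query below. It gives the correct output (as far as I can tell), but the execution time is out of proportion to the data size: anywhere from 5 seconds to 5-10 minutes, depending on the date range.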
SELECT DATE_FORMAT(p.played_date, "%m/%d/%Y") `played_date`, p.played_time, p.length_in_secs, p.title, p.artist, RTRIM(p.label) `label`, RTRIM(p.album) `album`, IFNULL((SELECT SUM(l.listeners) FROM listeners `l` WHERE l.date_time >= p.start_date_time AND l.date_time <= p.end_date_time LIMIT 1), 0) `listeners` FROM playlists `p` WHERE p.title <> "" AND (p.played_date >= '2012-04-30' AND p.played_date <= '2012-05-30') HAVING listeners > 0 ORDER BY p.title ASC;
// formatted //
SELECT
DATE_FORMAT(p.played_date, "%m/%d/%Y") `played_date`,
p.played_time,
p.length_in_secs,
p.title,
p.artist,
RTRIM(p.label) `label`,
RTRIM(p.album) `album`,
IFNULL(
(SELECT
SUM(l.listeners)
FROM
listeners `l`
WHERE l.date_time >= p.start_date_time
AND l.date_time <= p.end_date_time
LIMIT 1),
0
) `listeners`
FROM
playlists `p`
WHERE p.title <> ""
AND (
p.played_date >= '2012-04-30'
AND p.played_date <= '2012-05-30'
)
HAVING listeners > 0
ORDER BY p.title ASC
Output:
played_date  played_time  length_in_secs  title                  artist                    label               album               listeners
-----------  -----------  --------------  ---------------------  ------------------------  ------------------  ------------------  ---------
04/30/2012   08:06:26                228  Brighter Than The Sun  Colbie Caillat (Cal-Lay)  Universal Republic  All of You                  9
04/30/2012   08:44:16                248  Breakfast At Tiffanys  Deep Blue Something                                                       6
04/30/2012   18:06:40                253  Bizarre Love Triangle  New Order                                                                 2
04/30/2012   17:05:21                183  Animal                 Neon Trees                Mercury             Habits                      5
04/30/2012   08:58:05                253  Always Be My Baby      Mariah Carey                                                              2
04/30/2012   07:25:52                264  Already Gone           Kelly Clarkson            RCA                 All I Ever Wante            3
04/30/2012   16:21:33                236  All The Right Moves    One Republic              Interscope          Waking Up                   7
04/30/2012   11:58:26                199  All That She Wants     Ace Of Base                                                              12
04/30/2012   11:14:17                247  All I Wanna Do         Sheryl Crow                                                               2
04/30/2012   16:12:59                235  A Thousand Miles       Vanessa Carlton                                                           5
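Is there a way to optimize this query so that it runs faster, or to write a new, faster one? Any suggestions or help would be appreciated. Thanks!
Table structures, from EXPLAIN: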
EXPLAIN playlists;
Field            Type              Null  Key  Default            Extra
---------------  ----------------  ----  ---  -----------------  ---------------------------
playlist_id      int(10) unsigned  NO    PRI  (NULL)             auto_increment
title            varchar(255)      YES        (NULL)
artist           varchar(255)      YES        (NULL)
label            varchar(255)      YES        (NULL)
album            varchar(255)      YES        (NULL)
length_in_secs   int(11)           NO         (NULL)
played_date      date              NO         (NULL)
played_time      time              NO         (NULL)
start_date_time  datetime          NO         (NULL)
end_date_time    datetime          NO         (NULL)
added_date       datetime          NO         (NULL)
modified_date    timestamp         NO         CURRENT_TIMESTAMP  on update CURRENT_TIMESTAMP
EXPLAIN listeners;
Field          Type                 Null  Key  Default            Extra
-------------  -------------------  ----  ---  -----------------  ---------------------------
listener_id    bigint(20) unsigned  NO    PRI  (NULL)             auto_increment
station_id     int(10) unsigned     NO         (NULL)
client_ip      varchar(50)          NO         (NULL)
time           time                 NO         (NULL)
date           date                 NO         (NULL)
date_time      datetime             YES        (NULL)
timestamp      bigint(20) unsigned  NO         (NULL)
listeners      int(10) unsigned     NO         (NULL)
processes      int(10) unsigned     NO         (NULL)
uid            int(10) unsigned     NO         (NULL)
user_agent     varchar(255)         YES        (NULL)
added_date     datetime             NO         (NULL)
modified_date  timestamp            NO         CURRENT_TIMESTAMP  on update CURRENT_TIMESTAMP
Answer 0 (score: 4)
Use an INNER JOIN instead of a correlated subquery:
SELECT DATE_FORMAT(p.played_date, "%m/%d/%Y") played_date,
p.played_time,
p.length_in_secs,
p.title,
p.artist,
RTRIM(p.label) label,
RTRIM(p.album) album,
SUM(l.listeners) listeners
FROM playlists p
INNER JOIN listeners l
ON l.date_time BETWEEN p.start_date_time AND p.end_date_time
WHERE p.title <> "" AND
p.played_date BETWEEN '2012-04-30' AND '2012-05-30'
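-- One row per song; without GROUP BY, SUM() collapses the result to a single row.
GROUP BY p.playlist_id
ORDER BY p.title ASC;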
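Adding the following indexes to the tables may improve the query's performance. Run EXPLAIN on the query to check whether they are actually used:
ALTER TABLE playlists ADD KEY (played_date, start_date_time, end_date_time, title);
ALTER TABLE listeners ADD KEY (date_time, listeners);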
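To see why a join with SUM() needs a GROUP BY to report one total per song, here is a minimal sketch. It is an illustration under assumptions, not the asker's setup: it uses SQLite through Python's `sqlite3` module rather than MySQL, the schema is trimmed to the columns the join touches, and two of the three log rows are made up (only the 00:19:46 row comes from the sample data above). The function names and the index name `idx_listeners_dt` are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database shaped like the two tables above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE playlists (playlist_id INTEGER PRIMARY KEY, title TEXT,
                                start_date_time TEXT, end_date_time TEXT);
        CREATE TABLE listeners (listener_id INTEGER PRIMARY KEY,
                                date_time TEXT, listeners INTEGER);
        -- Same column order as the suggested listeners index.
        CREATE INDEX idx_listeners_dt ON listeners (date_time, listeners);
        INSERT INTO playlists (title, start_date_time, end_date_time) VALUES
            ('One Thing Leads To Another', '2012-04-30 00:19:14', '2012-04-30 00:22:16'),
            ('Raise Your Glass', '2012-04-30 00:22:29', '2012-04-30 00:25:51');
        -- The second and third log rows are invented for the demo.
        INSERT INTO listeners (date_time, listeners) VALUES
            ('2012-04-30 00:19:46', 1),
            ('2012-04-30 00:20:00', 2),
            ('2012-04-30 00:23:00', 3);
        """
    )
    return conn


def per_song_totals(conn):
    """Join each song to the log rows inside its play window, one row per song."""
    return conn.execute(
        """
        SELECT p.title, SUM(l.listeners) AS listeners
        FROM playlists p
        INNER JOIN listeners l
                ON l.date_time BETWEEN p.start_date_time AND p.end_date_time
        GROUP BY p.playlist_id
        ORDER BY p.title
        """
    ).fetchall()


def totals_without_group_by(conn):
    """Same join without GROUP BY: the aggregate folds every song into one row."""
    return conn.execute(
        """
        SELECT SUM(l.listeners) AS listeners
        FROM playlists p
        INNER JOIN listeners l
                ON l.date_time BETWEEN p.start_date_time AND p.end_date_time
        """
    ).fetchall()
```

Calling `per_song_totals(build_demo_db())` gives one row per song, while `totals_without_group_by` returns a single grand total, which is the difference the GROUP BY makes.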
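Answer 1 (score: 1)
As discussed in the comments, your query does not actually do what you want. Given the data you have, I would personally handle this outside SQL and build a table that stores the listener count for each song, which you can then query in SQL. If you absolutely want a single SQL query to do it, it will need to look something like this monster: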
SELECT
DATE_FORMAT(p.played_date, "%m/%d/%Y") `played_date`,
p.played_time,
p.length_in_secs,
p.title,
p.artist,
RTRIM(p.label) `label`,
RTRIM(p.album) `album`,
SUM(SMALLEST(prev_listeners, next_listeners, dur_listeners)) AS listeners
FROM (
SELECT
p.start_date_time,
SUBSTRING_INDEX(GROUP_CONCAT(l_before.listeners ORDER BY l_before.date_time DESC),',',1) AS prev_listeners,
SUBSTRING_INDEX(GROUP_CONCAT(l_after.listeners ORDER BY l_after.date_time ASC),',',1) AS next_listeners,
MIN(l_during.listeners) AS dur_listeners
FROM playlists p
JOIN listeners l_before ON l_before.date_time < p.start_date_time
LEFT JOIN listeners l_after ON l_before.client_ip = l_after.client_ip AND l_after.date_time > p.end_date_time
LEFT JOIN listeners l_during ON l_before.client_ip = l_during.client_ip AND l_during.date_time BETWEEN p.start_date_time AND p.end_date_time
WHERE p.title <> ""
AND p.played_date BETWEEN '2012-04-30' AND '2012-05-30'
GROUP BY p.start_date_time, l_before.client_ip
) l
JOIN playlists p USING (start_date_time)
GROUP BY p.start_date_time
ORDER BY p.start_date_time
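where SMALLEST is a function that returns the smallest non-NULL argument. It is not a built-in MySQL function, so you would have to define it yourself.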
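As far as I know, MySQL's built-in LEAST() returns NULL as soon as any argument is NULL, which is why a separate helper is needed. The intended behaviour can be sketched as follows. This is only an illustration of the semantics, registered as a user-defined function in SQLite through Python's `sqlite3` module; in MySQL it would have to be written as a stored function instead. The names `smallest` and `connect_with_smallest` are hypothetical.

```python
import sqlite3


def smallest(*args):
    """Return the smallest argument that is not None, or None if all are None."""
    values = [a for a in args if a is not None]
    return min(values) if values else None


def connect_with_smallest():
    """Open an in-memory SQLite database with SMALLEST() available in SQL."""
    conn = sqlite3.connect(":memory:")
    # narg=-1 lets the SQL function accept any number of arguments.
    conn.create_function("SMALLEST", -1, smallest)
    return conn
```

With that in place, `SELECT SMALLEST(3, NULL, 1)` skips the NULL instead of propagating it. Note that the sketch assumes all arguments are numbers; values coming out of GROUP_CONCAT are text and would need a cast first.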
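This will take even longer than your current query, but it is the most efficient way I can think of to get the actual number of listeners for each song.
Oh, and this assumes that the log records a row with zero listeners when everyone at an IP address stops listening; otherwise there is really no way to do this.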
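The "handle it outside SQL" route could look roughly like the sketch below. It leans on the same assumptions as this answer and adds one more: each log row is taken to hold the current number of connections from that IP, and a row with 0 is written when the IP disconnects. Under those assumptions, the listeners who stayed for a whole song are, per IP, the minimum of the count in effect at the song's start and every count logged during the song. The function name and the input tuple layouts are hypothetical; the resulting totals could then be written to a summary table and queried from SQL.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


def listeners_per_song(songs, log_rows):
    """Count listeners who stayed connected for each whole song.

    songs:    iterable of (title, start, end) with datetime bounds
    log_rows: iterable of (client_ip, when, listeners)
    Returns {(title, start): total}.
    """
    # Group the log by IP, each group ordered by timestamp.
    times_by_ip = defaultdict(list)
    counts_by_ip = defaultdict(list)
    for ip, when, count in sorted(log_rows, key=lambda row: row[1]):
        times_by_ip[ip].append(when)
        counts_by_ip[ip].append(count)

    totals = {}
    for title, start, end in songs:
        total = 0
        for ip, times in times_by_ip.items():
            first_during = bisect_right(times, start)  # first row after the start
            after_end = bisect_right(times, end)       # first row after the end
            if first_during == 0:
                continue  # no row at or before the start: not connected yet
            counts = counts_by_ip[ip]
            # Count in effect at the start, plus every change during the song.
            window = [counts[first_during - 1]] + counts[first_during:after_end]
            total += min(window)
        totals[(title, start)] = total
    return totals
```

Binary search over the per-IP timestamps keeps each song lookup cheap, which is the main reason to precompute this once rather than re-deriving it in every report query.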