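I have a large table with more than 10,000 rows that will grow to about 1,000,000 in the near future, and I need a query that returns the total time per keyword for each user. My current query is slow because it uses left joins with one subquery per keyword: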
SELECT u.user, t1.Facebook_Time, t2.Outlook_Time, t3.Excel_Time
FROM
(SELECT DISTINCT user FROM rawdata) u LEFT JOIN
(SELECT user, SEC_TO_TIME(SUM(TIMESTAMPDIFF(SECOND, StartTime, EndTime))) AS Facebook_Time
FROM rawdata
WHERE MainWindowTitle LIKE '%Facebook%'
GROUP BY user) t1 ON u.user = t1.user LEFT JOIN
(SELECT user, SEC_TO_TIME(SUM(TIMESTAMPDIFF(SECOND, StartTime, EndTime))) AS Outlook_Time
FROM rawdata
WHERE MainWindowTitle LIKE '%Outlook%'
GROUP BY user) t2 ON u.user = t2.user LEFT JOIN
(SELECT user, SEC_TO_TIME(SUM(TIMESTAMPDIFF(SECOND, StartTime, EndTime))) AS Excel_Time
FROM rawdata
WHERE MainWindowTitle LIKE '%Excel%'
GROUP BY user) t3 ON u.user = t3.user;
The table looks like this:
MainWindowTitle | StartTime | EndTime | User
----------------|-----------|---------|---------
Form1 | DateTime | DateTime| user1
Form2 | DateTime | DateTime| user2
... | ... | ... | ...
Form_n | DateTime | DateTime| user_n
The output should look like this:
User | Keyword | SUM(EndTime-StartTime)
-------|-----------|-----------------------
User1 | 'Facebook'| 00:34:12
User1 | 'Outlook' | 00:12:34
User1 | 'Excel' | 00:43:13
User2 | 'Facebook'| 00:34:12
User2 | 'Outlook' | 00:12:34
User2 | 'Excel' | 00:43:13
... | ... | ...
User_n | ... | ...
So the question is: is this the fastest way to do it in MySQL?
Answer (score: 4)
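I think your wildcard searches are probably what is slowing it down: a LIKE pattern that starts with % cannot use an index on that column, so every row has to be scanned. It may also help to avoid the subqueries and use plain joins, but the wildcard searches are the bigger problem. Either way, could you change the table to have an indexed categoryName or categoryID column so that no wildcard search is needed? Something like "where categoryName = 'Outlook'".
To optimize the data in the table, add a categoryID column (ideally it would reference a separate table, but for this example we will just use arbitrary numbers):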
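One way to drop the per-keyword subqueries entirely is conditional aggregation (a swapped-in technique, not something the answer spells out): scan `rawdata` once and add each row's duration to the column for the keyword its title matches. This is a sketch that assumes the column names from the question's query (`user`, `MainWindowTitle`, `StartTime`, `EndTime`) and that `StartTime`/`EndTime` are DATETIME values:

```sql
-- Single pass over rawdata: each row contributes its duration (in seconds)
-- to every keyword column whose pattern its title matches, and 0 elsewhere.
SELECT
    user,
    SEC_TO_TIME(SUM(CASE WHEN MainWindowTitle LIKE '%Facebook%'
                         THEN TIMESTAMPDIFF(SECOND, StartTime, EndTime)
                         ELSE 0 END)) AS Facebook_Time,
    SEC_TO_TIME(SUM(CASE WHEN MainWindowTitle LIKE '%Outlook%'
                         THEN TIMESTAMPDIFF(SECOND, StartTime, EndTime)
                         ELSE 0 END)) AS Outlook_Time,
    SEC_TO_TIME(SUM(CASE WHEN MainWindowTitle LIKE '%Excel%'
                         THEN TIMESTAMPDIFF(SECOND, StartTime, EndTime)
                         ELSE 0 END)) AS Excel_Time
FROM rawdata
GROUP BY user;
```

The LIKE patterns are still evaluated on every row, so this is still a full scan, but one scan instead of one per keyword plus the outer one. Users with no matching rows get 00:00:00 instead of NULL.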
alter table rawdata add column categoryID int not null default 0;
alter table rawdata add index (categoryID);
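An index on `categoryID` alone lets MySQL find the rows for one category, but it still has to read each of those rows to get `user`, `StartTime` and `EndTime`. A composite index that starts with `categoryID` and also contains the other columns the per-category subqueries use may let MySQL answer them from the index alone (a covering index). The index name below is made up; run EXPLAIN on the SELECT to check whether the index is chosen (`Using index` in the Extra column indicates a covering read):

```sql
-- Hypothetical covering index: filter on categoryID, group by user,
-- and read StartTime/EndTime without going back to the table rows.
ALTER TABLE rawdata
    ADD INDEX idx_category_user_times (categoryID, user, StartTime, EndTime);
```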
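Then populate the categoryID column for the existing data:
update rawdata set categoryID=1 where MainWindowTitle like '%Outlook%';
update rawdata set categoryID=2 where MainWindowTitle like '%Facebook%';
update rawdata set categoryID=3 where MainWindowTitle like '%Excel%';
-- etc...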
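Each of the UPDATE statements above scans the whole table. A single UPDATE with a CASE expression can set every category in one pass; this sketch uses the same arbitrary ids (1 = Outlook, 2 = Facebook, 3 = Excel) and can be extended with more WHEN branches. One behavioral difference: a title matching several keywords gets the id of the first matching branch, whereas with separate UPDATEs the last statement run wins.

```sql
-- One pass over the table; rows matching no keyword get categoryID = 0.
UPDATE rawdata
SET categoryID = CASE
        WHEN MainWindowTitle LIKE '%Outlook%'  THEN 1
        WHEN MainWindowTitle LIKE '%Facebook%' THEN 2
        WHEN MainWindowTitle LIKE '%Excel%'    THEN 3
        ELSE 0
    END;
```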
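Then change your INSERT statements to follow the same rules, so that new rows get a categoryID too.
Then write the SELECT query like this (with categoryID in place of the wildcard searches):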
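If changing every INSERT in the application is impractical, one alternative (an assumption, not part of the answer) is a BEFORE INSERT trigger that derives `categoryID` from the title. The trigger name is hypothetical, creating it requires the TRIGGER privilege, and its CASE has to be kept in sync with the rules used for the existing data:

```sql
-- Hypothetical trigger: set categoryID from the window title on every insert.
-- A single-statement body needs no BEGIN ... END and no DELIMITER change.
CREATE TRIGGER rawdata_set_category
BEFORE INSERT ON rawdata
FOR EACH ROW
SET NEW.categoryID = CASE
        WHEN NEW.MainWindowTitle LIKE '%Outlook%'  THEN 1
        WHEN NEW.MainWindowTitle LIKE '%Facebook%' THEN 2
        WHEN NEW.MainWindowTitle LIKE '%Excel%'    THEN 3
        ELSE 0
    END;
```

This only covers inserts; rows whose title is later changed by an UPDATE would need a matching BEFORE UPDATE trigger.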
SELECT u.user, t1.Facebook_Time, t2.Outlook_Time, t3.Excel_Time
FROM
(SELECT DISTINCT user FROM rawdata) u LEFT JOIN
(SELECT user, SEC_TO_TIME(SUM(TIMESTAMPDIFF(SECOND, StartTime, EndTime))) AS Facebook_Time
FROM rawdata
WHERE categoryID = 2
GROUP BY user) t1 ON u.user = t1.user LEFT JOIN
(SELECT user, SEC_TO_TIME(SUM(TIMESTAMPDIFF(SECOND, StartTime, EndTime))) AS Outlook_Time
FROM rawdata
WHERE categoryID = 1
GROUP BY user) t2 ON u.user = t2.user LEFT JOIN
(SELECT user, SEC_TO_TIME(SUM(TIMESTAMPDIFF(SECOND, StartTime, EndTime))) AS Excel_Time
FROM rawdata
WHERE categoryID = 3
GROUP BY user) t3 ON u.user = t3.user;
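The query above still needs one derived table per keyword and returns one column per keyword, whereas the output the question asks for has one row per user and keyword. Following the remark that categoryID should ideally reference a separate table, here is a sketch that assumes a hypothetical lookup table `category` (its name and columns are made up; the ids match the arbitrary numbers used above) and gets that shape from a single GROUP BY:

```sql
-- Hypothetical lookup table holding the keyword name for each id.
CREATE TABLE category (
    categoryID INT NOT NULL PRIMARY KEY,
    name       VARCHAR(50) NOT NULL
);

INSERT INTO category (categoryID, name) VALUES
    (1, 'Outlook'),
    (2, 'Facebook'),
    (3, 'Excel');

-- One row per (user, keyword), with no per-keyword subqueries.
-- The inner join skips rows whose categoryID has no entry (e.g. 0).
SELECT r.user,
       c.name AS Keyword,
       SEC_TO_TIME(SUM(TIMESTAMPDIFF(SECOND, r.StartTime, r.EndTime))) AS Total_Time
FROM rawdata r
JOIN category c ON c.categoryID = r.categoryID
GROUP BY r.user, c.categoryID, c.name
ORDER BY r.user, c.categoryID;
```

A user with no time in some category simply gets no row for it, rather than a row showing 00:00:00 as in the sample output.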